jellyfin / jellyfin-meta

A repository to hold our roadmap, policies, and more.

Library database EFCore migration and API v2 discussion #26

Open joshuaboniface opened 2 years ago

joshuaboniface commented 2 years ago

We've had this chat in Matrix a few times in the past and it came up again yesterday (I poked the bear I know), so I figure it would be prudent to get an actual meta issue going about how to approach this.

To recap the current situation:

This means that, to complete EF Core migrations and enable external database support (a very much-requested feature) and everything that provides (load balanced instances, failover, easy backup/restore, etc.), the library database must be ported into EFCore somehow.

For the finished migrations, @barronpm has been driving that project, but I know he has not had much free time to contribute recently, so for the most part work on this has stalled somewhat.

I'd like a 10.9.0 goal to be "finish the EF Core migration", so we could enable that feature for users sooner rather than later.

However this ties into our long-running "API v2" discussion, which in the past has had some strong opinions. So I figure I should outline what I think the options are and solicit feedback from the major stakeholders in @jellyfin/server.


The main reason for debate is general dissatisfaction within the project with the current Emby-inherited API (a.k.a. the "v1 API"). It has several limitations, doesn't make much sense in some places, has little organization, etc. So, from very early on in the project, there has been much talk about designing a better, more robust, and extensible v2 API.

And because of how closely the API and the database backend are tied together, the plan here ties directly into the database migration.

From my limited in-depth knowledge of the code, and my perspective primarily as a user and project manager, I see 3 real ways this can go.

  1. We migrate the current database schema - warts and all - as much as possible into EFCore - depositing the XML right into DB rows instead of onto the filesystem, keeping the current SQLite schema functionally unchanged, etc. We make no code changes to the API functions whatsoever except what's needed to move "look up data from file on filesystem" to "look up data from table row in database".

Pros: We completely decouple any "v2 API" discussions from the EFCore migration today; we get the "multiple backend databases" option finished and out to users much more quickly; minimal functional code changes (no new functions, etc.).
Cons: We're stuck with the current schema - warts and all - in EFCore; we have weird XML in the database (it's better than binary data, at least), or we have to somehow translate it sensibly into new tables anyway.

  2. We make functional improvements and modifications to the schema and port it into EFCore. This would change the functional schema of the database and would require some additional functions or changes to functions, up to and including translation wrappers, in order to preserve the current API exactly as it is.

Pros: This gets us what we want in terms of the database (a better schema) while also preserving the API, so that it actually gets done; we have a flexible foundation and could write a v2 API at a leisurely pace by leveraging the wrappers or the backend functions directly.
Cons: More code changes right now; we have to agree on the schema.

  3. We tie EFCore and a v2 API completely together, with the former depending on the latter.

Pros: Everything is nice and clean, both at the end and during the transition.
Cons: An indefinite transition state during which the EFCore migration is not complete; requires us to actually make an API v2 first; lots of code changes at once to implement both the EFCore library backend and the new API frontend; a v2 API can't be introduced piecemeal and must be feature-complete before being released to the world, along with transition time for clients and 3rd-party integrations.
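To make the difference between options 1 and 2 concrete, here is a rough sketch of what the EF Core entities might look like in each case. All type and property names here are hypothetical illustrations, not actual Jellyfin code:

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;
using Microsoft.EntityFrameworkCore;

// Option 1: the serialized metadata moves into the database essentially
// unchanged -- one row per item, with the XML stored as a text column
// instead of as a file on the filesystem.
public class LegacyItem
{
    [Key]
    public Guid Id { get; set; }

    public string Type { get; set; } = string.Empty;

    // The XML that previously lived on disk.
    public string MetadataXml { get; set; } = string.Empty;
}

// Option 2: the schema is normalized into real columns and relations,
// so EF Core can query, index, and migrate it without manual SQL.
public class MediaItem
{
    [Key]
    public Guid Id { get; set; }

    public string Name { get; set; } = string.Empty;
    public DateTime? PremiereDate { get; set; }
    public List<Genre> Genres { get; set; } = new();
}

public class Genre
{
    [Key]
    public int Id { get; set; }
    public string Name { get; set; } = string.Empty;
}

public class LibraryDbContext : DbContext
{
    // Only one of these shapes would exist, depending on the option chosen.
    public DbSet<LegacyItem> LegacyItems => Set<LegacyItem>();
    public DbSet<MediaItem> Items => Set<MediaItem>();
}
```

Under option 2, preserving the v1 API would then come down to thin translation wrappers that map the normalized entities back into the response shapes the existing endpoints already return.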


Now, I know the consensus always seemed to be on 3, but I don't think that's a sane way to move forward. There was talk of freezing current master for some indeterminate time while this was worked on, and I've always fundamentally disagreed with that, since it effectively kills "Jellyfin" development as it exists today in favour of a functional rewrite of half the server code just to get a new database backend. That seems very excessive and user-unfriendly to me (no new features, bugfixes, etc. for possibly years, if it even happens at all). And as I've mentioned briefly above, no one externally is really asking for a "v2 API", so I'm not sure killing active development for that is really worthwhile. If someone wants to do it, that's fine, but at that point it would really have to be a hard fork while the work is done (and this is basically what was proposed when this was last discussed).

That leaves the other options: 1 seems like the "simple" option, and while it means a bit of ugly code and ugly database tables, it gets the EFCore migration done and out of the way. 2 seems like the better, more robust solution, as it sets us up for a v2 "some day" while preserving the v1 API as it currently stands. What I don't really know is what the comparative work effort of either is. If 2 is not much worse than 1, we should really take that route IMO, otherwise we just do 1 and accept the bad DB, then worry about a better DB whenever we actually do the API rewrite.

So, I'm open to thoughts and constructive criticism, and anything I missed. It is ultimately up to the stakeholders how to proceed, but I stand by my recommendations here as the healthiest path for the project.

Shadowghost commented 2 years ago

I agree that 2 would be the preferable way, especially since 1 would mean writing a lot of big, possibly manual database migrations again once we want to get rid of the XML, and one of the points of migrating to EFCore is that this should no longer be a manual process.
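For context, the point about migrations no longer being manual refers to EF Core's standard tooling: once the entity model changes, the schema migration is generated and applied by the framework rather than hand-written as SQL. A typical workflow looks like this (standard `dotnet ef` commands, shown purely for illustration; the migration name is made up):

```shell
# Install the EF Core CLI tool (one-time setup).
dotnet tool install --global dotnet-ef

# Generate a migration from the diff between the entity model and the
# last migration's snapshot.
dotnet ef migrations add AddLibraryTables

# Apply all pending migrations to the configured database.
dotnet ef database update
```

With a normalized schema (option 2), these generated migrations cover schema evolution; with XML blobs (option 1), any change to the serialized format would still need custom migration code on top.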

In regards to 3: I think the earlier consensus was that the migration and API work will be done on a separate branch, with master still receiving bugfixes and bugfix releases. Depending on how extensive new features are, merging them would need to be decided on a per-case basis. When the rewrite is done, the changes would be backported to the new database codebase.

joshuaboniface commented 2 years ago

> In regards to 3: I think the earlier consensus was that the migration and API work will be done on a separate branch, with master still receiving bugfixes and bugfix releases. Depending on how extensive new features are, merging them would need to be decided on a per-case basis. When the rewrite is done, the changes would be backported to the new database codebase.

It was kinda mostly that. I think there were some unclear opinions with some wanting a complete stop to all development on old master so that nothing would need to be backported, but that's possibly a strawman that I accidentally constructed due to misunderstanding. But I still think even that "weaker" version of it is bad for the software long-term since there's no guarantee that API V2 would ever be done in any amount of time, it really is a chicken-and-egg problem.

barronpm commented 2 years ago

It's been a while since I've dived into the code, but here's my two cents: I'm partial to option 2, but it may be easiest to basically remove BaseItem and its logic all at once and convert directly from the new schema -> BaseItemDto. Doing all of the conversion logic will probably be a bit slower than it would be with the new API, but it would make writing the new API trivial. One caveat is that certain logic, like library scanning, would have to be rewritten or adapted along with the schema, but I think it's worth it if it is feasible. Perhaps other members of the server team have a better idea of how feasible that would be.
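A minimal sketch of what that direct new-schema-to-DTO conversion could look like. The entity and the cut-down `BaseItemDto` below are hypothetical stand-ins (the real Jellyfin `BaseItemDto` has far more fields), so this only illustrates the shape of the idea:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical new-schema entity (stand-in, not real Jellyfin code).
public class NewSchemaItem
{
    public Guid Id { get; set; }
    public string Name { get; set; } = string.Empty;
    public DateTime? PremiereDate { get; set; }
    public List<string> Genres { get; set; } = new();
}

// Heavily simplified stand-in for Jellyfin's BaseItemDto.
public class BaseItemDto
{
    public Guid Id { get; set; }
    public string? Name { get; set; }
    public DateTime? PremiereDate { get; set; }
    public string[] Genres { get; set; } = Array.Empty<string>();
}

public static class DtoMapper
{
    // Maps straight from the new schema to the DTO, bypassing the
    // in-memory BaseItem layer entirely, as suggested above.
    public static BaseItemDto ToDto(NewSchemaItem item) => new BaseItemDto
    {
        Id = item.Id,
        Name = item.Name,
        PremiereDate = item.PremiereDate,
        Genres = item.Genres.ToArray()
    };
}
```

The trade-off mentioned above is visible here: every query pays for this mapping step, but API handlers only ever see the DTO, so the v2 API can be written against a stable shape.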

oddstr13 commented 2 years ago

I have always had the opinion that 2 would be the best way to do it, but I haven't dedicated much time towards solving the problem myself, so I'd defer to those who actually end up doing the work (while offering this suggestion). There's probably a performance impact with the API translation layer, yes, but I think it would be worth it.

Option 1 would only be viable if the XML gets extracted into a DB structure right now, so that migrations would be easier down the road, but this would probably just be more work than option 2 in the end.

Option 3 would potentially be somewhat like if we started a new server from scratch rather than forking the working, but "not the best" codebase that was Emby – a bunch of time spent on something, and nothing to show to users for a long time, with a probability of a slow death of the project (I'm probably exaggerating this a little).

EraYaN commented 2 years ago

Option 2 is probably the most feasible while also having direct benefit for the design (which option 1 IMO does not). And if 3 had been feasible, we would have been finished by now. The performance cost of some extra object creation and translation will be negligible compared to EF Core itself. It will be good to have the DB and API decoupled, which we would sort of get for free, and then down the line we can start moving the EF Core entities towards the API to couple them again.

ferferga commented 1 year ago

Found an interesting tool that could help us with the API design: https://github.com/hoppscotch/hoppscotch

DornJ commented 10 months ago

Hi, I hope I'm not out of place by asking; has there been any progress regarding this?

barronpm commented 10 months ago

There's been some exploratory work done, but for now I'm focusing on some smaller improvements that should hopefully make the eventual migration easier.

brettpetch commented 10 months ago

Something that might help with redesigning the library DB: it's an older version of the Plex schema, but it might be worth a look to see how other players in the same realm are doing it.

https://github.com/mutanthost/plex_schema

Current schema: https://gist.github.com/brettpetch/6f830a52ce843b6c12100324949d3ee6

EraYaN commented 10 months ago

@brettpetch That is most likely not FOSS licensed, so before looking at it, make sure you are not going to contribute. (And the same goes for anyone else.)

brettpetch commented 10 months ago

> @brettpetch That is most likely not FOSS licensed, so before looking at it, make sure you are not going to contribute. (And the same goes for anyone else.)

It is most certainly not licensed for FOSS, but could be a good starting point to look at potential architectures :)

cvium commented 10 months ago

> @brettpetch That is most likely not FOSS licensed, so before looking at it, make sure you are not going to contribute. (And the same goes for anyone else.)
>
> It is most certainly not licensed for FOSS, but could be a good starting point to look at potential architectures :)

You might be missing the point of licensing... If it is not licensed in a way that is compatible with ours, you're opening us up to all kinds of trouble. Respect the licenses.