zoriya commented 4 weeks ago

I talked a bit about it on Discord or Twitter, but it's time to put everything in writing:

I'm planning a complete rewrite of the backend and a restructuring of the database. This is needed to support #87 and #282. This will also make #463 (big thanks to @Arthi-chaud for the brainstorming there) or #549 possible. This is also a great occasion to tackle tech debt and bad decisions I made during the 5 years I've been working on this codebase.

I plan on drawing diagrams and asking for feedback on the Discord at most turning points of the code, so please give your feedback if you're interested!

To give a concrete vision, I plan to:

- [ ] Rewrite the auth and extract it to its own service
  - [x] Write the spec #573
  - [ ] Write the service #610
  - [ ] Handle #346
- [ ] Rewrite the API
  - [x] Design core database logic (see below)
  - [ ] Design a new API to handle mixes of episodes/movies/specials/recaps/extras
  - [ ] Rewrite the existing API (elysia w/ bun - javascript)
  - [ ] Add websockets as a core part of the API (elysia has great websockets support)
  - [ ] Handle scaling (websockets need custom scaling logic)
- [ ] Update the scanning API
- [ ] Probably merge autosync into the API

I'm going to explain each point and why I want to design Kyoo this way. As always, feel free to give your opinion or ideas.

Why a separate auth service

Kyoo has multiple HTTP services; for now we have the API & the transcoder. To ensure users have the correct level of permissions, all requests hit the API, which validates permissions and then proxies the request to the transcoder. This is bad for performance, scaling and DX. The idea with a centralized auth service is to have the reverse-proxy/ingress/gateway call the service and trade an opaque auth token for a JWT (see #573's phantom token part). The short-lived JWT will then be used by downstream services (API, transcoder, scanner...) to check permissions.
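To make the phantom token flow concrete, here is a minimal TypeScript sketch using only `node:crypto` (available in Node and Bun). The function names `createSession`, `exchange` and `verify`, the claim layout and the use of HS256 with a single shared secret are assumptions for illustration, not what #573 specifies.

```typescript
import { createHmac, randomBytes, timingSafeEqual } from "node:crypto";

// Hypothetical claims carried by the short-lived JWT.
type Claims = { sub: string; permissions: string[]; exp: number };

// ASSUMPTION: HS256 with one shared secret keeps the sketch short. A real setup
// would more likely sign with an asymmetric key so downstream services only
// need the public key.
const SECRET = randomBytes(32);

const b64url = (s: string): string => Buffer.from(s).toString("base64url");

function hmac(data: string): Buffer {
  return createHmac("sha256", SECRET).update(data).digest();
}

function sign(claims: Claims): string {
  const body = `${b64url(JSON.stringify({ alg: "HS256", typ: "JWT" }))}.${b64url(JSON.stringify(claims))}`;
  return `${body}.${hmac(body).toString("base64url")}`;
}

// Auth service state: opaque token -> session (a database in practice).
const sessions = new Map<string, { userId: string; permissions: string[] }>();

function createSession(userId: string, permissions: string[]): string {
  const opaque = randomBytes(24).toString("hex");
  sessions.set(opaque, { userId, permissions });
  return opaque;
}

// Called by the gateway: opaque token in, short-lived JWT out.
function exchange(opaque: string, ttlSeconds = 60, now = Date.now()): string | null {
  const session = sessions.get(opaque);
  if (!session) return null;
  return sign({
    sub: session.userId,
    permissions: session.permissions,
    exp: Math.floor(now / 1000) + ttlSeconds,
  });
}

// Called by downstream services: checks signature and expiry locally, no network call.
function verify(jwt: string, now = Date.now()): Claims | null {
  const [header, payload, sig] = jwt.split(".");
  if (!header || !payload || !sig) return null;
  const expected = hmac(`${header}.${payload}`);
  const given = Buffer.from(sig, "base64url");
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null;
  const claims = JSON.parse(Buffer.from(payload, "base64url").toString("utf8")) as Claims;
  return claims.exp > Math.floor(now / 1000) ? claims : null;
}
```

The point of the split is that only the gateway talks to the auth service; the API, transcoder and scanner just call something like `verify` on the JWT they receive.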

Having this service stand alone also makes it possible/simple to have #346.

The auth service could also be used by other applications (as long as they are compliant w/ the license).

Why rewrite the API from scratch

The current backend is written in C#, which lacks sum types. Kyoo's logic often works on types like Movie | Serie, Episode | Movie, Episode | Special and so on. The lack of sum types in C# makes these hard to work with: we have multiple interfaces and custom logic scattered everywhere to handle them. This is why JavaScript was chosen as the replacement (we could have used a more functional language like Elixir, OCaml or even Gleam, but the core value of Kyoo is not its API, so I think trading some performance for velocity here is really important. I also think Gleam is too early in development to write everything in it).
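For comparison, this is roughly what a sum type looks like in TypeScript, assuming the new API is written in TypeScript (which Elysia is built around; plain JavaScript has no compile-time checks). The `Movie`/`Serie` shapes below are hypothetical and heavily trimmed.

```typescript
// Hypothetical, trimmed-down shapes; the real v5 types will differ.
type Movie = { kind: "movie"; id: string; name: string; runtime: number };
type Serie = { kind: "serie"; id: string; name: string; seasonCount: number };

// A sum type: a value is exactly one of the variants, tagged by `kind`.
type Show = Movie | Serie;

function describeShow(show: Show): string {
  switch (show.kind) {
    case "movie":
      // Narrowed to Movie here, so `runtime` is available.
      return `${show.name} (${show.runtime} min)`;
    case "serie":
      // Narrowed to Serie here, so `seasonCount` is available.
      return `${show.name} (${show.seasonCount} seasons)`;
    default: {
      // Exhaustiveness check: adding a variant to Show without handling it
      // here makes this assignment fail to type-check.
      const unreachable: never = show;
      return unreachable;
    }
  }
}
```

No shared interface or visitor is needed: the compiler narrows the type in each branch and flags unhandled variants.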

For #87, we need to rewrite basically every single type of Kyoo (Series, Episodes, Movies... all need their translatable fields moved to another type & so on). Fixing types one by one along with their SQL interactions would probably take more time than just rewriting everything (and is wayyy more boring).
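A rough idea of what "moving translatable fields to another type" could look like. `SerieTranslation`, its fields and the fallback rule in `translate` are hypothetical, not the actual #87 design.

```typescript
// Hypothetical shapes: language-independent data stays on Serie, while every
// translatable field moves to a per-language SerieTranslation.
type SerieTranslation = { name: string; tagline: string | null; description: string | null };

type Serie = {
  id: string;
  slug: string;
  startAir: string | null;
  // Keyed by language code, e.g. "en", "fr", "ja".
  translations: Record<string, SerieTranslation>;
};

// Pick the first language from the user's preference list that exists,
// falling back to the first translation stored for the series.
function translate(serie: Serie, preferred: string[]): SerieTranslation | undefined {
  for (const lang of preferred) {
    const tr = serie.translations[lang];
    if (tr) return tr;
  }
  return Object.values(serie.translations)[0];
}
```

Every type with user-facing text (series, seasons, episodes, movies) would need the same split, which is why #87 touches nearly the whole model.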

What's up with episodes/movies/special/recaps/extra

Right now, Kyoo takes the simplest approach: an item is either a Movie or a Serie containing seasons that contain episodes. In reality, this is a bit more complicated: a series can have movies that should be watched between seasons.

Most online databases (TVDB/TheMovieDB) use "Season 0" as a special season, and we've used that until now, but this feels more like a workaround than a proper feature. Some specials are:

- critical to the watching experience and need to be watched between seasons/episodes.
- simple recaps that rehash one or multiple episodes and can be skipped (but still need to be shown at their proper place in the app's timeline)
- extra content like short episodes (2/3 min long)

Note that specials can also be movies.

To give an example:

Made in Abyss is an anime with 2 seasons & 3 movies (at the time of writing). The first 2 movies recap the first season and the 3rd movie must be watched before the 2nd season. This means the watch order is 1st season -> 3rd movie -> 2nd season. The 1st/2nd movies should be shown close to the 1st season but greyed out since they are recaps.
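One possible way to model this without a "Season 0": every entry gets an explicit position in the watch order plus a recap flag. The `Entry` shape, fractional `order` values and the `nextUp` rule are assumptions for illustration.

```typescript
// Hypothetical model: every watchable item of a series is an entry with an
// explicit position in the timeline instead of living in a "Season 0".
type EntryKind = "episode" | "movie" | "special" | "extra";

type Entry = {
  kind: EntryKind;
  name: string;
  // Position in the watch order; fractional values slot entries in between.
  order: number;
  // Recaps stay at their place in the timeline but can be skipped.
  recap: boolean;
};

// Made in Abyss as described above (episode list abbreviated).
const madeInAbyss: Entry[] = [
  { kind: "episode", name: "S2E1", order: 3, recap: false },
  { kind: "movie", name: "Movie 3", order: 2.5, recap: false },
  { kind: "movie", name: "Movie 1", order: 2.1, recap: true },
  { kind: "movie", name: "Movie 2", order: 2.2, recap: true },
  { kind: "episode", name: "S1E1", order: 1, recap: false },
  { kind: "episode", name: "S1E2", order: 2, recap: false },
];

// Display order: everything is shown, recaps included (clients grey them out).
function timeline(entries: Entry[]): Entry[] {
  return [...entries].sort((a, b) => a.order - b.order);
}

// "Next up": the first non-recap entry after the one just watched.
function nextUp(entries: Entry[], watched: string): Entry | undefined {
  const ordered = timeline(entries);
  const idx = ordered.findIndex((e) => e.name === watched);
  if (idx === -1) return undefined;
  return ordered.slice(idx + 1).find((e) => !e.recap);
}
```

Movies 1 and 2 keep their `kind: "movie"` (specials can be movies) while the recap flag alone decides whether "Next up" skips them.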

Websockets

I've wanted to add websockets to Kyoo for a long time (for features like #341, #297 or #342). This would also make invalidating the cache for "Continue watching", "Next up" and "Watch status" easier in the various apps.

I never really got around to writing it since I was not happy with the options I had. C#'s built-in websocket solution uses a weird format that can only be used w/ their own lib, which felt wrong, and writing a service specifically for that was counterproductive since it would need lots of logic shared with the API (I still did a PoC in the feat/ws-rabit branch).

Elysia has good websocket handling & the format is easily readable by any client, so I'm happy about this. We would just need a message queue to handle replication across API instances.
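A sketch of why websockets need a message queue once the API runs as several replicas: each instance only holds its own connections, so events are published to the queue and every instance forwards them to its local sockets. `Broker` is an in-memory stand-in for RabbitMQ/Redis, and `ApiInstance`/`ClientSocket` (including the `userId` field) are hypothetical.

```typescript
// In-memory stand-in for the message queue (RabbitMQ, Redis pub/sub...):
// every subscribed instance receives every published message.
type Handler = (raw: string) => void;

class Broker {
  private handlers: Handler[] = [];
  subscribe(handler: Handler): void {
    this.handlers.push(handler);
  }
  publish(raw: string): void {
    for (const handler of this.handlers) handler(raw);
  }
}

// Minimal view of a websocket connection; `userId` is a hypothetical field
// set after authenticating the connection.
interface ClientSocket {
  userId: string;
  send(data: string): void;
}

type ReplicatedEvent = { userId: string; payload: unknown };

// One API replica: it only knows the sockets connected to itself.
class ApiInstance {
  private sockets = new Set<ClientSocket>();
  private broker: Broker;

  constructor(broker: Broker) {
    this.broker = broker;
    // Forward every replicated event to the local sockets it concerns.
    broker.subscribe((raw) => {
      const event = JSON.parse(raw) as ReplicatedEvent;
      for (const socket of this.sockets) {
        if (socket.userId === event.userId) socket.send(JSON.stringify(event.payload));
      }
    });
  }

  connect(socket: ClientSocket): void {
    this.sockets.add(socket);
  }

  disconnect(socket: ClientSocket): void {
    this.sockets.delete(socket);
  }

  // Called when something changes on this replica (e.g. a watch status update).
  notify(userId: string, payload: unknown): void {
    const event: ReplicatedEvent = { userId, payload };
    this.broker.publish(JSON.stringify(event));
  }
}
```

With Elysia, `connect`/`disconnect` would roughly map to the websocket `open`/`close` handlers; the broker is what lets an update handled by instance A reach a client connected to instance B.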

On the scanning API

Right now, the matcher (the part of the scanner that fetches metadata & pushes it to Kyoo) uses a REST API to register new videos. When there are a lot of new videos to register, this effectively DDoSes the API. This is also inadequate for data that may or may not already exist. For example, when we register an episode, the associated season/series may or may not already be registered in Kyoo.

Migrating to a queue-based system w/ the matcher producing items to register & the API consuming them seems like the way to go. When the API encounters an episode with missing season/series data, it could push a request to another queue.
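A sketch of the queue-based registration flow, with in-memory arrays standing in for the real queues and database. The message shapes, queue names and the requeue-and-retry behavior are assumptions; a real broker would handle redelivery itself (e.g. via acknowledgements).

```typescript
// Hypothetical message produced by the matcher for each new video.
type RegisterEpisode = { serieSlug: string; season: number; episode: number; path: string };
// Hypothetical request sent back when the API lacks series metadata.
type MetadataRequest = { serieSlug: string };

// In-memory stand-ins for the two queues and the database.
const registerQueue: RegisterEpisode[] = [];
const requestQueue: MetadataRequest[] = [];
const knownSeries = new Set<string>();
const requested = new Set<string>();
const episodes: RegisterEpisode[] = [];

// API side: drains the queue at its own pace instead of absorbing a burst of REST calls.
function consume(): void {
  const batch = registerQueue.splice(0, registerQueue.length);
  for (const msg of batch) {
    if (knownSeries.has(msg.serieSlug)) {
      episodes.push(msg);
      continue;
    }
    // Series not registered yet: ask for its metadata once...
    if (!requested.has(msg.serieSlug)) {
      requested.add(msg.serieSlug);
      requestQueue.push({ serieSlug: msg.serieSlug });
    }
    // ...and put the episode back to retry on a later pass.
    registerQueue.push(msg);
  }
}

// Called once the series metadata has been fetched and stored.
function registerSerie(slug: string): void {
  knownSeries.add(slug);
  requested.delete(slug);
}
```

Two episodes of an unknown series produce a single metadata request, and both get registered on the next pass once the series exists.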

Why merge autosync

For those unaware, autosync is the service responsible for marking episodes as watched on external services (SIMKL, and in the future Trakt, MyAnimeList, AniList & so on).

Making this a separate service was a mistake: some services need to hook into different points of playback (for example, Trakt wants to be notified when playback starts, is paused/resumed and finishes). The current setup also makes it impossible to report errors to the client. Integrating it into the backend directly would make this way easier.
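A sketch of how autosync could plug into playback once it lives in the backend: each provider implements only the hooks it needs, and errors are collected so they can be returned to the client. `PlaybackEvent`, `SyncProvider` and `dispatch` are hypothetical names.

```typescript
// Hypothetical playback events emitted by the backend.
type PlaybackEvent =
  | { type: "start"; episodeId: string; progress: number }
  | { type: "pause"; episodeId: string; progress: number }
  | { type: "resume"; episodeId: string; progress: number }
  | { type: "finish"; episodeId: string };

// Each external service only implements the hooks it cares about.
interface SyncProvider {
  name: string;
  // For services that want every playback step (start, pause/resume, finish).
  onEvent?(event: PlaybackEvent): void;
  // For services that only need to know an episode was watched.
  onFinish?(episodeId: string): void;
}

// Runs inside the API, so failures can be sent back to the client
// instead of being lost in a separate service.
function dispatch(providers: SyncProvider[], event: PlaybackEvent): string[] {
  const errors: string[] = [];
  for (const p of providers) {
    try {
      p.onEvent?.(event);
      if (event.type === "finish") p.onFinish?.(event.episodeId);
    } catch (e) {
      errors.push(`${p.name}: ${e instanceof Error ? e.message : String(e)}`);
    }
  }
  return errors;
}
```

The returned list could be attached to the response (or pushed over the websocket) so the client can show which service failed to sync.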

Open questions

I'm still undecided about some things:

Should we keep Meilisearch as a search backend, or can postgres do that for us?

This was discussed in #420 and I think Meilisearch is a great way to solve search, but I'm open to reconsidering if we can get similar results w/ Postgres only. Side note: one of the most highly rated features under consideration on their roadmap is a recommendation system.

Should we use both RabbitMQ & Redis?

I plan on adding Redis (probably via Valkey) soon for #579, distributing the transcoder's lock and the scanner's cache. I know Redis can be used as a message queue; should we simply use Redis for everything?

zoriya commented 4 weeks ago

Here is a draft of the new database schema:

I'll open a PR with it once I've done some more work on it.

Arthi-chaud commented 4 weeks ago

Would be happy to help you with all this!

acelinkio commented 4 weeks ago

Going into the next major version, it would be worth considering moving Kyoo to a GitHub organization and moving each of the microservices into a repository of its own. Beyond just organizing things differently, the next priority would be ensuring a sane development experience.

zoriya commented 4 weeks ago

I think for a small team a multi-repo setup is worse for DX.

Having a single repo means a single issue tracker, which is a definite plus.

It's also possible to make PRs that impact multiple services instead of having two or three and jumping between repos/PRs to get the whole context.

zoriya commented 2 weeks ago

To give a small update: I've started working on the auth service. I decided to write it in Go instead of Gleam, since Gleam still feels too early for that (branch is feat/auth).

I'll continue working on it and make a PR with the API's spec within the next week.

thinkbig1979 commented 6 days ago

With regard to the DB, I suggest you take a look at EdgeDB. Their demo dataset is actually a movie dataset 😄 I'm a big fan of graph representation of data, and am currently developing an application using EdgeDB. I'm not a developer myself, but my dev team has enjoyed it so far.

zoriya / Kyoo

v5 plans #597
