midarrlabs / midarr-server

🔥 Midarr, the minimal lightweight media server.
MIT License
1.2k stars, 36 forks

RFC: Transcoding #321

Closed: onedr0p closed this issue 11 months ago

onedr0p commented 1 year ago

This is mainly on the topic of transcoding. I know you have some opinions on Midarr and transcoding, but this is to start a discussion about transcoding and how it could be implemented if you ever think about adding such abilities.

Currently Midarr doesn't really support transcoding, so anything other than x264 (H.264) most likely will not play. I wonder if it is worth the effort to use Jellyfin's ffmpeg as a library to integrate transcoding in general.

On the topic of distributed transcoding

Given you agree that transcoding is on the slate for possibly being implemented...

I really love that I can scale Midarr to n pods in Kubernetes; however, there is no way to "load balance" transcoding requests so that an overworked container doesn't pick up new transcoding requests.

These are just questions to chew over as future ideas for Midarr.

trueChazza commented 1 year ago

Thanks for opening this discussion. This is definitely a topic worth talking about and exploring further, and as you've mentioned, Midarr currently has a half-baked solution for this.

You've brought up a few interesting points to discuss:

trueChazza commented 1 year ago

ffmpeg jellyfin

What is the benefit of this over, say, plain ffmpeg? From a quick look through the repo, it seems Jellyfin has collated only the tools they need for their transcoding pipeline.

Unless I'm missing something?

trueChazza commented 1 year ago

I'm all for not reinventing the wheel, and I looked into other external / decoupled solutions like Go Transcode before doing a basic Midarr implementation.

Go Transcode not only provides the ffmpeg / transcoder implementation, but also the "glue" or interface Midarr can consume for transcoding.

trueChazza commented 1 year ago

If Go Transcode is a viable option to get us up and running quickly while also achieving load balancing across Midarr instances, then all we'd really need is the Midarr implementation for it.

onedr0p commented 1 year ago

I just picked Jellyfin's ffmpeg because I figured it would be a fork of ffmpeg geared towards transcoding many different types of media, so maybe nothing too special there.

go-transcode seems interesting as well!

trueChazza commented 1 year ago

Would you be willing to test out Go Transcode and see if it's a viable solution?

It's come a long way since I last looked at it. It didn't have VOD support back then, but I see it does now.

trueChazza commented 1 year ago

What exactly is Midarr's admittedly basic transcoder lacking at the moment?

How does it play the media that you previously said didn't have any audio?

If you had a wishlist for Midarr transcoding, what would that be?

onedr0p commented 1 year ago

If you had a wishlist for Midarr transcoding, what would that be?

That's a pretty hard question to answer because I am very much in love with Kubernetes for managing containers. It would be awesome if Midarr could have a "transcoding" service made up of a dedicated pool of "warm" containers just for transcoding.

To get deeper into the weeds, it would be amazing if Midarr could talk to the Kubernetes API to create/destroy pods on demand for transcoding, while also keeping "warm" transcoding pods already available. As far as I know, this would not be possible in standard docker-compose land, or hell, even with something like Portainer. Kubernetes is amazing at automation like this, which is why I am so in love with managing containers with it.
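The warm-pool idea boils down to a sizing rule that a controller would re-evaluate whenever a transcode starts or stops. A minimal sketch, assuming each transcoding pod handles a fixed number of concurrent streams; the function and its parameters are hypothetical, and the actual pod create/delete calls against the Kubernetes API are left out:

```python
import math


def desired_transcoder_pods(active_transcodes: int, streams_per_pod: int,
                            warm_pods: int, max_pods: int) -> int:
    """Number of transcoding pods to keep running right now (hypothetical policy).

    Pods needed to cover the active transcodes, plus `warm_pods` idle pods
    kept ready so a new request skips the cold start, capped at `max_pods`.
    """
    if streams_per_pod <= 0:
        raise ValueError("streams_per_pod must be positive")
    busy_pods = math.ceil(active_transcodes / streams_per_pod)
    return min(busy_pods + warm_pods, max_pods)
```

For example, with 3 active transcodes, 2 streams per pod and 1 warm pod it asks for 3 pods: two busy ones plus one idle one ready for the next request.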

A Midarr Kubernetes operator would be amazing if you wanted to learn more about Kubernetes 😄, but I understand this is a huge change and would require major refactoring. You did ask for a wishlist though 🤣

trueChazza commented 1 year ago

Yeah Kubernetes is amazing I agree!

That's quite a wishlist 🤣 A Midarr K8s operator sounds interesting, but it's out of scope for the Midarr app itself.

I created exstream as the upstream transcoding service Midarr currently relies on. Packaging this service up in its own Docker container would be a start, and we'd be able to scale it to n containers in K8s, Docker, etc. How does that sound?

Plus, by all means, if you're keen to create a Midarr K8s operator, you have my full support!

onedr0p commented 1 year ago

Maybe an operator could be done in the long term. Breaking transcoding out of the main app would be very neat, however it could be done. That would really set Midarr apart from the other players in the game like Jellyfin and Plex, which are behemoth monoliths.

trueChazza commented 1 year ago

Yeah, it wouldn't take much to package up exstream; I'll just need to figure out the public API for it 😁

I'll look into it after the v3 release. Let's keep this conversation going though, a lot of great ideas already!

bo0tzz commented 1 year ago

I just had this idea and came here to find the transcoding discussion already underway, but I figured I'd share it anyway: Midarr's approach of leveraging existing services that people are already running is very nice. If that could somehow be applied to transcoding too, for example by hooking into something like Tdarr, that would be pretty cool. I don't know whether Tdarr (or similar software) would support something like that at all, though.

If that's not possible, I agree with onedr0p that breaking transcoding out into exstream or another externally running app would be very nice to have. Over at Immich we don't need to do any live media handling, but we still do all the processing in a separate container, which keeps the main UI/API nice and responsive and makes scaling easier to reason about too :D

onedr0p commented 1 year ago

Tdarr not being open source is a full stop for me.

bo0tzz commented 1 year ago

I wasn't aware of that, but I agree! I just used Tdarr to illustrate my idea; I believe there are other similar applications out there that are open source.

trueChazza commented 1 year ago
services:

  midarr:
    container_name: midarr
    image: ghcr.io/midarrlabs/midarr-server:latest
    environment:
      - EXSTREAM_URL=http://exstream

  exstream:
    container_name: exstream
    image: exstream:latest
    volumes:
      - /path/to/media:/media

Just noting this down for more discussion. If we packaged exstream into its own image and had Midarr reference it for all streams, this would be a breaking change.

I'm not too keen on having Midarr default to itself, then reference exstream optionally (opt in to exstream).

What do you think?

trueChazza commented 1 year ago
services:

  exstream:
    container_name: exstream
    image: exstream:latest
    volumes:
      - /path/to/media:/media

There would also be no reason to mount libraries into Midarr, only into Exstream. Midarr would just pass the media locations received from Radarr / Sonarr on to Exstream.

So Midarr would send something like this to Exstream to resolve: http://exstream/movies/Some Movie/some-movie.mp4
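Note that the example path contains a space, so Midarr would have to percent-encode it (Some%20Movie) before requesting it from Exstream. A minimal sketch of that translation, assuming Radarr/Sonarr report paths under a configurable library root (the hypothetical `library_root` parameter, here /media to match the mount above) and that Exstream serves files relative to that root:

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def exstream_url(base_url: str, media_path: str, library_root: str) -> str:
    """Translate a file path reported by Radarr/Sonarr into an Exstream URL.

    `library_root` is a hypothetical setting: the folder Radarr/Sonarr see as
    the root of the library. Raises ValueError for paths outside that root.
    """
    relative = PurePosixPath(media_path).relative_to(library_root)
    # Percent-encode each path segment (spaces, '#', '?', ...) but keep '/' separators.
    encoded = "/".join(quote(part) for part in relative.parts)
    return f"{base_url.rstrip('/')}/{encoded}"
```

With base URL http://exstream, path /media/movies/Some Movie/some-movie.mp4 and root /media, this returns http://exstream/movies/Some%20Movie/some-movie.mp4. A path outside the library root raises ValueError instead of being forwarded.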

trueChazza commented 1 year ago

I'm not too keen on having Midarr default to itself, then reference exstream optionally (opt in to exstream).

I would prefer Exstream be the default / required option.

bo0tzz commented 1 year ago

I find the idea of the main midarr container just "coordinating" things with all the hard work left to exstream pretty appealing.

onedr0p commented 1 year ago

I like this idea too!

bo0tzz commented 1 year ago

I'm thinking about how to handle scaling to multiple exstream instances, and I'm not managing to come up with an obvious answer. With each exstream instance being stateful, I think the midarr server would have to be able to reference them individually in some way. What are your thoughts on that?

onedr0p commented 1 year ago

I'm not sure there is an obvious answer; it really depends on whether Kubernetes is meant to be a first-class citizen. If not, there would need to be a fixed number of Exstream containers running all the time to handle the load. IIRC, this is pretty much how Tdarr does it with their agents, since there's no Kubernetes operator for it.

bo0tzz commented 1 year ago

What about an inverted setup where, instead of midarr having the address of the exstream container, exstream registers itself with midarr? Then DNS etc. aren't really a concern.

trueChazza commented 1 year ago

What about an inverted setup where, instead of midarr having the address of the exstream container, exstream registers itself with midarr? Then DNS etc. aren't really a concern.

Interesting! Could you provide an example or expand on how this could work? I'm not sure how Exstream would register itself.

trueChazza commented 1 year ago

If Exstream is an agnostic http service, how would it know to register itself with another service?

bo0tzz commented 1 year ago

If it registers itself, it wouldn't be agnostic. Exstream would be configured with Midarr's address, and hit an API endpoint to register itself.
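A minimal sketch of what Midarr might keep on its side of such a registration endpoint, assuming each Exstream instance calls it periodically with an instance id and a reachable address. The class, the 30-second TTL and the heartbeat behaviour are hypothetical, not an existing Midarr or Exstream API; expiring stale entries means an instance that crashes without deregistering eventually drops out of the pool:

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field


@dataclass
class ExstreamRegistry:
    """In-memory registry of Exstream instances that registered themselves.

    Instances are expected to re-register periodically (a heartbeat); any
    instance not heard from within `ttl_seconds` is treated as gone.
    """
    ttl_seconds: float = 30.0
    # instance id -> (address, monotonic time of the last registration)
    _entries: dict[str, tuple[str, float]] = field(default_factory=dict, init=False, repr=False)

    def register(self, instance_id: str, address: str, now: float | None = None) -> None:
        """Handle a (hypothetical) registration call; repeating it refreshes the entry."""
        seen_at = time.monotonic() if now is None else now
        self._entries[instance_id] = (address, seen_at)

    def deregister(self, instance_id: str) -> None:
        """Remove an instance that announced it is shutting down."""
        self._entries.pop(instance_id, None)

    def live_instances(self, now: float | None = None) -> dict[str, str]:
        """Return {instance_id: address}, pruning instances whose heartbeat expired."""
        current = time.monotonic() if now is None else now
        expired = [iid for iid, (_, seen_at) in self._entries.items()
                   if current - seen_at > self.ttl_seconds]
        for iid in expired:
            del self._entries[iid]
        return {iid: address for iid, (address, _) in self._entries.items()}
```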

I guess a preceding question to this is: how do we want to handle the stream bytes after Exstream has been instructed to start transcoding a file? I think the cleanest option (that I can come up with) is for Exstream to just expose the stream over HTTP, and then either Midarr (or a fronting nginx) proxies that endpoint or the client hits Exstream directly. Another option is that Exstream writes the transcoded stream to a shared volume and the Midarr server takes care of serving it over HTTP, but that might couple things too tightly.

If we go the proxy route, when there are multiple Exstream backends, Midarr needs to be able to map each stream to the correct backend. I believe the Docker (and Kubernetes) native way to do this is through separate DNS names (exstream-1, exstream-2, etc.), but requiring a particular DNS layout feels brittle to me, which is why I'm looking for another approach.
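For the proxy route, the mapping can live in Midarr as a table from stream id to backend address, filled in when a stream starts. A minimal sketch under that assumption; the class, the session ids and the upstream URL layout are hypothetical. Because backends are stored as ip:port, the table itself doesn't depend on names like exstream-1 and exstream-2:

```python
from __future__ import annotations

import uuid


class StreamRouter:
    """Remember which Exstream backend owns each transcoding session.

    Follow-up requests for a stream (playlist, segments) must reach the same
    backend, because only that instance holds the transcoding state.
    """

    def __init__(self) -> None:
        self._owner: dict[str, str] = {}  # stream id -> "ip:port"

    def start_stream(self, backend: str) -> str:
        """Record a new session on `backend` and return its id."""
        stream_id = uuid.uuid4().hex
        self._owner[stream_id] = backend
        return stream_id

    def upstream_url(self, stream_id: str, path: str) -> str:
        """URL to proxy a client request to; raises KeyError for unknown streams."""
        backend = self._owner[stream_id]
        return f"http://{backend}/{path.lstrip('/')}"

    def end_stream(self, stream_id: str) -> None:
        """Forget a session once the client is done with it."""
        self._owner.pop(stream_id, None)
```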

bo0tzz commented 1 year ago

This is maybe more for implementation details of Exstream, but https://membrane.stream/ could be worth taking a solid look at.

trueChazza commented 1 year ago

Awesome, thanks for explaining that. I was initially thinking of the proxy option you mentioned. I would prefer Midarr be as loosely coupled as possible to Exstream.

I'm leaning more towards Exstream just being a standalone HTTP service that anyone could use. Midarr would just consume its API.

trueChazza commented 1 year ago

If we go the proxy route, when there are multiple Exstream backends, Midarr needs to be able to map each stream to the correct backend. I believe the Docker (and Kubernetes) native way to do this is through separate DNS names (exstream-1, exstream-2, etc.), but requiring a particular DNS layout feels brittle to me, which is why I'm looking for another approach.

For this part, does Midarr really need to know those specific details? I'm just asking because I'm unsure.

If you scaled Exstream out to, say, 10 instances / containers and had Traefik load balance across them, could Midarr not just reference that single Traefik URL? Midarr wouldn't need to know there is potentially more than one instance. Am I on the right path or way off? 😂

onedr0p commented 1 year ago

could Midarr not just reference that single Traefik URL?

It could, but what happens when there are 10 transcoding requests and they happen to land on the same Exstream container? This is why I'm thinking Midarr has to be aware of what work is happening on which Exstream container and coordinate accordingly.
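One way for Midarr to coordinate is to count the transcodes it has handed to each container and send new work to the least busy one, which a plain round-robin balancer can't do because it doesn't know what each request costs. A minimal sketch, assuming Midarr sees every start and stop and each container has a fixed concurrency limit (the hypothetical `max_per_backend`):

```python
from __future__ import annotations

from collections import Counter


class TranscodeBalancer:
    """Track how many transcodes Midarr has sent to each Exstream backend."""

    def __init__(self, backends: list[str], max_per_backend: int) -> None:
        self.backends = list(backends)
        self.max_per_backend = max_per_backend
        self.active: Counter[str] = Counter()

    def acquire(self) -> str | None:
        """Reserve a slot on the least busy backend, or None if all are full."""
        candidates = [b for b in self.backends if self.active[b] < self.max_per_backend]
        if not candidates:
            return None  # caller can queue or reject instead of overloading an instance
        # min() returns the first minimum, so list order breaks ties.
        backend = min(candidates, key=lambda b: self.active[b])
        self.active[backend] += 1
        return backend

    def release(self, backend: str) -> None:
        """Free the slot when a transcode finishes or the client disconnects."""
        if self.active[backend] > 0:
            self.active[backend] -= 1
```

With two containers capped at 5 and 10 incoming requests, this places 5 on each instead of letting them pile onto one; an 11th request gets None, which Midarr could queue or reject.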

trueChazza commented 1 year ago

Ah true!! If that could potentially happen, then yes Midarr would need to be aware.

It might be worth me doing some discovery work around that too, just to validate our assumptions 😁

trueChazza commented 1 year ago

This is maybe more for implementation details of Exstream, but https://membrane.stream/ could be worth taking a solid look at.

https://membrane.stream/guide/v0.9/packages.html

They don't seem to support H.265 (yet?) 😢

bo0tzz commented 1 year ago

They don't seem to support H.265

Damn, I missed that :/ After looking at Exstream a bit more, I think Membrane might be overkill here anyways.

I laid out the architecture question to a few friends, and together we came up with these options:

  1. Instead of Exstream being aware of Midarr and registering itself, we could include a small component (could maybe even be just a bash script) that runs in the same pod as Exstream and does the registration, with Exstream itself still staying agnostic.
  2. To make the DNS based approach a bit friendlier to other platforms, we could make it so that:
    • At start, Midarr is configured with a list of Exstream addresses. These can be DNS names or IP addresses, e.g. EXSTREAM_ADDRESS=exstream;192.168.1.8.
    • Any DNS names in this config are resolved to their IP addresses (with possibly multiple A records on one DNS name).
    • After this, all the IPs are added to the pool.

With approach 2, Midarr needs to regularly re-fetch the DNS records to make sure it's aware of instances being created/removed/etc. The Exstream API should probably also have some endpoints for its status (e.g. system load, health, whether it's shutting down) that Midarr keeps track of.
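A minimal sketch of approach 2's resolution step, assuming every instance listens on the same port (the config example has no ports) and IPv4 only; the function names are hypothetical. Running it on a timer and diffing the result against the previous pool gives the created/removed instances:

```python
from __future__ import annotations

import socket


def resolve_exstream_pool(config: str, port: int) -> set[str]:
    """Expand e.g. "exstream;192.168.1.8" into a set of "ip:port" backends.

    Entries may be DNS names or IP addresses. A DNS name may return several
    A records (for example one per replica), and every one is added.
    """
    pool: set[str] = set()
    for entry in config.split(";"):
        host = entry.strip()
        if not host:
            continue
        try:
            # AF_INET limits results to IPv4 (A records); IP literals map to themselves.
            infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
        except socket.gaierror:
            continue  # skip names that don't resolve right now; the next refresh retries
        for *_, sockaddr in infos:
            pool.add(f"{sockaddr[0]}:{port}")
    return pool


def diff_pools(old: set[str], new: set[str]) -> tuple[set[str], set[str]]:
    """Return (added, removed) backends between two refreshes."""
    return new - old, old - new
```

On Kubernetes, a headless Service is one way to get a single DNS name that returns one A record per ready pod.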

trueChazza commented 1 year ago

This is awesome thanks for this!

trueChazza commented 1 year ago
  1. Instead of Exstream being aware of Midarr and registering itself, we could include a small component (could maybe even be just a bash script) that runs in the same pod as Exstream and does the registration, with Exstream itself still staying agnostic.

For this option, does this register a single instance?

trueChazza commented 1 year ago
  2. To make the DNS based approach a bit friendlier to other platforms, we could make it so that:

    • At start, Midarr is configured with a list of Exstream addresses. These can be DNS names or IP addresses, e.g. EXSTREAM_ADDRESS=exstream;192.168.1.8.
    • Any DNS names in this config are resolved to their IP addresses (with possibly multiple A records on one DNS name).
    • After this, all the IPs are added to the pool.

With approach 2, Midarr needs to regularly re-fetch the DNS records to make sure it's aware of instances being created/removed/etc. The Exstream API should probably also have some endpoints for its status (e.g. system load, health, whether it's shutting down) that Midarr keeps track of.

This we could definitely work towards. There are a few moving parts to break down into smaller deliverables, but I think the payoff would be huge: being able to scale Exstream out with Midarr aware of it!

trueChazza commented 1 year ago

The Exstream API should probably also have some endpoints for its status (e.g. system load, health, whether it's shutting down) that Midarr keeps track of.

Something like a Prometheus exporter / stats endpoint? 🔥
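If Exstream exposed a Prometheus-style endpoint, Midarr would only need a few values from it. A deliberately minimal sketch that reads unlabeled samples from the Prometheus text exposition format; the metric names in the example (exstream_active_transcodes, exstream_draining) are hypothetical, and a real implementation would be better off with a full parser such as the one in the official Python client (prometheus_client):

```python
from __future__ import annotations


def parse_prometheus_text(body: str) -> dict[str, float]:
    """Read unlabeled samples from Prometheus text exposition format.

    Deliberately minimal: comment lines (# HELP, # TYPE) are skipped and
    labeled samples such as foo{mode="user"} 1 are ignored.
    """
    samples: dict[str, float] = {}
    for raw_line in body.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 2 or "{" in parts[0]:
            continue
        # parts[1] is the value; an optional timestamp in parts[2] is ignored.
        samples[parts[0]] = float(parts[1])
    return samples
```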

trueChazza commented 1 year ago

https://github.com/midarrlabs/exstream/pull/10

I've already started working on an MVP release for Exstream. For the MVP I'll just get a minimal functional version working, then we can build these features on top over the next iterations.

trueChazza commented 1 year ago

I'll add a roadmap for Exstream too so we can prioritise these features for releases.

bo0tzz commented 1 year ago

For this option, does this register a single instance?

This would be one instance of Exstream with a sidecar script that registers that instance, but that approach can be repeated to register multiple instances (each with their own sidecar).
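A minimal sketch of such a sidecar, assuming Midarr exposes a registration endpoint that accepts a JSON body with an id and an address; the /api/exstream/register path, the payload shape and the 10-second interval are hypothetical. Re-registering on a timer doubles as the heartbeat, so Exstream itself never needs to know about Midarr:

```python
from __future__ import annotations

import json
import time
import urllib.request


def build_registration(midarr_url: str, instance_id: str, address: str) -> urllib.request.Request:
    """Build the POST a sidecar sends to announce one Exstream instance to Midarr."""
    body = json.dumps({"id": instance_id, "address": address}).encode("utf-8")
    return urllib.request.Request(
        f"{midarr_url.rstrip('/')}/api/exstream/register",  # hypothetical endpoint
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def register_forever(midarr_url: str, instance_id: str, address: str,
                     interval_seconds: float = 10.0) -> None:
    """Re-register on a fixed interval so Midarr can treat silence as 'instance gone'."""
    while True:
        try:
            request = build_registration(midarr_url, instance_id, address)
            with urllib.request.urlopen(request, timeout=5) as response:
                response.read()
        except OSError as exc:  # URLError and HTTPError are OSError subclasses
            print(f"registration failed, retrying: {exc}")
        time.sleep(interval_seconds)
```

In Kubernetes the sidecar could read its own pod IP through the Downward API (fieldRef status.podIP) and run next to the Exstream container in the same pod; with Docker Compose it would be a second service per Exstream instance.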

Something like a Prometheus exporter / stats endpoint?

That'd certainly be good to have, and if it has the right metrics then Midarr can use it to monitor the instance too. The goal there is that Midarr knows how the Exstream instances are doing load-wise etc, so it can make smarter decisions about where to assign new streams.
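A minimal sketch of that decision, assuming Midarr has already polled each instance's status into a small record; the field names, including the capacity value, are hypothetical. Health and draining flags filter out instances first, then the lowest utilisation wins, so a larger machine with a higher capacity absorbs proportionally more streams:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BackendStatus:
    address: str
    healthy: bool
    draining: bool          # shutting down: finish current streams, accept no new ones
    active_transcodes: int
    capacity: int           # max concurrent transcodes this instance accepts


def choose_backend(statuses: list[BackendStatus]) -> BackendStatus | None:
    """Pick where a new stream should go, based on the last status poll."""
    eligible = [s for s in statuses
                if s.healthy and not s.draining and s.active_transcodes < s.capacity]
    if not eligible:
        return None
    # capacity >= 1 here because active_transcodes < capacity, so no division by zero.
    return min(eligible, key=lambda s: s.active_transcodes / s.capacity)
```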

trueChazza commented 11 months ago

I'm going to move this into a discussion, as it's quite a broad topic. We can create issues tied to the discussion as we progress.