wnielson / Plex-Remote-Transcoder

A distributed transcoding backend for Plex
MIT License

Distribute chunks over slaves #53

Open JJK801 opened 7 years ago

JJK801 commented 7 years ago

Hi,

This PR aims to distribute transcoding over multiple slaves at the same time.

Some input arguments are tweaked (-ss, -segment_start_number) and others are added (-t) in order to delegate a set of segments (defined by SEGMENTS_PER_NODE, default 5) to each host and iterate over the timeline as the slaves finish their sets (allowing hosts with varying CPU capacities on the same network).
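For illustration, here is a rough Python sketch of that delegation idea. This is not the PR's actual code: the segment duration, the `run_on` callable and the `hosts.next_idle()` helper are made up; only SEGMENTS_PER_NODE (default 5) and the -ss / -t / -segment_start_number flags come from the description above.

```python
# Rough sketch of the delegation loop (hypothetical names, not the PR's exact code).
# Each idle host gets the next window of SEGMENTS_PER_NODE segments: -ss seeks to the
# window start, -t caps its duration, and -segment_start_number keeps the output
# segment numbering contiguous so PMS sees one continuous stream.

SEGMENTS_PER_NODE = 5        # default from the PR description
SEGMENT_DURATION  = 3.0      # assumed seconds per HLS segment

def build_args(base_args, window_index):
    """Return ffmpeg-style arguments for one window of segments on one slave."""
    start_segment = window_index * SEGMENTS_PER_NODE
    start_time    = start_segment * SEGMENT_DURATION
    duration      = SEGMENTS_PER_NODE * SEGMENT_DURATION
    return base_args + [
        "-ss", str(start_time),                      # seek to the window start
        "-t",  str(duration),                        # transcode only this window
        "-segment_start_number", str(start_segment), # keep output numbering contiguous
    ]

def distribute(base_args, total_segments, hosts, run_on):
    """Hand out windows to whichever host is idle next (run_on blocks per window)."""
    window = 0
    while window * SEGMENTS_PER_NODE < total_segments:
        host = hosts.next_idle()   # faster CPUs come back for more windows sooner
        run_on(host, build_args(base_args, window))
        window += 1
```

Because each host only asks for the next window once it has finished its previous one, a faster CPU naturally ends up transcoding more of the timeline than a slower one.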

Tested on my configuration, and it works like a charm:

Transcoding is very fast relative to the CPUs used (it flawlessly plays 1080p movies over the network).

Because of the NFS shares, it needs a fast local network between the server and the slaves for high-resolution movies.

Also be careful to put the most powerful CPUs first, to reduce the initial load time.

TODO:

liviynz commented 7 years ago

So first up, I'd like to say kudos for the original thinking behind your approach; it's good to see people showing initiative when too many are content to reuse whatever they find via Google! Genuine praise...

Now comes the other side, I'm afraid. Personally I think we shouldn't be bringing this sort of function into PRT without, at the very least, total head-to-foot scrutiny under a microscope... and that's if at all, because it's a very complex kind of function at its core, with a minefield of problems lurking under the surface.

I'll admit that I haven't dived into the code yet, as I only just managed to start catching up with all my git notifications and this stood out as something I had to flag when I read the brief about it. That being said, I'm coming at this from the perspective of someone who has been a technical lead on the design/build/support of enterprise cluster deployments. They are some of the trickiest miracles to perform! Even very expensive clustering software from the industry leaders has issues... and that's on enterprise-class hardware, with solutions designed to minimize problems: direct access to data via SAN arrays rather than network shares, resilient systems to maximize uptime on your nodes, etc.

So introducing a function that takes input from an external source (PMS) that we have no control over and that changes continuously with each version, and then trying to distribute/load-balance individual jobs across multiple nodes that could be a random mixture of OSes/hardware, all while reading/writing over NFS, and have it reliably produce a successful output every time (or even almost every time), really is a large piece of muddy land filled with death, with a pot of gold on the horizon... In a controlled proof-of-concept environment the issues can be minimal, but once it goes into the wild it will be a totally different situation, and we will likely be overloaded with questions/issues built on very complex configurations, which will consume the project.

Yes, I'm just one person, and some will say I'm being ridiculously pessimistic about this function, but I felt I had to make my point anyway because of how I see the big picture and what a difficult journey it is likely to be for this to succeed. Lastly, I'd like to make something else clear: I'm by no means saying the idea/code/work behind this is bad in itself! I like the concept from a theoretical perspective, but I think bolting it into PRT (which, remember, can be deployed across such a mixture of platforms by anyone who wants to) is a bad move for the PRT project because of the number of moving parts that all have to work perfectly when splitting single jobs across multiple nodes and then reassembling them. It highlights how dependent the process is on every node involved (plus the supporting infrastructure for those nodes) working correctly for a successful run each time a job is transcoded... That's my view anyway, and I've made my point in an extremely lengthy (some will say too lengthy) manner, so I'll stop here, hoping it's taken in the constructive but highly concerned way I meant it.

Liviy

JJK801 commented 7 years ago

@liviynz Thanks for your feedback

I completely understand your point of view and I agree. But we aren't discussing an enterprise software project; this is more of a tweak for those who want to optimize their media server at a small cost (e.g. like me, using some cheap hardware to transcode high-quality movies). The installation method, the lack of automated testing and the manual configuration show that it's clearly a POC that the creator shares with us (thanks to him).

IMO Plex has to work on this subject (as we prove here that it's not such a big challenge), but as they don't, we can continue to improve this project and maybe provide some ideas for a future version through a working POC.

I have many ideas to improve this project, and I submitted only the "essential" part here (the distributed transcoding) because it has no real BC breaks, as it just improves the background algorithm and, ultimately, the user experience.

My work on this PR is not finished, as I still have to make it more fault tolerant (by retrying the transcoding if segments fail), but everything I saw using it on my configuration was quite positive (same quality as the Plex default transcoder, but really faster).
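For reference, a minimal sketch of what that retry could look like (hypothetical helpers, not code from this PR): if a slave fails its window of segments, the same window is handed to the next idle host instead of aborting the whole job.

```python
# Minimal retry sketch (hypothetical helpers, not PR code): transcode_window(host, window)
# is assumed to return True on success and False on failure.

def transcode_with_retry(window, hosts, transcode_window, max_attempts=3):
    """Try a window on up to max_attempts different hosts before giving up."""
    for _attempt in range(max_attempts):
        host = hosts.next_idle()
        if transcode_window(host, window):
            return True
        # this segment set failed on this host; fall through and try another slave
    return False
```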

FYI the previously mentioned improvement ideas:

It's up to the maintainers to choose whether or not they want to merge it; it doesn't change anything for me, as I'm using my fork.

Thanks

wnielson commented 7 years ago

First off, thank you @JJK801 for your work on this! I haven't had time to look through the code yet, but I agree with @liviynz that if we incorporate this feature, we will have to do so carefully (perhaps as an optional feature).

As to your other suggested improvements, I'm intrigued by the idea of automatic slave discovery. We have some other features on the todo list which we could use some help with as well :)

yacn commented 7 years ago

This is exactly what I've wanted from PRT. Definitely want to keep an eye on this PR.

liviynz commented 7 years ago

The two main challenges, as I see it, that we have with PRT as it stands are:

1. We're at the mercy of the PMS devs. They're being very cool and rapidly changing their code to bring out new features, etc., but because of these constant changes in the core component we're always on the back foot, having to change our code or find workarounds to keep PRT working with it. A prime example: PMS 1.4+ is currently incompatible with PRT, and this has happened at least once, maybe twice, since 1.3.
2. We are two people doing this via open source, and PRT is popular! There are three different contact areas that I'm aware of and try to keep across where we get requests to help users with PRT issues. Rarely is it actually an issue with PRT; more often it's either a misunderstanding on their part about setup/use, or they have a different/unique setup that is outside the recommended one, etc. This all keeps us busy in the time we have to allocate to PRT, so when deciding what to add we have to be careful not to be too ambitious and leave ourselves totally swamped by users asking for help.

Hopefully that gives a bit of a better background. I'm totally up for reviewing the code and talking it through with the boss man, but for the reasons I mentioned before I'm very cautious. I know we're not dealing with enterprise, but my point was that I see quite a few issues arise even on the best hardware, running clustering software that costs hundreds of thousands of dollars; scaling that down to open-source code that could be running on virtually any Linux distro on any hardware leaves a lot of grey area for problems, purely from the sheer number of variations involved.

I need to catch up on my coding anyway, so I'm definitely keen to catch up internally to talk about this and some other items... My biggest issue is that I'm not native with Python; if PRT were built around a language I am native with, I'd be hammering out new quality functions quite quickly. E.g. auto-detection of slaves: I have code sitting here right now that will do just that! I wrote it for another project and it works perfectly, it even allows for quick communication via ssh, etc., but it's not in Python unfortunately.
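For what it's worth, a minimal Python sketch of what slave auto-discovery could look like over UDP broadcast. The port number and probe message are made up for illustration; this is not existing PRT code or a port of liviynz's code.

```python
# UDP-broadcast discovery sketch (port and message format are arbitrary examples).
# The master broadcasts a probe; any slave running the responder answers with its
# hostname, and the master collects the replies until the timeout expires.

import socket

PORT  = 32768            # arbitrary example port
PROBE = b"PRT_DISCOVER"  # arbitrary example probe message

def discover_slaves(timeout=2.0):
    """Broadcast a probe and return (address, hostname) pairs for hosts that answer."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    sock.settimeout(timeout)
    sock.sendto(PROBE, ("<broadcast>", PORT))
    slaves = []
    try:
        while True:
            data, addr = sock.recvfrom(1024)
            slaves.append((addr[0], data.decode()))
    except socket.timeout:
        pass
    return slaves

def respond_forever():
    """Run on each slave: answer probes with this machine's hostname."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", PORT))
    while True:
        data, addr = sock.recvfrom(1024)
        if data == PROBE:
            sock.sendto(socket.gethostname().encode(), addr)
```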

Anyway, better get back to my paid job I suppose haha

zachbunyard commented 7 years ago

@JJK801 Did the video, audio, or both get transcoded (did PMS say it was direct play, direct stream, or transcode)? Is your transcode directory on the Banana Pi, the Raspberry Pi, or somewhere else? Where is the media coming from? Do you know what the bitrate of that 1080p video was? I understand that this will increase the rate at which you can pump out transcoded material, but I'm curious whether it could help with the IO issues the Raspberry Pi faces when using NFS to read from the library and then again to write to the transcode directory, particularly with high-bitrate media (~10000 kbps).
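Some back-of-the-envelope arithmetic for that scenario (the transcode target bitrate is an assumption, not a measurement from this PR):

```python
# Rough NFS traffic estimate for one stream: the transcoding node reads the source
# over NFS and writes the transcoded segments back over NFS, so both directions
# share the same network link.

source_kbps    = 10_000   # ~10 Mbps 1080p source, as in the question above
transcode_kbps = 4_000    # assumed target bitrate after transcoding

read_mbps  = source_kbps / 1000      # pulling the source from the library share
write_mbps = transcode_kbps / 1000   # pushing segments to the transcode directory

print(f"NFS read ~{read_mbps:.1f} Mbps, NFS write ~{write_mbps:.1f} Mbps "
      f"(~{read_mbps + write_mbps:.1f} Mbps total per active transcode)")
# On a Pi whose Ethernet tops out around 100 Mbps (and shares the USB bus on older
# models), a few such streams plus NFS overhead can get close to saturating the link,
# so distributing CPU work doesn't remove the per-node network bottleneck.
```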