Open spcqike opened 5 years ago
Hi, sorry for the late answer.
Could you use multiple watch folders (one per node)? Then I guess you could move some logic into how files are copied to the watch folders. For example, you could put a file into an empty watch folder, or into the watch folder with the least amount of files.
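The "least amount of files" idea could be sketched roughly as follows. This is only an illustration of the copy-side logic: `pick_watch_dir` is a hypothetical helper name, and the watch folder paths are whatever the setup provides, one per node.

```shell
# Hypothetical helper: print the directory (among the arguments) that
# currently contains the fewest regular files. Ties go to the first one.
pick_watch_dir() {
    best=""
    best_count=""
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        # Count regular files directly inside the directory (not recursive).
        # Assumes file names do not contain newlines.
        count=$(find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' ')
        if [ -z "$best" ] || [ "$count" -lt "$best_count" ]; then
            best="$dir"
            best_count="$count"
        fi
    done
    # Print the winner; return non-zero if no usable directory was given.
    [ -n "$best" ] && printf '%s\n' "$best"
}
```

A copy script could then do something like `cp video.mkv "$(pick_watch_dir /watch/node1 /watch/node2)"`, where both paths are made-up examples.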
hello,
No, that's not possible, because all nodes get the same watch folder. As far as I know, it's not possible to use different volumes/mounts/ports/... for different nodes.
As the transcoding is working (at least for the second and following files), I didn't spend much time on further investigation. But I saw that everything works as expected if I use small files.
E.g., if I put 3 files < 50 MB into the watch folder, every node moves one into its .temp folder. So I guess the problem with larger files is the time it takes to move the file. (Watch folder and output folder are on the same network storage.)
When the current transcoding task is done, I'll try to edit the code so that it just renames the file within the watch folder (appending .tmp or something similar) and add some logic to skip other files with that ending. This should be faster than moving it.
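A minimal sketch of that rename-to-claim idea, not the container's actual code: `claim_file` and `is_claimed` are hypothetical names, and the `.tmp` suffix is the one suggested above. It assumes that a rename inside the same folder is effectively atomic on the storage in use, which should be checked for the particular network share.

```shell
# Try to claim a file by renaming it in place.
# Prints the new name on success; returns 1 if the file is already gone
# (for example because another node renamed it first).
claim_file() {
    src="$1"
    claimed="$src.tmp"
    if mv -- "$src" "$claimed" 2>/dev/null; then
        printf '%s\n' "$claimed"
        return 0
    fi
    return 1
}

# True for files that carry the claim suffix and should be skipped
# when scanning the watch folder.
is_claimed() {
    case "$1" in
        *.tmp) return 0 ;;
        *)     return 1 ;;
    esac
}
```

Only the node whose rename succeeds would go on to transcode; the others see the failure (or the `.tmp` suffix on the next scan) and pick another file.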
It would be nice to have an environment variable to enable/disable swarm mode with such changes :) Do you think this is possible?
Hi! I'm facing the same problem, but with Cattle. I think I can modify the starter script so that it stores a file like "successful_conversions" but named "inprogress_conversions"; with a shared storage like NFS, all the nodes would then know if a file is being converted by another instance. However, as I'm writing this, I can see that this can lead to a race condition between the nodes, so now and then one of them is going to duplicate a video.
Maybe some "sleep $RANDOMSECS" just before the checking function... Dunno
Hello,
As I wrote, I already have a "sleep randomsec(1-10)" function. However, this only works if the file is moved really quickly.
I have some other ideas for how to improve swarm management, but unfortunately I don't know how to script that, and at the moment I don't have the time to try, fail, and learn :) (my 2-year-old won't let me work on that)
Maybe someone has some ideas or can try this :)
Here are my thoughts:
I think with something like this, the swarm should scale just fine.
Some extras would be:
I guess it’s a lot of work, if it’s even possible to do something like this ... maybe someone has some ideas or is able to give some input :)
Right now I have just one more problem: Docker Swarm itself. Because I want to change the transcode profile, I need to be able to reduce the replicas to 1 and, after all changes are made, increase them back to 3. It sometimes happens that I have 2 or even all 3 swarm nodes on one VM and therefore on one host. So I need to wait for Docker 19.03 so that I can set "max replicas per node" to 1 :)
I believe my issue is similar, so I am adding a comment here instead of opening a new issue. I have successfully set up a single-instance container under Kubernetes and it's working as expected. I would like to have multiple instances of jlesage/docker-handbrake (and the autovideoconverter) running in a ReplicationController (or whatever Kubernetes is calling that construct now) against the same shared-filesystem watch/output/etc folders.
My first thought was to implement pre- and post-conversion hooks that set/clear locks for the source file it's being passed, but in order for that to work, the pre-conversion hook would need to abort the conversion if it returned a nonzero exit code. That doesn't appear to be the case.
I'm happy to work on this and submit a PR, but if someone has thoughts on the best way to implement this kind of function, I will head in that direction. Doing it my way would require adding a "return" to where the pre-conversion hook is called in /rootfs/etc/services.d/autovideoconverter/run -- I think.
Pre-conversion:
if test -f "$SOURCE_FILE.lock"; then exit 1; else echo "no lock detected for $SOURCE_FILE, proceeding."; touch "$SOURCE_FILE.lock"; exit 0; fi
Post-conversion:
if test -f "$SOURCE_FILE.lock"; then rm "$SOURCE_FILE.lock"; fi
... either inside or outside of the test for CONVERSION_STATUS depending on what behavior is desired. Seems like if I put it outside that block, and the conversion fails, you'd just wind up having multiple nodes trying again and again and continuing to fail... unless adding that file to config/failed_conversions takes care of that.
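One caveat with the test-then-touch approach is that two nodes can both pass the `test -f` check before either one creates the lock. A possible variation, sketched here with hypothetical helper names `acquire_lock` and `release_lock`, uses `mkdir`, which checks and creates in a single step. It assumes `mkdir` is atomic on the shared filesystem, which is generally expected but worth verifying for the storage in use. As noted above, it only helps if a nonzero exit from the pre-conversion hook actually aborts the conversion.

```shell
# Pre-conversion: try to take the lock for a source file.
# mkdir fails if the lock directory already exists, so only one caller wins.
acquire_lock() {
    mkdir -- "$1.lock" 2>/dev/null
}

# Post-conversion: drop the lock again. Fails quietly if no lock exists.
release_lock() {
    rmdir -- "$1.lock" 2>/dev/null
}
```

In a pre-conversion hook this might be used as `acquire_lock "$SOURCE_FILE" || exit 1`, with `release_lock "$SOURCE_FILE"` in the post-conversion hook.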
A bit late to the party, but I wonder if using a central machine to read the files, in conjunction with pre-processing and post-processing hooks, would benefit this idea. I am trying to move files into a _processing directory before I initiate processing and then move them to a _processed directory on completion. It seems like HandBrake doesn't scan directories recursively (or at least it fails for me), so this could work in theory?
Hi there,
I'm trying to set up a Docker swarm to use multiple servers for rendering.
I have the problem, that each node starts to encode the same file. So with two nodes, all files are encoded twice.
I wonder if there is a possibility to move a file out of the watch folder to a temp folder before transcoding. So that the file isn’t accessible for other nodes anymore. If the encode fails, the original file should be moved back to the watch folder.
Do you have any ideas for that?
Thank you
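The move-then-restore idea from the question could look roughly like this. It is a sketch under assumptions, not how the container works: `process_claimed` is a hypothetical function, the conversion command is passed in by the caller, and the temp folder is assumed to be on the same filesystem as the watch folder so that `mv` is a quick rename rather than a copy.

```shell
# Usage: process_claimed WATCH_FILE TEMP_DIR COMMAND [ARGS...]
# Moves WATCH_FILE into TEMP_DIR, runs COMMAND with the moved file as its
# last argument, and moves the file back if COMMAND fails.
# Returns 0 on success, 1 if the conversion failed, 2 if the file could
# not be claimed (for example because another node took it first).
process_claimed() {
    src="$1"
    temp_dir="$2"
    shift 2
    work="$temp_dir/${src##*/}"

    # Claim the file by moving it out of the watch folder.
    mv -- "$src" "$work" 2>/dev/null || return 2

    if "$@" "$work"; then
        return 0
    fi

    # Conversion failed: restore the original so it can be retried.
    mv -- "$work" "$src"
    return 1
}
```

A call might look like `process_claimed /watch/video.mkv /watch/.temp my_convert_script`, where the paths and `my_convert_script` are placeholders.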
E1: After some trial and error, I managed to move the video file into the temp directory. At the end, the directory is removed, so the file is removed too. So it's not possible to keep the original file.
I just have one problem left: Because all nodes start at the same time, they will all get a copy of the first file and they will all transcode it. Because they run on different hosts with different cpus, they have different transcoding speed. So when the first node finishes the first file, it will start (and remove) the second file. So when the second node finishes the first file, the second file is already removed and it starts the third file.
But if I want to transcode just one file, all nodes will start to transcode. Even if one would be sufficient. ...
I tried to add a random sleep at the beginning of the script, but this won’t fix the problem. :(