zoriya / blog

My personal blog
https://zoriya.dev/

blogs/transcoder/ #1

Open utterances-bot opened 1 month ago

utterances-bot commented 1 month ago

The challenge of writing a on-demand transcoder · Zoe's blog

https://zoriya.dev/blogs/transcoder/

zoriya commented 1 month ago

First test of utteranc.es

hugosenari commented 1 month ago

Great writing, I learned a lot. I work with streams but not at the encoder level.

So why does Tears of Steel freeze between the 2nd and 3rd segments on Firefox? The same doesn't happen on Chrome.

The freeze isn't permanent; it ends when the 4th segment starts. Sound also works.

zoriya commented 1 month ago

I can't reproduce the freeze with Tears of Steel (I tested on https://kyoo.zoriya.dev). It did happen a lot at the beginning when cuts between segments were off by a few ms, but this is now fixed.

hugosenari commented 4 weeks ago

Strange, it was so easy to reproduce that I had the chance to reload the page many times and take a screen capture (sorry about the site): https://www.4shared.com/s/fqqjPjtVHjq

But now it works fine even for me. I was expecting to learn something interesting from this failure; maybe it was only an ffmpeg or connection failure that corrupted the segment. Do you have any kind of cache?

zoriya commented 4 weeks ago

Yes, video segments are cached for 4 hours. When two people request the same video/quality, Kyoo runs ffmpeg only if needed (for example, if the first person plays from 5min while the second plays from 15min). Segments are then cached for every user.

I have not talked about this in the blog, but most of the work of the transcoder is handling state, ensuring segments are cached and retrieved in the right order. Users can first request segments 100 to 120 and then request segments 65 to 120. The transcoder is able to run ffmpeg for segments 65 to 100 and then use cached segments for 100 to 120.
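
To make the range example concrete, here is a minimal sketch (not Kyoo's actual code) of how the uncached part of a request could be computed. The function name `missing_runs` is hypothetical, and the sketch assumes segment ranges are inclusive and that a segment is either fully cached or not cached at all.

```python
from typing import Iterable, List, Tuple


def missing_runs(cached: Iterable[int], first: int, last: int) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) runs of segments in [first, last] that are not cached."""
    cached_set = set(cached)
    runs: List[Tuple[int, int]] = []
    run_start = None  # first segment of the uncached run currently being built
    for seg in range(first, last + 1):
        if seg in cached_set:
            # A cached segment closes any open uncached run.
            if run_start is not None:
                runs.append((run_start, seg - 1))
                run_start = None
        elif run_start is None:
            run_start = seg
    # Close a run that extends to the end of the request.
    if run_start is not None:
        runs.append((run_start, last))
    return runs


# Segments 100 to 120 were cached by an earlier request.
cached = range(100, 121)
print(missing_runs(cached, 65, 120))  # [(65, 99)]
```

Each returned run would map to one ffmpeg invocation, and everything outside the runs is served from the cache.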

I guess ffmpeg had a whoopsie when you accessed it, and the result got cached.

szatmary commented 3 weeks ago

"Users can first request segments 100 to 120 and then request segments 65 to 120." With a large number of users this will become problematic. Unless you handle B-frame delay and priming samples specially, there could be non-monotonic DTS and audio pops at segment 120.

szatmary commented 3 weeks ago

I meant segment 65

zoriya commented 3 weeks ago

I'm not sure how the number of users relates to the issue you described. Having more users play the same file (at different times) simply means the file will be processed in parallel.

Audio pops did happen when I implemented this, and I found a very simple workaround: I always generate a 'dummy' segment that is never used but ensures ffmpeg has the right context. For example, when I want to encode segments 4 to 9, I instead start encoding from segment 3 and discard the first one.
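
The dummy-segment workaround can be sketched roughly as below. This only illustrates the planning step, not the real transcoder: `plan_encode` is a hypothetical helper, and the sketch assumes segments are numbered from 0, so the very first segment has nothing before it to use as lead-in.

```python
from typing import List, Tuple


def plan_encode(first: int, last: int) -> Tuple[int, List[int]]:
    """Return (segment to start encoding from, segments to keep).

    Encoding starts one segment early so the encoder has context;
    that extra segment is not in the keep list and gets discarded.
    """
    # Assumption: numbering starts at 0, so segment 0 has no lead-in.
    start = first - 1 if first > 0 else first
    keep = list(range(first, last + 1))
    return start, keep


start, keep = plan_encode(4, 9)
print(start, keep)  # 3 [4, 5, 6, 7, 8, 9]
```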

Audio handling still needs to be improved. For now audio is always re-encoded and does not have multiple variants. I'd like to include an "original" option as well as re-encoded ones.