livepeer / lpms

Livepeer media server
MIT License

Single IDR frame segment as input #311

Open AlexKordic opened 2 years ago

AlexKordic commented 2 years ago

Noticed in ZeroSegments error issue.

The output should be a single-frame file. Instead we get a 0-size output.

Sample: 7b328dfa-0d9a-4ab2-97a3-c6c643e1dbdc-954.ts-1647595505425.txt

yondonfu commented 2 years ago

Closed by #313

thomshutt commented 2 years ago

Reopening as the fix for this had to be rolled back

MikeIndiaAlpha commented 2 years ago

I don't have time to work on this (yet), but I was thinking a bit and here's the note to self:

Now we know that we have problems with flushing (at the FFmpeg level) on some HW, and the workaround is to re-feed the first packet. In the case of a first-frame NAL this may fail to trigger the decoding logic, because it is again a slice NAL of the same frame! Possible workarounds:

MikeIndiaAlpha commented 2 years ago

Another note to self:

MikeIndiaAlpha commented 2 years ago

Started working on this one today. I want to get to know the problem using the version that I know best - that is, the ma_refactoring_second branch. Once I fully understand the underlying mechanism I'll consider solving it on an earlier version so it can be distributed sooner.

So far it seems to me that the frame passes through the whole pipeline (at least I can see that a not-so-small video packet gets added to the muxer) BUT the PTS of that packet is huge. And when I open the container file I can see that it has 25+ minutes of content, despite no frames showing in ffprobe. So I will continue the investigation with the hypothesis that timing goes south... but will try to verify it asap.
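As a rough illustration of why one packet with a huge PTS can make a file look 25+ minutes long, here is a simplified sketch. It assumes the reported duration is just the end time of the last packet, and it assumes a 90 kHz time base (the MPEG-TS clock); neither is confirmed for the muxer in question, and `container_duration_seconds` is a hypothetical helper, not an LPMS or FFmpeg function.

```python
from fractions import Fraction


def container_duration_seconds(packets, time_base):
    """Return the end time of the latest packet, in seconds.

    packets   -- iterable of (pts, duration) pairs in time_base units
    time_base -- Fraction giving seconds per tick (assumed: 1/90000)
    """
    end_ticks = max(pts + duration for pts, duration in packets)
    return float(end_ticks * time_base)


# Assumed numbers for illustration only: a single frame whose PTS is
# far from zero makes the whole file appear 1500 s (25 min) long.
single_frame = [(135_000_000 - 3000, 3000)]
print(container_duration_seconds(single_frame, Fraction(1, 90000)))
```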

MikeIndiaAlpha commented 2 years ago

To verify my hypothesis I first did:

    ./transcoding 7b328dfa-0d9a-4ab2-97a3-c6c643e1dbdc-954.ts P360p30fps4x3 sw

Note that the resulting out_sw_0_out.mp4 is not playable by ffplay (nothing is shown) or by the macOS "space bar" video preview. But when I do:

    ffmpeg -i out_sw_0_out.mp4 -vcodec copy -an -bsf:v h264_mp4toannexb test.h264

(in effect extracting all video frames as raw h.264 in Annex B form into test.h264), then

    ffplay test.h264

WILL display the file, and the result is as expected - a single frame, just like when ffplaying the original .ts file, but scaled down. So it seems to me that:

1) From the video POV the frame is processed correctly...
2) ...but it gets assigned some crazy PTS while muxing or some such.

MikeIndiaAlpha commented 2 years ago

Ah - ok. Important thing. The above is on ma_refactoring_second branch. On the current master it works 100%. So it looks like first stage of refactoring changes fixed the flushing, but then second part of refactoring screwed muxing a bit, so it doesn't work for all muxing formats (despite video frames being there). Investigation reveals that commit 0c60e3d “Encoder creation logic moved” still works, and 47b07 “Muxer streams refactoring” doesn't.

The task, then, is as follows: fix the change between 0c60e3d and 47b07!

MikeIndiaAlpha commented 2 years ago

OK, I found it. There were several errors, but what was doing mp4 in was the failure to assign the frame rate. The relevant commit is fixed now and it should work in all cases.

MikeIndiaAlpha commented 2 years ago

To make sure everything is fine I moved back a bit. At least for the stream under test, the problem was fixed by the commit a83e95 "stream_index instead of pkt usage", which also contains the flushing rework. The previous attempt to solve the problem by @AlexKordic also contained flushing changes, so it all makes sense to me.

Right now - like I said - a83e95 is a part of master, so using current version of LPMS should protect against the problem.

MikeIndiaAlpha commented 2 years ago

For the record, I was using that stream for tests. 7b328dfa-0d9a-4ab2-97a3-c6c643e1dbdc-954.ts.zip

MikeIndiaAlpha commented 2 years ago

So it appears that while the frequency of ZeroSegment errors is smaller now, they still appear. Reopening the issue and keeping it assigned to myself so that I can iron out the other problems (I suspect there are several reasons for this bad behaviour and I have only fixed one so far). I'll be grateful for any streams that make LPMS fail; they will make debugging so much easier.

MikeIndiaAlpha commented 1 year ago

Finally managed to get some new ZeroSegment errors! I don't wanna jump to conclusions, but my first impression is that we might have a bug in the FFmpeg demuxer for .TS streams. Thing is, neither LPMS nor ffplay can open or play these videos without problems... but the preview available on space bar on my Mac can! Obviously, demuxer problems would explain such behaviour well... but I am going to investigate further.

Attached below are the original files (.txt versions) and the extracted video files that play all right on my Mac (.zip because GitHub won't take .ts):

7374ffcc-d831-4de5-9bb9-3415bcab6d0d-845.ts-1660062861415.txt
7374ffcc-d831-4de5-9bb9-3415bcab6d0d-852.ts-1660062873102.txt
a5cc0002-7adc-4d5b-a1f6-a39f2ad9a0ad-1453.ts-1660079968471.txt
7374ffc--845.ts.zip
7374ffc--852.ts.zip
a5cc0002--1453.ts.zip

MikeIndiaAlpha commented 1 year ago

WRT the previous comment:

7374ffc--845.ts: transcoding won't work in sw mode (mpeg ts complains of packet corruption, and the transcoder panics because it sees no keyframes in the input). The same file is playable with the macOS "space" preview without problems. mplayer will play it back with some complaints, but I'm not sure what these are about. When the raw h.264 stream is extracted using mplayer, ffplay won't play it and ffprobe -show_frames will show nothing, BUT mplayer will still work on it!

7374ffc--852.ts: transcoding won't work in sw mode (no complaints from mpeg ts, but the transcoder panics because it sees no keyframes in the input). The same file is playable with the macOS "space" preview without problems. mplayer will play it back with some complaints. Again the situation is the same with the extracted raw h.264 stream, interesting.

a5cc0002--1453.ts: transcoding will work in sw mode (mpeg ts complains of packet corruption, but the transcoder works). The same file can be viewed from macOS. mplayer will work with complaints. The raw h.264 stream can be extracted using mplayer or ffmpeg, and both extracts will play fine using ffplay and show frames in ffprobe, though interestingly enough their lengths differ (it doesn't have to be a bug).

Both 7374ffc--*.h264 files make ldecod from the reference software package crash. There is little that can be extracted from there because the crash happens a bit early. An interesting thing - ldecod also fails to see IDR slices...

Update: as an alternative I have used TSDuck's tsp to dump the video, for example:

    tsp -I file 7374ffc--852.ts -P pes -p 255 --payload -o output.txt -O drop

then removed the first two lines of output.txt manually, and:

    cut -d' ' -f6-22 output.txt | xxd -r -p > output.h264

This works all right for "a5cc0002--1453.ts", but for the 7374ffc files it produces the same extracted h.264 stream as before - not playable with ffplay, crashing ldecod, etc.
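For anyone who wants to script the manual step, the "drop two lines, cut fields 6-22, xxd -r -p" part can be mirrored in Python roughly as below. This is only a sketch: `hex_dump_to_bytes` is a hypothetical helper, and it assumes fields 6 to 22 of every remaining line contain nothing but hex digits, as the cut/xxd pipeline implicitly does.

```python
def hex_dump_to_bytes(lines, skip=2, first_field=6, last_field=22):
    """Mimic `cut -d' ' -f6-22 | xxd -r -p` on the lines of a hex dump.

    lines       -- list of text lines (e.g. from output.txt)
    skip        -- number of leading lines to drop (removed by hand above)
    first_field -- first 1-based field to keep, as in `cut -f`
    last_field  -- last 1-based field to keep, as in `cut -f`
    """
    out = bytearray()
    for line in lines[skip:]:
        # Like `cut -d' '`, split on every single space, so runs of
        # spaces produce empty fields that still count.
        fields = line.rstrip("\n").split(" ")
        hex_text = "".join(fields[first_field - 1:last_field])
        # Raises ValueError if the kept fields are not pure hex.
        out += bytes.fromhex(hex_text)
    return bytes(out)


if __name__ == "__main__":
    with open("output.txt") as src, open("output.h264", "wb") as dst:
        dst.write(hex_dump_to_bytes(src.readlines()))
```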

TODO: check transcoding in Nvidia HW mode for a5cc0002!

MikeIndiaAlpha commented 1 year ago

Worked on this for some time and I have an educated guess about what is happening (the "not playing on ffplay" part). It looks like the .h264 stream in question has no IDR picture at all. This means the stream is not compliant, because the standard says "The first access unit of each coded video sequence is an IDR access unit." (7.4.1.2.2 Order of access units and association to coded video sequences). And yet the first frame is an I-frame, so some h.264 decoders probably decide to treat it as an IDR frame, and then the stream plays correctly, while other h.264 decoders (along with ldecod) won't play it or will simply crash. So this could be a good explanation of the players' behaviour - those that tolerate "no IDR frame" streams will play it, the others simply cannot, and refusing is actually the standard-compliant way.
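A quick way to check the "no IDR picture at all" theory on an extracted Annex B file is to walk the start codes and look at nal_unit_type (the low 5 bits of the first NAL byte; 5 is an IDR slice, 1 is a non-IDR slice). The sketch below does just that; the function names are made up for illustration, and it does not parse slice headers or parameter sets.

```python
NAL_SLICE_NON_IDR = 1  # coded slice of a non-IDR picture
NAL_SLICE_IDR = 5      # coded slice of an IDR picture


def iter_nal_units(data):
    """Yield (header_offset, nal_unit_type) for each NAL unit in Annex B data.

    Both 3-byte (00 00 01) and 4-byte (00 00 00 01) start codes end in
    00 00 01, so searching for the 3-byte form finds either.
    """
    i = 0
    while True:
        i = data.find(b"\x00\x00\x01", i)
        if i < 0 or i + 3 >= len(data):
            return
        yield i + 3, data[i + 3] & 0x1F
        i += 3


def has_idr(data):
    """Return True if any NAL unit in the stream is an IDR slice."""
    return any(t == NAL_SLICE_IDR for _, t in iter_nal_units(data))


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as f:
        stream = f.read()
    print("NAL types:", [t for _, t in iter_nal_units(stream)][:20])
    print("has IDR:", has_idr(stream))
```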

What we could do about it (provided the theory above is true) - well, we control the segmenter, and it might be that the segmenter checks for keyframes and not for IDR frames. I believe that in h.264 any I-frame is a keyframe, but not all I-frames are IDR frames. So basically that would be applying the old MPEG-1/2 style GOP structure to the "open" structure of an h.264 stream.
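To make the I-frame versus IDR distinction concrete, here is a sketch that classifies a single slice NAL unit by both nal_unit_type and slice_type. It reads only the first two Exp-Golomb fields of the slice header (first_mb_in_slice, slice_type); `classify_slice` and the other names are hypothetical, and this is not how any particular segmenter is known to work.

```python
SLICE_TYPE_NAMES = {0: "P", 1: "B", 2: "I", 3: "SP", 4: "SI"}


class BitReader:
    """Minimal MSB-first bit reader with unsigned Exp-Golomb support."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # position in bits

    def bit(self):
        byte = self.data[self.pos >> 3]
        value = (byte >> (7 - (self.pos & 7))) & 1
        self.pos += 1
        return value

    def ue(self):
        zeros = 0
        while self.bit() == 0:
            zeros += 1
        suffix = 0
        for _ in range(zeros):
            suffix = (suffix << 1) | self.bit()
        return (1 << zeros) - 1 + suffix


def strip_emulation_prevention(payload):
    """Remove the 03 byte from every 00 00 03 sequence."""
    out = bytearray()
    zeros = 0
    for b in payload:
        if zeros >= 2 and b == 3:
            zeros = 0
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


def classify_slice(nal):
    """Return (picture_kind, slice_type) for a slice NAL, else None.

    nal -- bytes starting at the NAL header byte (start code removed)
    """
    nal_type = nal[0] & 0x1F
    if nal_type not in (1, 5):
        return None  # not a coded slice (e.g. SPS, PPS, SEI)
    reader = BitReader(strip_emulation_prevention(nal[1:16]))
    reader.ue()                   # first_mb_in_slice, ignored here
    slice_type = reader.ue() % 5  # values 5..9 repeat 0..4
    kind = "IDR" if nal_type == 5 else "non-IDR"
    return kind, SLICE_TYPE_NAMES[slice_type]
```

A check that accepts any slice reported as "I" would treat ("non-IDR", "I") as a valid segment start, while a check for ("IDR", "I") would not.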

Went brute-force on that, and manually changed the NAL type of the first slice in the --852.h264 video from non-IDR I-frame to IDR frame. This is really brutal, because IDR and non-IDR frames have slightly different syntax, and so the image was damaged - but the stream displayed with ffplay. Also, ldecod now fails an assertion instead of crashing. Okay, we are onto something.
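The brute-force experiment amounts to rewriting one header byte. A sketch of that byte patch follows, assuming an Annex B file; `force_first_slice_to_idr` is a hypothetical name. As noted above, the result is not a valid IDR slice (an IDR slice header carries an extra idr_pic_id field), so this is only useful for probing decoder behaviour.

```python
def force_first_slice_to_idr(data):
    """Return a copy of data with the first slice NAL relabelled as IDR.

    Only nal_unit_type (low 5 bits of the NAL header) is changed from 1
    to 5; forbidden_zero_bit and nal_ref_idc are kept. If the first slice
    is already IDR, or there is no slice, the data is returned unchanged.
    """
    patched = bytearray(data)
    i = 0
    while True:
        i = data.find(b"\x00\x00\x01", i)
        if i < 0 or i + 3 >= len(data):
            break  # no slice NAL found
        header = data[i + 3]
        nal_type = header & 0x1F
        if nal_type == 5:
            break  # first slice is already IDR
        if nal_type == 1:
            patched[i + 3] = (header & 0xE0) | 5
            break
        i += 3
    return bytes(patched)
```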

MikeIndiaAlpha commented 1 year ago

OK, after lengthy investigation on the stream source side (not described here because not part of LPMS) we know how to "fix" the offending streams and when fed those, LPMS works. Attached is the stream that was made by "fixing" one of the streams attached before (7374ffc--852.h264). It can now be played with ffmpeg and transcoded with LPMS all right.

I want to check out the "other" stream (a5cc0002--1453.ts) on the Nvidia transcoder as well before closing the issue, but generally speaking, progress!

test.mp4.zip

MikeIndiaAlpha commented 1 year ago

Tested the a5cc0002--1453.ts stream on both the SW and NV transcoders, and every time there were absolutely no problems with either transcoding or playback of the produced file. Reviewing the file structure shows an absolutely correct h.264 stream, beginning with an IDR frame as it should.

MikeIndiaAlpha commented 1 year ago

So to sum things up - once the segmenter gets fixed this issue can be closed. The title is misleading anyway, because the remaining ZeroSegments errors occur when there is no IDR frame in the segment.

MikeIndiaAlpha commented 1 year ago

The fixes for the Mist server segmenter are prepared in this PR: https://github.com/DDVTECH/mistserver/pull/95 Once that gets merged, Fixer needs to be called from the segmenter (this is not done yet), which will be done by the Mist Team. This should fix the problem.