jc-kynesim / rpi-ffmpeg

FFmpeg work for RPI
Other
107 stars 26 forks source link

PTS issue #88

Open rellla opened 3 months ago

rellla commented 3 months ago

Hi jc-kynesim,

i'm using https://github.com/jc-kynesim/rpi-ffmpeg/tree/dev/6.0/rpi_import_1 on amlogic and have some problems on TV-streams which are decoded by h264_v4l2m2m. Some are fine, but some have sync problems due to wrong/missing/false-ordered pts in the output frames.

I tried to get some debug logs out of ffmpeg (though i skipped some av_log lines) and here is what i get:

https://pastebin.com/raw/wkF4Z3U9

There is a Frame tracking failure: pts=24986, track[26]=27162 reported which results in a AV_NOPTS_VALUE and afterwards in sync problems of the app.

Tracking 27162 i can see, that there is the in-pkt which goes to hardware:

In pkt PTS=4353182236, DTS=-9223372036854775808, track=27162, n=27162
--- output pre VIDIOC_QBUF: index 0, ts=0.027162 count=0
--- output VIDIOC_QBUF: index 0, ts=0.027162 count=1

We also get it as Out frame PTS=4353182236/-9223372036854775808, DTS=-9223372036854775808, track=27162, n=26 and --- capture VIDIOC_QBUF: index 10, ts=0.027162 count=7 but the frames is still tracked and i get the Frame tracking failure and another Out frame PTS=-9223372036854775808/-9223372036854775808, DTS=-9223372036854775808, track=27162, n=26, but now with an AV_NOPTS_VALUE because xlat_pts_frame_out sets it.

If i understand correctly, 27162 shouldn't be there still, because i got it already? Can you explain, what is happening here and where the problem may come from? Thanks for your help.

jc-kynesim commented 3 months ago

OK - What I think is going on is that frames are being added faster than they are being consumed. The tracking failure you observe is caused by the circular tracking buffer overflowing.

The bit of logging that you should be worried by is the increasing values of "n=" which is the number of buffers that the code thinks are queued inside V4L2 - numbers >30 are a cause for concern under most circumstances, >100 is an indication of disaster pending. This can occur in a couple of ways, the most likely of which is that the firmware can't quite keep up with the bit / framerate. You can also generate this by not freeing frame buffers in a timely fashion maybe caused by playback being slower than reception on the wire (assuming a broadcast-style environment where you can't block).

After the above there is also the non-zero possibility that there is a bug; and I am a bit confused by some of what appears to be happening to that frame so I'm definitely not discounting it. Do you have a stream that shows this bug that you can give me?

rellla commented 3 months ago

Thanks for your answer. Atm i can't provide a stream snippet, maybe in a few days... My setup is a LibreELEC installation with www.tvdr.de on top. I'm receiving DVB-S2 TV-streams. What I'm wondering is, that the german TV-channel Das Erste HD seems to work quite flawlessly with nearly no problems and ZDF HD makes this problems, though as i can see it has a quite higher bitrate...

What/where exactly is the "number of buffers"-value you mentioned? Just to understand it right... Is it the "n=..." or "count=...." in my logs? Afaiu the "n=..." is the place in the circular tracking buffer? I'm confused about the different n-values in the "In pkt" and "Out frame" log. Shouldn't they match?

I can have a look again, what happens with my frame buffers in the app. Atm the decoded av frames go into a ringbuffer and the display thread takes them out again an should free them after the next pageflip. Playback speed shouldn't be the problem, because the 50fps stream matches with a 50Hz display and if there are problems with pts/sync, frames are dropped (and freed).

What I also found now in the log, is that i have meson-vdec ff620000.video-decoder: Buffer 9 done but can't match offset (06AD3D06) which comes from https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/media/meson/vdec/vdec_helpers.c#n388 - but i didn't look into the kernel yet.

Would be nice, if you can point me into the right direction ;)

jc-kynesim commented 3 months ago

You are right - I'm talking rubbish, count= is the number I'm thinking of. This definitely looks like a bug so a clip would be really helpful. I don't know anything about how your tv service plumbs in - is it a direct pipe or does it go via UDP? If the latter then a pcap would plausibly do if I can still find my old tools for taking those apart.

rellla commented 3 months ago

I can see, what i can do. It's a raw MPEG-TS stream with h264 in it. Would that be ok?

I have a 14,1 GB recording here which seems to trigger that things, but i think i should cut that one ;) I can do that later.

jc-kynesim commented 3 months ago

Should be fine. FFmpeg should consume that directly and if it doesn't then I did a LOT of STB work back in the day and can probably still find my tools for verifying those and taking them apart.

rellla commented 3 months ago

Quick update: It seems, that RPI4 doesn't have these problems, so maybe the issue is kernel related, because userspace app and ffmpeg is the same for both ...

jc-kynesim commented 3 months ago

So what is the failing platform? Pi5 doesn't have a h/w h264 decode so the trace can't have come from that.

rellla commented 3 months ago

It's an Odroid N2+ with Amlogic S922X

jc-kynesim commented 3 months ago

Ah... that'll make it harder for me to debug, and definitely increases the chance that the issue is kernel / firmware related. Nonetheless I'll have another look at your trace, compare the bits that seem a bit odd with the code and see if there is anything obvious.

rellla commented 3 months ago

Thank you!

rellla commented 3 months ago

Btw, if you need a more complete log, feel free to ping me...

rellla commented 3 months ago

I had a look into the kernel bits and found https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/staging/media/meson/vdec/vdec_helpers.c?id=876f123b8956b455a89a172b905f9ecbb6fc5b67 which seems related to the issue. There are offsets, which slightly differ and aren't found anymore. I have to do some more investigations...

rellla commented 3 months ago

Update: This most likely is a kernel bug:

https://pastebin.com/raw/4XDz6xDF

seems to fix it. So I think, we can close this issue.

I need to find the right path to get a fix upstream...