teltek / gst-plugin-ndi

GStreamer NDI Plugin for Linux
GNU Lesser General Public License v2.1
154 stars 27 forks source link

NDI source stalls since timestamp changes #79

Open normen opened 2 years ago

normen commented 2 years ago

Hi,

Issue description: When using a ndisrc as described in the README. the pipeline starts up but remains on time 0:00:00:000 forever, stalling the pipeline. In some random, rare cases, with uncompressed (non-HX) sources the pipeline starts to run.

After fighting with these stalling issues in the latest version for a while (both on MacOS and Windows) I found that the problem seems to exist since c98b626f69418e4cf796214e0fc7a527c5cb8364 - versions up to 7c3e50c629ca9c570502f73b996b8b816282dc1c seem to be working as expected.

Thanks for this plugin, Normen

sdroege commented 2 years ago

Can you get a debug log with GST_DEBUG=ndi*:9 and attach it here?

sdroege commented 2 years ago

Also please try different timestamp-mode settings to check if you can see a pattern there about what works and what doesn't.

normen commented 2 years ago

I already tried all timestamp modes to no avail, I'll try and create a log file.

normen commented 2 years ago

Heres the log, it's quite long, I let it run for a short while: ndilog.txt

normen commented 2 years ago

I was wrong about the timestamp mode, it does "work" when I set the autosink to sync=false and the timestamp-mode to 2 - but I'd like to have control over the latency/timing, thats why I am using this in the first place, so sync off is no proper solution..

sdroege commented 2 years ago

Thanks, I'll take a look through the log later. sync=false should work with any timestamp-mode because then timestamps are just going to be ignored :)

normen commented 2 years ago

Well, it doesn't - it only works with mode 2 - maybe thats a hint? Thanks for looking into it!

sdroege commented 2 years ago

How does your pipeline look like? Maybe a full GST_DEBUG=9 would be useful here...

normen commented 2 years ago

Its a gst-launch pipeline exactly as in the README.

normen commented 2 years ago

Heres a full log, 16MB.. ndilog.txt

sdroege commented 2 years ago

In your full log you shouldn't have seen any video at all and also heard no audio.

You need to increase the queue sizes here or otherwise this won't work. The first audio arrives at 0.65s, the first video at 1.94s, but the queues only hold up to 1s in your case.

Try configuring both queues with max-size-bytes=0 max-size-buffers=0 max-size-time=5000000000 (5s).

normen commented 2 years ago

Correct, I didn't see or hear anything - these logs are from the exact gst-launch pipeline in the README using the official "NDI HX Camera" app on iPhone/Pad as a source.

Increasing the max-size-time on both queues to 10s didn't really solve the issue - the video starts playing but after a few seconds I get the "dropped buffer" messages again, the audio is very choppy if its playing at all. I also get a lot of Too small/big slope errors..

normen commented 2 years ago

Funnily, in a different context (obs-gstreamer plugin) the queue seems to work when using timestamp-mode=2 - no additional buffers, no different settings, even when the appsink is synced...

However I noticed that the pipeline doesn't seem to react when setting an overall latency on the pipeline - this usually works for USB and RTSP sources to delay/sync them. The NDI pipeline always comes in with the same latency, no matter what I set for the latency, like this:

    gst_pipeline_set_latency (GST_PIPELINE (data->pipe), latency * GST_MSECOND);
    gint cur_latency = gst_pipeline_get_latency(GST_PIPELINE(data->pipe)) / GST_MSECOND;
    blog(LOG_INFO, "Set latency for pipeline to %d", cur_latency);
sdroege commented 2 years ago

Increasing the max-size-time on both queues to 10s didn't really solve the issue - the video starts playing but after a few seconds I get the "dropped buffer" messages again, the audio is very choppy if its playing at all. I also get a lot of Too small/big slope errors..

Right, that's the situation I would be interested in having logs for :)

If you set timestamp mode to receive-time then this shouldn't happen btw.

However I noticed that the pipeline doesn't seem to react when setting an overall latency on the pipeline - this usually works for USB and RTSP sources to delay/sync them. The NDI pipeline always comes in with the same latency, no matter what I set for the latency, like this:

That's a bad idea unless you know that the pipeline's latency is going to be lower than what you're configuring on it. This API should only be used if you want to do inter-device / inter-pipeline synchronization.

normen commented 2 years ago

That's a bad idea unless you know that the pipeline's latency is going to be lower than what you're configuring on it. This API should only be used if you want to do inter-device / inter-pipeline synchronization.

Sure, as you explained in your great presentation :) However I should be getting an error/warning from the pipeline if thats not the case, right? I am using this to sync up multiple live camera sources (RTSP, USB, NDI) in OBS and it works a treat - except for NDI.

normen commented 2 years ago

Heres a log with max-size-time at 10s, only GST_DEBUG=ndi*:9 as the log gets too big with a full debug: ndilog.txt

normen commented 2 years ago

Heres a full log with GST_DEBUG=6, hope thats enough. ndilog.txt

sdroege commented 2 years ago

Sure, as you explained in your great presentation :) However I should be getting an error/warning from the pipeline if thats not the case, right? I am using this to sync up multiple live camera sources (RTSP, USB, NDI) in OBS and it works a treat - except for NDI.

If the appsink in the end is sync=true, check in the debug logs when the buffers are arriving, what latency is actually configured on the appsink and what it's doing with regards to clock waiting. My guess is that either they're all arriving late, or there's a latency configuration problem on the application-side.


For the problem at hand here, can you confirm that setting timestamp-mode=receive-time also does not solve it for you?

normen commented 2 years ago

If the appsink in the end is sync=true, check in the debug logs when the buffers are arriving, what latency is actually configured on the appsink and what it's doing with regards to clock waiting. My guess is that either they're all arriving late, or there's a latency configuration problem on the application-side.

I am syncing the cameras visually by filming the same metronome - when I change the pipeline latency value in a USB or RTSP pipeline the visual result is as expected - the camera comes in later or sooner. Except for the NDI pipeline which even with 10 seconds set as latency doesn't change its "real world" latency in the pipe.

When setting a pipeline latency that is too low on the RTSP, USB or NDI sources I immediately get a clock error with the attached debug description explaining that the items in the pipeline have a higher latency than what is configured. The NDI pipeline always seems to report an internal latency of 20ms, so when I set a latency of 10ms I get this:

12:03:14.552: GStreamer error: clock problem.
12:03:14.552: Additional debug info:
12:03:14.552: ../gst/gstpipeline.c(712): gst_pipeline_do_latency (): /GstPipeline:pipeline7:
12:03:14.552: Configured latency is lower than detected minimum latency: configured 0:00:00.010000000 < min 0:00:00.020000000

For the problem at hand here, can you confirm that setting timestamp-mode=receive-time also does not solve it for you?

Yes, the problem persists with timestamp-mode=receive-time - I think you should be able to reproduce it with the "NDI HX Camera" for iOS as I am experiencing it on both MacOS and Windows with that app.

sdroege commented 2 years ago

Except for the NDI pipeline which even with 10 seconds set as latency doesn't change its "real world" latency in the pipe.

As said, check the appsink/clock logs :) This is a different problem to what this issue is about though.


I think you should be able to reproduce it with the "NDI HX Camera" for iOS as I am experiencing it on both MacOS and Windows with that app.

If you send me an iOS device, sure ;) I was testing with the NDI HX Camera app on Android with Linux as a receiver and that seems to be working correctly here.

I'll check if there's something useful in your logs later.

normen commented 2 years ago

Except for the NDI pipeline which even with 10 seconds set as latency doesn't change its "real world" latency in the pipe.

As said, check the appsink/clock logs :) This is a different problem to what this issue is about though.

Heres a level 6 log of OBS running with only one Gst-NDI source, I can't find any hints that setting the latency to 250ms fails, still its not applied in reality:

obslog.txt

sdroege commented 2 years ago

Like I said, everything's arriving too late in the sink

0:00:10.435205900  8364 000001E7B475C2C0 DEBUG               basesink gstbasesink.c:2785:gst_base_sink_do_sync:<video_appsink> clock returned 1, jitter  0:00:07.335453700

Means that this frame arrived 7.3s too late. As such, they will just be rendered immediately.

That might be a hint to the underlying problem here. Wrong timestamps in some sense.

sdroege commented 2 years ago

Well yes, all frames have a PTS of 0.

normen commented 2 years ago

But wheres that time difference coming from? Its by far not coming in 7 seconds late in reality. I can see the video, it's not 7 seconds late.

sdroege commented 2 years ago

Wrong timestamping.

normen commented 2 years ago

Okay, so based on the fact that I only get video from the "NDI HX Camera" when using timestamp-mode=2 (timecode) and that the video timecode then is 7 seconds off somehow the only question that remains is why doesn't it work when using receive-time, that should rule out timecode problems on the sender side, right?

Its certainly not the first time that I see clock/sync issues with NDI, even before I added gstreamer to the mix, so I won't rule out that this is an issue with the NDI runtime (using the latest V5 SDK on both windows and mac). But I only ever saw these extreme time differences when using gstreamer with it. With the gst-launch pipeline with 10s buffer I actually saw/heard the 7 seconds time difference between audio and video in some cases - in some not.

sdroege commented 2 years ago

If you check the logs, all the NDIlib_audio_frame_v3_t and NDIlib_video_frame_v2_t have a timecode of 0. So if you use any of the timecode-based modes it's not going to work very well. We should probably detect this case though.

If you select a timestamp or receive time based mode then it should work though. Can you give me a log for timestamp-vs-receive-time and another one for receive-time?

normen commented 2 years ago

Heres two logs, one with mode 0 and one with mode 4, neither yield an image in OBS, the previous was with mode 2, the only one that works somewhat. obslog-mode-4.txt obslog-mode-0.txt

sdroege commented 2 years ago

Thanks, I meant mode 1 (timestamp-vs...) and not 0 (timecode-vs...) though, can you also attach that? 0 is also useful however.

Mode 2 can't really work if the NDI sender is passing 0 for all timecodes. And generally (unless you really know what you're doing), you want either mode 0, 1 or 4 :)

normen commented 2 years ago

Right, sorry. Well if mode 2 is the only one that yields an image expect me to use that ;) obslog-mode-1.txt (Didn't show an image)

sdroege commented 2 years ago

Right, sorry. Well if mode 2 is the only one that yields an image expect me to use that ;) obslog-mode-1.txt (Didn't show an image)

You didn't configure the queues to be large enough again. First audio arrived at 6.96s, first video at 8.31s and the queue in front of the audio sink ran full after 7s.

Can you get a new set of those 3 logs with big enough queues?

normen commented 2 years ago

Okay, I don't get how thats necessary for internal receive times but here goes.. obslog-mode-4-10sbuf.txt obslog-mode-1-10sbuf.txt obslog-mode-0-10sbuf.txt

sdroege commented 2 years ago

It's necessary so that both sinks actually receive data, consume it continuously and so I can see how the timestamps are in relation to the pipeline clock.

The mode 4 log (the first one), you still don't get any video/audio out of there? All video/audio frames arrive in time at the sinks there and it looks all ok. At least starting around 8.3s into the log, before that frames are too late and then slowly catching up. Which seems to be related to audio being available much before video. Needs some more analysis.

Mode 1 log should behave approximately the same.

Mode 0 log is a bit strange and needs some more analysis but it looks like data is arriving in approximately 1s batches, so a higher latency would be necessary to configure here.

I'll take another look in the next days, this needs some time.

normen commented 2 years ago

Sorry, forgot to mention that yes, I do get an image with all of these settings, audio is a bit choppy at times and it is quite late (not 7 seconds though),

As I am really only interested in the video from the cameras (the microphones come in via a mixing desk) - are you saying I can avoid the issues by only using the video part of the demuxer pipeline (as there would be no syncing up)?

sdroege commented 2 years ago

I finally got to look at these logs and trying to look into fixing the whole situation. I'll take your logs as one of my testcases for a more robust algorithm for the timestamps.

One thing that seems problematic in your log is

  1. timecodes are all set to 0 (OK, we should be able to handle that)
  2. timestamps between audio and video are apart 1.3s in the beginning, and audio timestamps are the same for all frames. This suggests a sender bug and we should try to detect this situation too somehow and fall back to using plain receive times here.

Figure_1

normen commented 2 years ago

Okay, cool. The Sender is the official NDI HX Camera app for iOS in case it matters.

sdroege commented 2 years ago

Yeah I heard a few times already that it is apparently broken. The Android version of the app is behaving correctly.

normen commented 2 years ago

I guess it might also be a matter with the NDI core libraries on the "darwin" platforms.. I also get strange behavior when I try to display a NDI screen grab from an iMac.

sdroege commented 2 years ago

This should be fixed by https://github.com/teltek/gst-plugin-ndi/pull/102 with the default settings now

normen commented 2 years ago

Cool, it'll be some time until I can check but I'll report back!

normen commented 1 year ago

So, coming back to testing this and on sources that have no audio the latest version works a treat. For sources with audio not so much, I couldn't get it to work with actual NDI hardware, or the NDI HX iPhone app or with the new NDICam App or even with a screen capture session. It always gets stuck at 00:00:00 and seems to drop pretty much all frames.

Heres a log with the new NDICam App: log.txt

Edit: Also, I get errors when I don't grab the audio from the demuxer. I didn't know this kind of behavior from gstreamer, usually when I don't connect an audio sink the audio frames simply get dropped (e.g. RTSP).

normen commented 1 year ago

D'oh, I just realized your PR wasn't merged yet, checked out your branch now and it works great, thanks a lot!

sdroege commented 1 year ago

Thanks for testing :)

sdroege commented 1 year ago

I've integrated the fix here now: https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/tree/main/net/ndi

normen commented 1 year ago

Hmmm.. It looks like I have to come back to this. With the latest version from the gst-plugins-rs I still get the described issues. Every time audio gets in the mix the stream stalls. When I have a source without audio everything works fine. One issue is that I can't drop the audio packets in a pipeline that has audio, I have to pick up the audio pad or I get a pipeline error. Usually in gstreamer I don't have to do that?

Also I get mixed up with all the versions.. It seems all versions that have the ndisrcdemux element show this behavior. The version from your private repo that has separate ndivideosrc and ndiaudiosrc elements works best for me. I am currently using that to stream my NDI cameras.

sdroege commented 1 year ago

Can you report that over there with the exact pipeline you're using and a GST_DEBUG=6,ndi*:9 debug log? That should help finding the cause. Like this there's no actionable information :)