teltek / gst-plugin-ndi

GStreamer NDI Plugin for Linux
GNU Lesser General Public License v2.1
154 stars 27 forks source link

ndisinkcombiner stops after a few seconds when used with libcamerasrc #116

Closed PMLavigne closed 10 months ago

PMLavigne commented 10 months ago

Got a bit of a weird interaction here between libcamerasrc and ndisinkcombiner. I am trying to stream the camera and a microphone from a Raspberry Pi 4 over NDI. The stream will start normally, the other end will even connect and start showing the video, but after a certain amount of time, the stream will just... stop. No errors, gstreamer's clock just stops and stays stopped. The amount of time it runs before stopping is also correlated with the data size of the stream (as in, if I try to set the camera to a lower resolution and therefore use less data per second it will run longer before stopping), which makes me think some buffer is filling up and never being flushed.

Here's the most basic command line I could come up with that exhibits the issue. Unfortunately it does require a modern Raspberry Pi with camera, as I cannot recreate this with videotestsrc:

gst-launch-1.0 -v ndisinkcombiner name=ndimux ! ndisink ndi-name="Video" \
    libcamerasrc ! \
    'video/x-raw, format=(string)NV12, width=(int)1920, height=(int)1080, framerate=(fraction)30/1, colorimetry=(string)bt709' ! ndimux.video \
    audiotestsrc ! 'audio/x-raw,rate=48000,channels=1' ! ndimux.audio

If I send the libcamerasrc output to ndisink (or any other sink, for that matter) without trying to combine it with audio, it works perfectly fine. Also like I said, if I do this with videotestsrc it works perfectly fine as well. But if I do it as written above (and with GST_DEBUG="2,ndisinkcombiner:4", this happens:

Setting pipeline to PAUSED ...
[0:57:48.611038341] [1967]  INFO Camera camera_manager.cpp:284 libcamera v0.1.0+66-08e782e5-dirty (2023-10-03T10:20:34-04:00)
[0:57:48.724188542] [1975]  INFO RPI vc4.cpp:387 Registered camera /base/soc/i2c0mux/i2c@1/imx708@1a to Unicam device /dev/media1 and ISP device /dev/media3
Pipeline is live and does not need PREROLL ...
0:00:00.224445194  1967   0x55a6e13b60 WARN              aggregator gstaggregator.c:2046:gst_aggregator_query_latency_unlocked:<ndimux> Latency query failed
0:00:00.224622802  1967   0x55a6e13b60 WARN              aggregator gstaggregator.c:2046:gst_aggregator_query_latency_unlocked:<ndimux> Latency query failed
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
0:00:00.225125385  1967   0x55a6e13b60 WARN              aggregator gstaggregator.c:2046:gst_aggregator_query_latency_unlocked:<ndimux> Latency query failed
0:00:00.225255346  1967   0x55a6e13b60 WARN              aggregator gstaggregator.c:2046:gst_aggregator_query_latency_unlocked:<ndimux> Latency query failed
New clock: GstSystemClock
/GstPipeline:pipeline0/GstAudioTestSrc:audiotestsrc0.GstPad:src: caps = audio/x-raw, format=(string)F32LE, layout=(string)interleaved, rate=(int)48000, channels=(int)1
/GstPipeline:pipeline0/GstCapsFilter:capsfilter1.GstPad:src: caps = audio/x-raw, format=(string)F32LE, layout=(string)interleaved, rate=(int)48000, channels=(int)1
/GstPipeline:pipeline0/NdiSinkCombiner:ndimux.GstAggregatorPad:audio: caps = audio/x-raw, format=(string)F32LE, layout=(string)interleaved, rate=(int)48000, channels=(int)1
/GstPipeline:pipeline0/GstCapsFilter:capsfilter1.GstPad:sink: caps = audio/x-raw, format=(string)F32LE, layout=(string)interleaved, rate=(int)48000, channels=(int)1
[0:57:48.728276210] [1979]  INFO Camera camera.cpp:1181 configuring streams: (0) 1920x1080-NV12
[0:57:48.728836219] [1975]  INFO RPI vc4.cpp:544 Sensor: /base/soc/i2c0mux/i2c@1/imx708@1a - Selected sensor format: 2304x1296-SBGGR10_1X10 - Selected unicam format: 2304x1296-pBAA
/GstPipeline:pipeline0/GstLibcameraSrc:libcamerasrc0.GstLibcameraPad:src: caps = video/x-raw, format=(string)NV12, width=(int)1920, height=(int)1080, colorimetry=(string)bt709, framerate=(fraction)30/1
/GstPipeline:pipeline0/GstCapsFilter:capsfilter0.GstPad:src: caps = video/x-raw, format=(string)NV12, width=(int)1920, height=(int)1080, colorimetry=(string)bt709, framerate=(fraction)30/1
/GstPipeline:pipeline0/NdiSinkCombiner:ndimux.GstAggregatorPad:video: caps = video/x-raw, format=(string)NV12, width=(int)1920, height=(int)1080, colorimetry=(string)bt709, framerate=(fraction)30/1
/GstPipeline:pipeline0/GstCapsFilter:capsfilter0.GstPad:sink: caps = video/x-raw, format=(string)NV12, width=(int)1920, height=(int)1080, colorimetry=(string)bt709, framerate=(fraction)30/1
Redistribute latency...
/GstPipeline:pipeline0/NdiSinkCombiner:ndimux.GstAggregatorPad:src: caps = video/x-raw, format=(string)NV12, width=(int)1920, height=(int)1080, colorimetry=(string)bt709, framerate=(fraction)30/1
/GstPipeline:pipeline0/NdiSink:ndisink0.GstPad:sink: caps = video/x-raw, format=(string)NV12, width=(int)1920, height=(int)1080, colorimetry=(string)bt709, framerate=(fraction)30/1
0:00:16.6 / 99:99:99.

Runs perfectly normally until 0:00:16.6, then just stops, and stays stopped until I Ctrl-C. Adding queue's between different parts of the stream doesn't help at all either. For the record, I'm using the latest main branch of this repo, using NDI 5.5.4, on a Raspberry Pi 4B (64 bit) running the latest Raspbian, my gstreamer version is 1.18.4, and libcamera is one I've pulled the latest from their repo and built myself.

I'd be fine with taking a crack at fixing the issue myself (especially since it seems to require specific hardware) but I'm not even sure where to start debugging something like this, hopefully someone can at least point me in the right direction.

I've also attached a more verbose log file I generated by setting GST_DEBUG="4,libcamera*:5,ndisinkcombiner:5", please let me know if there's anything else you'd like me to try, part of the code to check up on, whatever. Oh and thanks for making this plugin, I've been using it for video-only until now and it's worked flawlessly at that! :)

gstreamer-log-verbose.log

EDIT: Ok I played around with the debug logging a bit and examined the logs more closely and there's a few things I'm noticing.

This message seems to indicate a buffer is getting set to a length of 16.2 (which is pretty close to the 16.6 where the stream stops): 0:00:00.611034348 6608 0x556c07cb60 DEBUG ndisinkcombiner src/ndisinkcombiner/imp.rs:276:gstndi::ndisinkcombiner::imp:<ndimux:video> Clipping buffer Buffer { ptr: 0x556c35db40, pts: 0:00:00.383685071, dts: --:--:--.---------, duration: --:--:--.---------, size: 3110400, offset: 7, offset_end: 7, flags: TAG_MEMORY, metas: [] } with PTS 0:00:00.383685071 and duration 0:00:16.200000000

By enabling debug output of basesink I can see lines like this, which seem mighty suspicious:

0:00:00.651237230  6608   0x556c07cb60 DEBUG               basesink gstbasesink.c:3815:gst_base_sink_chain_unlocked:<ndisink0> got times start: 0:00:00.383685071, end: 0:00:16.583685071
0:00:00.651267748  6608   0x556c07cb60 DEBUG               basesink gstbasesink.c:2159:gst_base_sink_get_sync_times:<ndisink0> got times start: 0:00:00.383685071, stop: 0:00:16.583685071, do_sync 1

Then, while it's actually playing, these messages:

0:00:03.450385499  6680   0x5579e8b2f0 DEBUG               basesink gstbasesink.c:5330:gst_base_sink_get_position:<ndisink0> res: 1, POSITION: 0:00:03.099640649
0:00:03.450409276  6680   0x5579e8b2f0 DEBUG               basesink gstbasesink.c:5527:default_element_query:<ndisink0> query position returns 1
0:00:03.450456071  6680   0x5579e8b2f0 DEBUG               basesink gstbasesink.c:5459:default_element_query:<ndisink0> duration query in format time
0:00:03.450508237  6680   0x5579e8b2f0 DEBUG               basesink gstbasesink.c:5527:default_element_query:<ndisink0> query duration returns 1
0:00:03.550691503  6680   0x5579e8b2f0 DEBUG               basesink gstbasesink.c:5408:default_element_query:<ndisink0> position query in format time
0:00:03.550722077  6680   0x5579e8b2f0 DEBUG               basesink gstbasesink.c:5200:gst_base_sink_get_position:<ndisink0> using clock and base time 2:34:26.810170992
0:00:03.550746391  6680   0x5579e8b2f0 DEBUG               basesink gstbasesink.c:5235:gst_base_sink_get_position:<ndisink0> in PLAYING using last 0:00:16.652395897
0:00:03.550781372  6680   0x5579e8b2f0 DEBUG               basesink gstbasesink.c:5314:gst_base_sink_get_position:<ndisink0> now 2:34:30.126148329 - base_time 2:34:26.926085658 - base 0:00:00.000000000 + time 0:00:00.000000000  last 0:00:16.652395897

And sure enough, POSITION will count up until it reaches ~16.6, stopping at exactly the value of last. So is something setting an end timestamp for some reason?

PMLavigne commented 10 months ago

I'll also note that it's entirely possible this is a bug in libcamera's gstreamer plugin, but libcamera's bug reporting system is a bit more alien to me as they use mailing lists and a bugzilla instance rather than GitHub, so I was hoping if I started here I could at least get the gist of what's up and narrow down what I need to take to the libcamera team before I have to navigate that stuff...

PMLavigne commented 10 months ago

Okay, I spent a little while debugging and I think I figured it out. I'm pretty sure it's down to a race condition between the NDI plugin and the libcamera plugin. Basically, the libcamera plugin is looking for an event to tell it that its pool of buffers has been replenished, but if that happens out of sequence with its allocator returning a frame buffer to a pool, it will ignore the event and then never try to get another frame from the camera. I was able to "fix" this by simply making it fire the event every single time a buffer is returned to the pool, rather than just when the pool is empty. I don't think this is a robust fix but it got everything working on my end, if I have some time I'll figure out a better option and submit a patch to the libcamera team.

For anyone that finds this via googling (and for my own future reference), I changed the following function in gstlibcamerapool.cpp in the libcamera project:

static void
gst_libcamera_pool_release_buffer(GstBufferPool *pool, GstBuffer *buffer)
{
    GstLibcameraPool *self = GST_LIBCAMERA_POOL(pool);
    bool do_notify = gst_atomic_queue_length(self->queue) == 0;

    gst_atomic_queue_push(self->queue, buffer);

    if (do_notify)
        g_signal_emit(self, signals[SIGNAL_BUFFER_NOTIFY], 0);
}

to simply

static void
gst_libcamera_pool_release_buffer(GstBufferPool *pool, GstBuffer *buffer)
{
    GstLibcameraPool *self = GST_LIBCAMERA_POOL(pool);

    gst_atomic_queue_push(self->queue, buffer);

    g_signal_emit(self, signals[SIGNAL_BUFFER_NOTIFY], 0);
}

Since this is a libcamera issue and not an ndi plugin issue, I'll go ahead and close this bug now. Thanks again for making a great plugin!

sdroege commented 10 months ago

FWIW, you can find a newer version of this plugin at https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/tree/main/net/ndi

PMLavigne commented 10 months ago

@sdroege Oh thanks, I had no idea. Is there a link in the readme that I just completely missed or something?

sdroege commented 10 months ago

No, someone would have to update this repository here and I don't have permissions for that :)

PMLavigne commented 10 months ago

Ah that sucks! Well thanks for steering me over there regardless!

PMLavigne commented 10 months ago

Again in case anyone stumbles on this via google and for tracking purposes, I went ahead and opened a bug with the libcamera project here: https://bugs.libcamera.org/show_bug.cgi?id=201