gstreamer-java / gst1-java-core

Java bindings for GStreamer 1.x
GNU Lesser General Public License v3.0
188 stars 72 forks source link

Reliability-Problems around Pad.block() #184

Closed MaZderMind closed 4 years ago

MaZderMind commented 4 years ago

I think I might be onto a race condition in Pad.block(). I'm not yet able to reproduce the problem outside of my application, but i can reproduce it about 50% of the time there.

The relevant Code-Section looks like this:

Pad teePad = …;
CompletableFuture<Void> future = new CompletableFuture<>();
log.info("requesting block of teePad {} of {} - isBlocked={} isBlocking={}",
        teePad.getName(), tee.getName(), teePad.isBlocked(), teePad.isBlocking());

GstDebugger.debugPipeline("before-block", pipeline);
teePad.block(() -> {
    log.info("blocked teePad {} of {} - isBlocked={} isBlocking={}",
            teePad.getName(), tee.getName(), teePad.isBlocked(), teePad.isBlocking());

    // release pad here

    future.complete(null);
});

log.info("after requesting block of teePad {} of {} - isBlocked={} isBlocking={}",
        teePad.getName(), tee.getName(), teePad.isBlocked(), teePad.isBlocking());

GstDebugger.debugPipeline("after-block", pipeline);
future.get();

A src-Pad of a tee is blocked in order to correctly release it. If the pad is currently idle, one would expect the Runnable-Closure to be called immediately, otherwise one would expect the call to .block to complete and the callback being called shortly thereafter.

50% of the time this work and the logs look like this:

[main] INFO de.mazdermind.gintercom.mixingcore.Panel - requesting block of teePad src_0 of rx-1 - isBlocked=false isBlocking=false
[main] INFO de.mazdermind.gintercom.mixingcore.Panel - after requesting block of teePad src_0 of rx-1 - isBlocked=true isBlocking=false

(gst1-java-core:14957): GStreamer-CRITICAL **: 18:27:10.478: gst_pad_probe_info_get_event: assertion 'info->type & (GST_PAD_PROBE_TYPE_EVENT_DOWNSTREAM | GST_PAD_PROBE_TYPE_EVENT_UPSTREAM)' failed
[GstBus] INFO de.mazdermind.gintercom.mixingcore.Panel - blocked teePad src_0 of rx-1 - isBlocked=true isBlocking=false

(gst1-java-core:14957): GStreamer-CRITICAL **: 18:27:10.480: gst_pad_probe_info_get_event: assertion 'info->type & (GST_PAD_PROBE_TYPE_EVENT_DOWNSTREAM | GST_PAD_PROBE_TYPE_EVENT_UPSTREAM)' failed

aside from the assertation, which is a different issue and probably not harmful, one can see that the callback has been called from the GstBus thread and that right after the request to block, the pad is already blocked.

The other 50% of the time the logs show, that the pad is still being blocked, but the callback never gets called:

[main] INFO de.mazdermind.gintercom.mixingcore.Panel - requesting block of teePad src_0 of rx-1 - isBlocked=false isBlocking=false
[main] INFO de.mazdermind.gintercom.mixingcore.Panel - after requesting block of teePad src_0 of rx-1 - isBlocked=true isBlocking=false
…
(Timeout in Test-Runner after some Minutes)

The dot-File-Diagrams show now surprises; the Pad is displayed as blocked just as the result of isBlocked suggest: https://mazdermind.de/volatile/dot-files.zip

Edit I can reproduce the issue with gst1-java-core 1.1.0 and 1.2.0-SNAPSHOT from master

neilcsmith-net commented 4 years ago

Thanks for the report. I could do with a reproducible test case and comparison C code here. I'm also not sure if the assertion is harmless - see eg. https://groups.google.com/forum/#!topic/gstreamer-java/w7sbHPYYzEg

neilcsmith-net commented 4 years ago

Possibly also look at debugger stepping through https://github.com/gstreamer-java/gst1-java-core/blob/master/src/org/freedesktop/gstreamer/Pad.java#L395 Maybe something is wrong in the mask filtering / structure here?

MaZderMind commented 4 years ago

I've extracted the problematic Code into a Library: https://github.com/MaZderMind/gintercom-mixing-core It contains one Integration-Test, MixingCoreIT. When running the Test in a Loop (I'm using IntelliJ's "Repeat Until Failure" feature) after a while (On my Notebook it takes between 000 and 300 Test-Runs, total 5-10 minutes) one of the Tests get stuck in de.mazdermind.gintercom.mixingcore.Panel#releaseSrcPad by not calling the callback of teePad.block.

I'm trying to debug this and will keep this ticket updated, but the rare occasion of the problem does make in-situ debugging quite hard, so it more or less comes down to print/log-style debugging.

neilcsmith-net commented 4 years ago

OK, keep us up-to-date with how you get on. Also, try adding some System.gc() calls into key points in the test and see if that alters the frequency at which things occur.

MaZderMind commented 4 years ago

I found at least two different problems in this area, but I think there might still be a third problem. But let's start with no.1:

When GSTPAD_API.gst_pad_add_probe is called with a mask that of IDLE which usually is what we want when calling block(), it can, if the pad is already IDLE directly call the probe-callback from the calling thread.

In the case of block() this will directly jump into blocks eventReceived method, which will directly call removeCallback.

removeCallback tries to find the callback-object in the Pad's signals but fails to do so, because we actually have not yet created the actual callback object - we're still stuck in the constructor-call of the GCallback object.

Actually the GCallback is registered only after the actual callback has been executed for the first time, and it will be called again and only on the second invocation it will actually be removed. Often, especially when unlinking the pad in the block-callback, there might not be a second invocation and the probe might be dangling forever.

The solution to this is to return PadProbeReturn.REMOVE from the block-callback. This will cause GSTPAD_API.gst_pad_add_probe to return a probe-id of 0, which can be used in addEventProbe to detect that the requested probe has already been handled and there is no need for it to be registered on the GObject. I implemented this optimization in https://github.com/MaZderMind/gst1-java-core/commit/70ee7d66e329b9fe0605db90ac4e56832f95f624

MaZderMind commented 4 years ago

On a side-note the Critical assertation comes from this line in gstpad.c which is harmless, becaus it just returns null when the event is neither an Up- nor a Downstream-Event. This occurs here specifically for IDLE-Events. Moving the check into the Caller fixes the Problem. I'll open a PR with all the found fixes once I actually found them all ;)

MaZderMind commented 4 years ago

After fixing this I found the following section in the logs:

[main] INFO de.mazdermind.gintercom.mixingcore.Panel - blocking for releaseSinkPad GhostPad: [ghostpad48]
[main] INFO org.freedesktop.gstreamer.Pad - addEventProbe(mask=3)
[main] INFO org.freedesktop.gstreamer.Pad - GSTPAD_API.gst_pad_add_probe()
[main] INFO org.freedesktop.gstreamer.Pad - GstPadAPI.PadProbeCallback.callback(padProbeType=1)
[main] INFO org.freedesktop.gstreamer.Pad - EVENT_PROBE.eventReceived() -> callback.run()
[main] INFO de.mazdermind.gintercom.mixingcore.Panel - blocked for releaseSinkPad GhostPad: [ghostpad48]

[GstBus] INFO org.freedesktop.gstreamer.Pad - GstPadAPI.PadProbeCallback.callback(padProbeType=5122)
[GstBus] INFO org.freedesktop.gstreamer.Pad - EVENT_PROBE.eventReceived() -> callback.run()
[GstBus] INFO de.mazdermind.gintercom.mixingcore.Panel - blocked for releaseSinkPad GhostPad: [ghostpad48]

The first callback occured synchronous because the pad was already IDLE (padProbeType=1). The second callback happens from the Service-Thread for an PUSH+QUERY_UPSTREAM+BLOCK Event. I think this is caused by installing the Block as IDLE or BLOCK which will trigger when the Pad has no Data flowing as well as when it has Queries. This might occur simultaneously, in which case the Callbac actually gets called twice.

I fixed this by only specifying the Probe-Type as IDLE.

Unfortunately I still see Pad-Probe-Callbacks not being called:


[main] INFO de.mazdermind.gintercom.mixingcore.Panel - blocking for releaseSrcPad GhostPad: [ghostpad1122]
[main] INFO org.freedesktop.gstreamer.Pad - addEventProbe(mask=1)
[main] INFO org.freedesktop.gstreamer.Pad - GSTPAD_API.gst_pad_add_probe()
[main] INFO org.freedesktop.gstreamer.Pad - GSTPAD_API.gst_pad_add_probe() = Probe-Id 1 on Pad native@0x7f1451deee80 (org.freedesktop.gstreamer.lowlevel.GstPadPtr@51df6d94)
[main] INFO org.freedesktop.gstreamer.Pad - addCallback(EVENT_PROBE.class)
[main] INFO de.mazdermind.gintercom.mixingcore.Panel - after blocking for releaseSrcPad GhostPad: [ghostpad1122]

the logging is not committed into my repo, so your result will probably just be a stuck test. System.gc did not seem to have any impact so far; the test-suite sometimes fails after 300 runs, sometimes after 30 and sometimes the first test already fails.

MaZderMind commented 4 years ago

I traced it down to a Problem where the Callback is registered correctly with GStreamer and GStreamer tries to call it but the call actually never makes it to the Java-World:

Look for Address 0x7fe89a1ca2e8 which is the %p of the callback as received on the C side. I do not know yet how to validate what happens to the Code at that memory location in java or how to find out which object does belong there. Any tips?

[main] INFO de.mazdermind.gintercom.mixingcore.Group - blocking for releaseSrcPad GhostPad: [ghostpad1479]
[main] INFO org.freedesktop.gstreamer.Pad - addEventProbe(mask=1)
[main] INFO org.freedesktop.gstreamer.Pad - GSTPAD_API.gst_pad_add_probe(native@0x7fe8d4ea73c0 (org.freedesktop.gstreamer.lowlevel.GstPadPtr@d4eaf3a8), 1, org.freedesktop.gstreamer.Pad$4@10280879)
0:02:32.700294419 18520 0x7fe8d4794a30 LOG           GST_SCHEDULING gstpad.c:1440:gst_pad_add_probe:<group-1-out:src_0> adding probe for mask 0x00000001 with callback-fn @ 0x7fe89a1ca2e8!
0:02:32.700318520 18520 0x7fe8d4794a30 LOG           GST_SCHEDULING gstpad.c:1466:gst_pad_add_probe:<group-1-out:src_0> got probe id 1
0:02:32.700329936 18520 0x7fe8d4794a30 LOG           GST_SCHEDULING gstpad.c:1472:gst_pad_add_probe:<group-1-out:src_0> added blocking probe, now 1 blocking probes
0:02:32.700340745 18520 0x7fe8d4794a30 LOG           GST_SCHEDULING gstpad.c:1485:gst_pad_add_probe:<group-1-out:src_0> pad is in use, delay idle callback
[main] INFO org.freedesktop.gstreamer.Pad - GSTPAD_API.gst_pad_add_probe() = Probe-Id 1 on Pad native@0x7fe8d4ea73c0 (org.freedesktop.gstreamer.lowlevel.GstPadPtr@d4eaf3a8)
[main] INFO org.freedesktop.gstreamer.Pad - addCallback(EVENT_PROBE.class)
[main] INFO org.freedesktop.gstreamer.Pad - forcing gc
0:02:32.702861059 18520 0x7fe8300068f0 LOG           GST_SCHEDULING gstpad.c:3749:do_probe_callbacks:<group-1-out:src_0> do probes
0:02:32.702895404 18520 0x7fe8300068f0 LOG           GST_SCHEDULING gstpad.c:3567:probe_hook_marshal:<group-1-out:src_0> hook 1 with flags 0x000036f1 matches 0x00001001
0:02:32.702911271 18520 0x7fe8300068f0 LOG           GST_SCHEDULING gstpad.c:3573:probe_hook_marshal:<group-1-out:src_0> hook 1 points to callback-fn 0x7fe89a1ca2e8
0:02:32.702923237 18520 0x7fe8300068f0 LOG           GST_SCHEDULING gstpad.c:3585:probe_hook_marshal:<group-1-out:src_0> hook 1 is now called @ 0x7fe89a1ca2e8
[main] INFO de.mazdermind.gintercom.mixingcore.Group - after blocking for releaseSrcPad GhostPad: [ghostpad1479]
0:02:32.726507125 18520 0x7fe8300068f0 LOG           GST_SCHEDULING gstpad.c:3588:probe_hook_marshal:<group-1-out:src_0> hook 1 was called @ 0x7fe89a1ca2e8 and did return 0
0:02:32.726547864 18520 0x7fe8300068f0 LOG           GST_SCHEDULING gstpad.c:3749:do_probe_callbacks:<group-1-out:src_0> do probes
0:02:32.726557926 18520 0x7fe8300068f0 LOG           GST_SCHEDULING gstpad.c:3549:probe_hook_marshal:<group-1-out:src_0> hook 1 with flags 0x000036f1 is not one of the blocking types: GST_PAD_PROBE_TYPE_BLOCKING & type: 4114
0:02:32.726564982 18520 0x7fe8300068f0 LOG           GST_SCHEDULING gstpad.c:3636:probe_hook_marshal:<group-1-out:src_0> hook 1 with flags 0x000036f1 does not match 00001012
0:02:32.726568978 18520 0x7fe8300068f0 LOG           GST_SCHEDULING gstpad.c:3749:do_probe_callbacks:<group-1-out:src_0> do probes
0:02:32.726575639 18520 0x7fe8300068f0 LOG           GST_SCHEDULING gstpad.c:3555:probe_hook_marshal:<group-1-out:src_0> hook 1 with flags 0x000036f1 is not one of the blocking types: GST_PAD_PROBE_TYPE_BLOCKING
0:02:32.726582161 18520 0x7fe8300068f0 LOG           GST_SCHEDULING gstpad.c:3636:probe_hook_marshal:<group-1-out:src_0> hook 1 with flags 0x000036f1 does not match 00001010
0:02:32.726590250 18520 0x7fe8300068f0 LOG           GST_SCHEDULING gstpad.c:3749:do_probe_callbacks:<group-1-out:src_0> do probes
0:02:32.726596737 18520 0x7fe8300068f0 LOG           GST_SCHEDULING gstpad.c:3567:probe_hook_marshal:<group-1-out:src_0> hook 1 with flags 0x000036f1 matches 0x00001001
0:02:32.726602634 18520 0x7fe8300068f0 LOG           GST_SCHEDULING gstpad.c:3573:probe_hook_marshal:<group-1-out:src_0> hook 1 points to callback-fn 0x7fe89a1ca2e8
0:02:32.726608085 18520 0x7fe8300068f0 LOG           GST_SCHEDULING gstpad.c:3585:probe_hook_marshal:<group-1-out:src_0> hook 1 is now called @ 0x7fe89a1ca2e8
0:02:32.726688729 18520 0x7fe8300068f0 LOG           GST_SCHEDULING gstpad.c:3588:probe_hook_marshal:<group-1-out:src_0> hook 1 was called @ 0x7fe89a1ca2e8 and did return 0
0:02:32.726708791 18520 0x7fe8300068f0 LOG           GST_SCHEDULING gstpad.c:3749:do_probe_callbacks:<group-1-out:src_0> do probes
0:02:32.726717013 18520 0x7fe8300068f0 LOG           GST_SCHEDULING gstpad.c:3549:probe_hook_marshal:<group-1-out:src_0> hook 1 with flags 0x000036f1 is not one of the blocking types: GST_PAD_PROBE_TYPE_BLOCKING & type: 4114
0:02:32.726723395 18520 0x7fe8300068f0 LOG           GST_SCHEDULING gstpad.c:3636:probe_hook_marshal:<group-1-out:src_0> hook 1 with flags 0x000036f1 does not match 00001012
0:02:32.726727937 18520 0x7fe8300068f0 LOG           GST_SCHEDULING gstpad.c:3749:do_probe_callbacks:<group-1-out:src_0> do probes
0:02:32.726733702 18520 0x7fe8300068f0 LOG           GST_SCHEDULING gstpad.c:3555:probe_hook_marshal:<group-1-out:src_0> hook 1 with flags 0x000036f1 is not one of the blocking types: GST_PAD_PROBE_TYPE_BLOCKING
0:02:32.726739582 18520 0x7fe8300068f0 LOG           GST_SCHEDULING gstpad.c:3636:probe_hook_marshal:<group-1-out:src_0> hook 1 with flags 0x000036f1 does not match 00001010
0:02:32.726746922 18520 0x7fe8300068f0 LOG           GST_SCHEDULING gstpad.c:3749:do_probe_callbacks:<group-1-out:src_0> do probes
0:02:32.726752789 18520 0x7fe8300068f0 LOG           GST_SCHEDULING gstpad.c:3567:probe_hook_marshal:<group-1-out:src_0> hook 1 with flags 0x000036f1 matches 0x00001001
0:02:32.726759943 18520 0x7fe8300068f0 LOG           GST_SCHEDULING gstpad.c:3573:probe_hook_marshal:<group-1-out:src_0> hook 1 points to callback-fn 0x7fe89a1ca2e8
0:02:32.726765281 18520 0x7fe8300068f0 LOG           GST_SCHEDULING gstpad.c:3585:probe_hook_marshal:<group-1-out:src_0> hook 1 is now called @ 0x7fe89a1ca2e8
0:02:32.726791219 18520 0x7fe8300068f0 LOG           GST_SCHEDULING gstpad.c:3588:probe_hook_marshal:<group-1-out:src_0> hook 1 was called @ 0x7fe89a1ca2e8 and did return 0
0:02:32.726828755 18520 0x7fe8300068f0 LOG           GST_SCHEDULING gstpad.c:3749:do_probe_callbacks:<group-1-out:src_0> do probes
0:02:32.726836226 18520 0x7fe8300068f0 LOG           GST_SCHEDULING gstpad.c:3549:probe_hook_marshal:<group-1-out:src_0> hook 1 with flags 0x000036f1 is not one of the blocking types: GST_PAD_PROBE_TYPE_BLOCKING & type: 4114
0:02:32.726842322 18520 0x7fe8300068f0 LOG           GST_SCHEDULING gstpad.c:3636:probe_hook_marshal:<group-1-out:src_0> hook 1 with flags 0x000036f1 does not match 00001012
0:02:32.726846709 18520 0x7fe8300068f0 LOG           GST_SCHEDULING gstpad.c:3749:do_probe_callbacks:<group-1-out:src_0> do probes
0:02:32.726852346 18520 0x7fe8300068f0 LOG           GST_SCHEDULING gstpad.c:3555:probe_hook_marshal:<group-1-out:src_0> hook 1 with flags 0x000036f1 is not one of the blocking types: GST_PAD_PROBE_TYPE_BLOCKING
0:02:32.726858257 18520 0x7fe8300068f0 LOG           GST_SCHEDULING gstpad.c:3636:probe_hook_marshal:<group-1-out:src_0> hook 1 with flags 0x000036f1 does not match 00001010
0:02:32.727790559 18520 0x7fe7c8001720 LOG           GST_SCHEDULING gstpad.c:1563:gst_pad_remove_probe:<group-1-out:src_0> removing hook 1
MaZderMind commented 4 years ago

That looks for all the world like the Probe-Object is GC'ed. Adding some logging to the finalizer shows, that the problem does happen just when a GC-Run cleans up objects of similar age. The JNA-Docs state that in Case of a GC-Cleanup, a Method Stub will be called from the C-Side (which would explain the 0 return). As far as I can see, the reference should be held in GObject.callbackListeners and yet I fail to see where the reference might get lost.

neilcsmith-net commented 4 years ago

@MaZderMind something going awry with either GC or reference counting was my initial first thought with the problem you have - I've dealt with enough of them! :smile: The best thing is probably to get a heap dump (eg. in VisualVM). Possibly while stepping through in a debugger to be sure when it's taken? Then find the reference and use the ability to look for its GC root. If we're sure that it's correctly being cached in callbackListeners then it might be an object further up being GC'd.

Maybe also try getting the ref count on the pads around the time you trigger the probe.

MaZderMind commented 4 years ago

So it seems that in fact the Pad itsself is GC'ed prematurely, because adding it to a static List<> seems to prevent the above mentioned issue (as is adding the GstPadAPI.PadProbeCallback to a static List<>). So it seems that the Pad is GC'ed and therefore it's callbacks are.

But it seems that there is another, parallel issue, where GStreamer does not call the Hook at all. I can debug this inside the C code but for the moment it is better to fix one problem at the time.

MaZderMind commented 4 years ago

Checking GstElement's pad-Methods this totally makes sense. I was under the impression that they keep a reference to their Pad-Objects, but they don't (which makes perfect sense, but leads to a Problem here).

For example Element.getRequestPad creates a new Pad-Instance for each call, that is only held by the Caller. If the Caller invokes .block on this specific Pad-Instance, the Callback is created and registered on the Pad (resp. in GObjects Callback-Map) but this object graph is fully closed, only the Pad has a Reference to itself (not counting the Method-Pointer held by GStreamer), so it can completely be GC'ed by Java.

This can be circumvented with storing a reference to the Pad wthin the Pad.Handle-Class, which takes care of storing and cleaning up the Probe-Objects anyway. I implemented this in https://github.com/MaZderMind/gst1-java-core/commit/edf907f7ad53bcf9f67e0058fb279a6133341a72 and it fixed the aforementioned Problem. According to VisualVM there is also no long-time Memory-Leak. Actually my Tests seem to leak a little bit of memory but I don't have an indication that this change caused it. Edit: The Code leaks ConcurrentHashMap-Nodes, which are not part of my Code (it uses a regular HashMap)

Let me check if this fixes all reliability issues or if there is still something left to do. I'll open a PR soon.

neilcsmith-net commented 4 years ago

This can be circumvented with storing a reference to the Pad wthin the Pad.Handle-Class

@MaZderMind no, it definitely cannot! Doing so will create a circular reference that will cause a memory leak. We need to figure out the best way the API can help to hold the reference for long enough here, assuming it's not held by a reference inside GStreamer itself / we're managing reference counts correctly.

neilcsmith-net commented 4 years ago

@MaZderMind OK, so looking at the documentation for gst_element_get_request_pad I wonder if really you should be keeping hold of the reference yourself, as it's required in gst_element_release_request_pad. However, that also makes me wonder whether keeping a List of pads in Element also makes sense - cache in list on request, remove from list on release? May also make sense for Pads added and removed with addPad() / removePad().

MaZderMind commented 4 years ago

This can be circumvented with storing a reference to the Pad wthin the Pad.Handle-Class

no, it definitely cannot! Doing so will create a circular reference that will cause a memory leak.

I don't think that's true the way I implemented it in https://github.com/MaZderMind/gst1-java-core/commit/edf907f7ad53bcf9f67e0058fb279a6133341a72

As you can see, when a Probe is registered with GStreamer, the Pad is stored with the Probe-ID. When the Probe is deregisteted, the reference to the Pad is also released. This is also true if the Pad is disposed, because then Pad.Handle.dispose() should be triggered which releases all references.

OK, so looking at the documentation for gst_element_get_request_pad I wonder if really you should be keeping hold of the reference yourself, as it's required in gst_element_release_request_pad. I do through the GhostPad. The requested Tee-Pad is internal to the Bin and from the Outside only the GhostPad is returned. It is stored by the Caller: https://github.com/MaZderMind/gintercom-mixing-core/blob/master/src/main/java/de/mazdermind/gintercom/mixingcore/Group.java#L83-L91

Also a Setup-and-Forget use of .block() should be possible, when the Caller does not plan to release the requested pad in the future; it should not be the callers responsibility to prevent Objects passed to the lower Level from being GCed.

However, that also makes me wonder whether keeping a List of pads in Element also makes sense - cache in list on request, remove from list on release? May also make sense for Pads added and removed with addPad() / removePad().

Yes, this would probably a natural way to fix the GC-Problem. This should happen though based on GStreamers native pointer, so that getRequestPad("src_%u") and a later Call to getStaticPad("src_0") do return the same Java-Instance.

neilcsmith-net commented 4 years ago

As you can see, when a Probe is registered with GStreamer, the Pad is stored with the Probe-ID. When the Probe is deregisteted, the reference to the Pad is also released. This is also true if the Pad is disposed, because then Pad.Handle.dispose() should be triggered which releases all references.

Sorry, I hadn't actually looked in depth at the code you proposed. I am still very wary of it, particularly if you see a leak of ConcurrentHashMap nodes only when it's enabled - that would imply the problem I suggested. OTOH, if you're seeing them either way, please find out what they are and open another issue on that.

Also a Setup-and-Forget use of .block() should be possible, when the Caller does not plan to release the requested pad in the future; it should not be the callers responsibility to prevent Objects passed to the lower Level from being GCed.

Maybe! I'd like to see how this is handled when done from the C side to be sure. We don't always protect callers from GC - eg. a collected Pipeline will stop working.

Yes, this would probably a natural way to fix the GC-Problem. This should happen though based on GStreamers native pointer, so that getRequestPad("src_%u") and a later Call to getStaticPad("src_0") do return the same Java-Instance.

This should already happen for every native object - see https://github.com/gstreamer-java/gst1-java-core/blob/master/src/org/freedesktop/gstreamer/glib/NativeObject.java#L210

MaZderMind commented 4 years ago

if you're seeing them either way, please find out what they are and open another issue on that. I have not yet validated if this is originated in this change.

I still have another issue, where GStreamer does not even try to call back into the Java-Side, but it is very rare. I had to run the Test-Suite in a Loop for 3 hours to trigger it once. I'm trying to add more debugging in the C code to analyze this further.

MaZderMind commented 4 years ago

So based on this simple Gist I was able to trace the remainig Problem to a Deadlock within the block-Call, where the Main and one of the GstBus-Threads seem to deadlock upon calling synchronized methods within Pad or Pad.Handle. I'm not yet sure how the respective calling paths are and which action results in the deadlock, but it is very quickly (wihin a couple of seconds) reproducible with the 1.1.0 release and my changed.

[main] INFO Main - > blocking
[main] INFO Main - < getting
[GstBus] INFO Main - = blocked
[GstBus] INFO Main - = completed
[main] INFO Main - < got
[main] INFO Main - > blocking
[main] INFO Main - = blocked
[main] INFO Main - = completed

Thread Info:

"main@1" prio=5 tid=0x1 nid=NA waiting for monitor entry
  java.lang.Thread.State: BLOCKED
     blocks GstBus@1407
     waiting for GstBus@1407 to release lock on <0x5b9> (a org.freedesktop.gstreamer.Pad)
      at org.freedesktop.gstreamer.glib.GObject.removeCallback(GObject.java:412)
      at org.freedesktop.gstreamer.Pad.access$100(Pad.java:87)
      at org.freedesktop.gstreamer.Pad$1.eventReceived(Pad.java:330)
      at org.freedesktop.gstreamer.Pad$4.callback(Pad.java:447)
      at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source:-1)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:498)
      at com.sun.jna.CallbackReference$DefaultCallbackProxy.invokeCallback(CallbackReference.java:520)
      at com.sun.jna.CallbackReference$DefaultCallbackProxy.callback(CallbackReference.java:551)
      at com.sun.jna.Native.invokeLong(Native.java:-1)
      at com.sun.jna.Function.invoke(Function.java:428)
      at com.sun.jna.Function.invoke(Function.java:361)
      at com.sun.jna.Library$Handler.invoke(Library.java:265)
      at com.sun.proxy.$Proxy13.gst_pad_add_probe(Unknown Source:-1)
      at org.freedesktop.gstreamer.Pad$Handle.addProbe(Pad.java:735)
      - locked <0x5bb> (a org.freedesktop.gstreamer.Pad$Handle)
      at org.freedesktop.gstreamer.Pad$Handle.access$200(Pad.java:715)
      at org.freedesktop.gstreamer.Pad.addEventProbe(Pad.java:455)
      at org.freedesktop.gstreamer.Pad.block(Pad.java:327)
      at Main.main(Main.java:31)
"GstBus@1407" daemon prio=5 tid=0x12 nid=NA waiting for monitor entry
  java.lang.Thread.State: BLOCKED
     blocks main@1
     waiting for main@1 to release lock on <0x5bb> (a org.freedesktop.gstreamer.Pad$Handle)
      at org.freedesktop.gstreamer.Pad$Handle.removeProbe(Pad.java:744)
      at org.freedesktop.gstreamer.Pad$Handle.access$400(Pad.java:715)
      at org.freedesktop.gstreamer.Pad$5.disconnect(Pad.java:465)
      at org.freedesktop.gstreamer.glib.GObject$GCallback.remove(GObject.java:596)
      at org.freedesktop.gstreamer.glib.GObject.removeCallback(GObject.java:417)
      - locked <0x5b9> (a org.freedesktop.gstreamer.Pad)
      at org.freedesktop.gstreamer.Pad.access$100(Pad.java:87)
      at org.freedesktop.gstreamer.Pad$1.eventReceived(Pad.java:330)
      at org.freedesktop.gstreamer.Pad$4.callback(Pad.java:447)
      at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source:-1)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:498)
      at com.sun.jna.CallbackReference$DefaultCallbackProxy.invokeCallback(CallbackReference.java:520)
      at com.sun.jna.CallbackReference$DefaultCallbackProxy.callback(CallbackReference.java:551)
MaZderMind commented 4 years ago

I was able to fix the deadlock synchronizing upon event-probe creation (not only on removal) in https://github.com/gstreamer-java/gst1-java-core/commit/96df8aaa46fc6fbfc03aff3c6defebe743e25d86. On data-probe creation a synchronization point was already present.

I'm quite sure that this fixes all reliability issue around Pad.block(). I don't see any Memory-Leak either when just spinning around Pad.block, if there is a Memory Leak it must be somewhere else. I'll try to to reproduce it in a minimal Case and open a Ticket with HeapDumps for further Analysis.

In the meantime I'll open a PR with the changes I made so we can discuss them.

MaZderMind commented 4 years ago

All the PRs are merged, this issue is therefore resolved from my point of view.