gstreamer-java / gst1-java-core

Java bindings for GStreamer 1.x
GNU Lesser General Public License v3.0

Cached unowned native objects are returned after their memory is freed and reused #177

Closed kezhuw closed 4 years ago

kezhuw commented 4 years ago

Suppose we obtain a Structure through Message.getStructure. Once the owning message is disposed and finally freed, the structure, as one of its owned resources, is also freed on the C side and its memory is reclaimed for reuse. Bad things happen if that memory is later allocated for a new native object and handed to Java before the structure's Java binding has been collected by the JVM.

Maybe we should not cache unowned native objects?
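The hazard can be sketched with a toy model of a pointer-keyed proxy cache. This is plain Java for illustration only, not the actual gst1-java-core internals; `NativeProxy`, `objectFor` and the address values here are all made up.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a pointer-keyed proxy cache. Names are illustrative,
// not the actual gst1-java-core internals.
public class StaleCacheDemo {

    static class NativeProxy {
        final long address;     // fake native pointer
        final String type;      // "Structure", "Caps", ...
        NativeProxy(long address, String type) {
            this.address = address;
            this.type = type;
        }
    }

    // Cache mapping native address -> Java proxy, in the spirit of
    // NativeObject.INSTANCES.
    static final Map<Long, NativeProxy> CACHE = new HashMap<>();

    // Returns the cached proxy for an address, or creates and caches one.
    static NativeProxy objectFor(long address, String type) {
        return CACHE.computeIfAbsent(address, a -> new NativeProxy(a, type));
    }

    public static void main(String[] args) {
        // 1. A Structure at native address 42 is handed to Java and cached.
        NativeProxy structure = objectFor(42L, "Structure");

        // 2. Native code frees the structure and the allocator reuses
        //    address 42 for a brand-new Caps. The Java proxy has not been
        //    collected yet, so the stale mapping is still in the cache.
        NativeProxy caps = objectFor(42L, "Caps");

        // 3. Java gets the dead Structure proxy back instead of a Caps.
        System.out.println(caps.type); // prints "Structure", not "Caps"
    }
}
```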

neilcsmith-net commented 4 years ago

It's an aspect of the library that bothers me a bit, but removing caching breaks even more, and the current behaviour has been in place for about a decade (1.0 tried hard to keep the existing behaviour). With GObject subclasses we should be OK; with MiniObject there are potential problem areas, as with other things. I added Handle in 1.0, and it includes a method to control caching, so we might need to consider whether changes are required - like not caching structures. We might want to treat the message structure a bit more like Pipeline handles its Bus?

neilcsmith-net commented 4 years ago

@kezhuw I think you're seeing the same behaviour as #171, which I've just made a PR for in #179?

kezhuw commented 4 years ago

I think this is a general problem, not specific to Message and Structure. Sample.getCaps, Sample.getBuffer and Caps.getStructure all have it. The issue is that the memory of an unowned native object can be freed and reused in native code. Because unowned native objects are cached, a fresh new object can end up mapped to a dead Java object with the same native pointer, possibly with a different underlying type.

I think #171 is just a sub-problem of this issue, so #179 may not be sufficient.

In my case, I solved this by removing unowned native objects from NativeObject.INSTANCES through reflection.
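That workaround might look roughly like the sketch below. It operates on a stand-in class here so it can run standalone; in the real workaround the reflective target would be the private cache inside org.freedesktop.gstreamer.NativeObject, and the field name and map shape are assumptions.

```java
import java.lang.reflect.Field;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the reflection workaround against a stand-in class; the real
// target would be NativeObject and its private INSTANCES map. Field name
// and map shape here are illustrative assumptions.
public class ReflectionEvict {

    // Stand-in for NativeObject with its private pointer-keyed cache.
    static class FakeNativeObject {
        private static final Map<Long, Object> INSTANCES = new ConcurrentHashMap<>();
        static void cache(long ptr, Object proxy) { INSTANCES.put(ptr, proxy); }
        static Object lookup(long ptr) { return INSTANCES.get(ptr); }
    }

    // Removes the cached proxy for the given pointer via reflection, so a
    // freed-and-reused native address cannot alias a dead Java proxy.
    @SuppressWarnings("unchecked")
    static void evict(long ptr) throws Exception {
        Field f = FakeNativeObject.class.getDeclaredField("INSTANCES");
        f.setAccessible(true);
        ((Map<Long, Object>) f.get(null)).remove(ptr);
    }
}
```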

neilcsmith-net commented 4 years ago

@kezhuw if you have an example where you get a Java object of the wrong type back, please provide a simple working example of the problem. That would be a bug and need fixing. I'm less sure there's a problem with mapping to new objects of the same type, although there might be some edge cases we should address. Changing the entire caching model in play is not something we can do lightly!

kezhuw commented 4 years ago

@neilcsmith-net It will take some time; I will put together a test-like example later. Besides, I think when we get a RefCountedObject back we hit #171; when we get back an object that is not a RefCountedObject, such as Structure, we get a ClassCastException in the cast to RefCountedObject.Handle - that is what I encountered.

neilcsmith-net commented 4 years ago

I think the scenario will be covered by the same issue. Prior to the fix for #171 you could get the situation where an unowned but cached object is replaced by an owned one at the same location, which might end up hitting the failing cast to RefCountedObject.Handle. With the fix it shouldn't be possible for an owned but non-ref-counted object to be returned twice unless there's a function incorrectly marked @CallerOwnsReturn.

However, it might still be good to check that cast with instanceof and log failures.
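A defensive version of that cast might look like the sketch below: check with instanceof before casting, and log rather than throw when the cache holds something of an unexpected type. The names (`CachedHandle`, `RefCountedHandle`, `asRefCounted`) are illustrative, not the library's actual API.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch of an instanceof-guarded cast that logs failures instead of
// throwing ClassCastException. All names here are illustrative.
public class GuardedCast {

    static final Logger LOG = Logger.getLogger(GuardedCast.class.getName());

    interface CachedHandle { }
    static class RefCountedHandle implements CachedHandle {
        void ref() { /* bump native refcount here */ }
    }
    static class PlainHandle implements CachedHandle { }

    // Returns the handle as ref-counted if it really is one; logs and
    // returns null instead of throwing ClassCastException otherwise.
    static RefCountedHandle asRefCounted(CachedHandle h) {
        if (h instanceof RefCountedHandle) {
            return (RefCountedHandle) h;
        }
        LOG.log(Level.WARNING, "Cached handle is not ref-counted: {0}", h);
        return null;
    }
}
```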

neilcsmith-net commented 4 years ago

I've updated #179 to also catch ClassCastException and log to get the stack trace. You'll need to run with LIFECYCLE logging enabled.

kezhuw commented 4 years ago

@neilcsmith-net I have added a test-like example in https://github.com/kezhuw/gst1-java-core/blob/native-memory-reuse-example/test/org/freedesktop/gstreamer/NativeMemoryReuseExampleTest.java. testNativeObjectStructureReuse raises a ClassCastException; testNativeObjectCapsReuse is similar to #171.

testNativePointerStructureReuse and testNativePointerCapsReuse expose the underlying cause: the backing memory is freed and reused. This is easy to reproduce because of the slab allocator used by g_slice_new (backed by g_slice_alloc).

#179 doesn't fix the root cause, only certain cases. It will break if the cached object and the newly created object are of different types; whether that happens depends entirely on the allocator and randomness.

kezhuw commented 4 years ago

@neilcsmith-net Here are the branch and build status for the file above.

kezhuw commented 4 years ago

@neilcsmith-net I propose #180 to solve this.

neilcsmith-net commented 4 years ago

@kezhuw sorry, but any blanket removal of caching will not be accepted. Why is your test using lowlevel API classes? It's not an accurate test as it stands. Please see if you can replicate it with the Caps and Structure constructors.

neilcsmith-net commented 4 years ago

Closed #180. I replicated the issue in your test on master, merged #179, and it now passes (I checked it didn't reach the catch clause too!). Please confirm whether you can still replicate any issues with the code as it now stands in master. Thanks!

kezhuw commented 4 years ago

@neilcsmith-net I have thought about and tested #179 and the current code; it seems to solve this problem in a different way than #180.

But I think there may be potential issues with the current code and #179:

  1. cls.isInstance(obj) allows NativeObject.objectFor to return a subclass object. In extreme cases this may cause issues due to method overriding. Future coding may suffer from this subtlety.
  2. NativeObject.Handle.dispose may drop an owned object from NativeObject.INSTANCES in the following scenario:
    • Suppose object O1 has been collected by the JVM but its NativeObject.Handle.dispose has not yet had a chance to run.
    • Suppose O1's native memory is then reused to create a new native object O2, so a mapping for O2 is created in NativeObject.INSTANCES.
    • O1's NativeObject.Handle.dispose finally runs and drops O2's mapping while O2 is still in use.

I have not written tests to verify the second issue; it should apply to both #179 and #180, but there is no harm in verifying it.
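One way to make that second scenario harmless, sketched below rather than taken from the library's actual code, is to have dispose remove the cache entry only if it still points at the handle being disposed, using the two-argument ConcurrentMap.remove. Class and field names here are illustrative.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch of a dispose that cannot evict a successor's mapping: the entry
// is removed only if it still refers to the handle being disposed.
// Names are illustrative, not the library's actual code.
public class GuardedDispose {

    static final ConcurrentMap<Long, Object> INSTANCES = new ConcurrentHashMap<>();

    // If O1's dispose runs late and O2 has already re-registered the same
    // address, the two-argument remove leaves O2's mapping intact.
    static boolean disposeMapping(long ptr, Object handle) {
        return INSTANCES.remove(ptr, handle);
    }
}
```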

Besides this, I am aware of your insistence on the caching implementation even for unowned objects, but I see drawbacks: it would be dangerous to dispose manually if the same object were shared among multiple components. Suppose the following scenarios:

Personally, I think caching may impose a programming burden in complicated cases, although it does have the benefit of reducing JVM heap usage?

neilcsmith-net commented 4 years ago

No. 1 is a minor potential issue I'm aware of. We should potentially run the GstTypes.registrationFor(gtype) check when an unowned handle is promoted to owned. However, this is only a potential concern for GObject subclasses, which are rarely unowned in the current code base, and there is already additional logic in them to clear out native references.

No. 2 I can't see as a valid issue. Only owned objects have their native memory disposed of. If GStreamer clears and reuses native memory that is still owned by a Java object, that is a bug. i.e. until NativeObject.Handle.dispose is run, GStreamer cannot create a new native object at that location.

I'm insisting on keeping caching because this library has always had it, and its GStreamer 0.10 predecessor had it since before I got involved. Removing caching changes logic - JVM heap is a small part of it; getting the same object back for references is a big part of it. I'm not prepared to consider changing such a major functional aspect in a point release.

Sharing objects across multiple consumers is an interesting case to consider. I'm generally in favour of encouraging explicit disposal, but I see this might be an issue. It's hard to know the best solution for you without deeper investigation of the use case. If it's an open-source project, jump on the mailing list; if it's commercial, I'm happy to consider consultancy - email is on my profile.

neilcsmith-net commented 4 years ago
  • ownsHandle && !obj.handle.ownsReference() guards from memory reused by owned object.
  • Negative refAdjust < 0 guards from memory reused by not owned object.

By the way, those assumptions are the wrong way around. refAdjust < 0 happens when a return is through Natives.callerOwnsReturn. With a ref-counted object this means GStreamer has added an extra ref. We drop the extra ref if the object is already cached, which means we already have a ref that Java owns. It protects already-owned objects. It would be a bug for an owned but not ref-counted native type to get to this line of code.

What wasn't being done previously was promoting an unowned object to owned. In those cases we want to keep any additional reference, and mark it as owned. This may or may not be the same object, so this new line is protecting memory reused from an unowned object.
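The two paths described above can be modeled roughly as follows. This is a toy model built only from the behaviour described in this thread, not the library's exact code; `Proxy`, `objectFor` and the ref bookkeeping are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the two caching paths described above; all names are
// illustrative, assuming the behaviour described in this thread.
public class RefAdjustModel {

    static class Proxy {
        int nativeRefs = 1;   // refs the native side believes Java holds
        boolean owned;
        Proxy(boolean owned) { this.owned = owned; }
        void unref() { nativeRefs--; }
    }

    static final Map<Long, Proxy> CACHE = new HashMap<>();

    // refAdjust < 0 means the call came through a @CallerOwnsReturn path,
    // i.e. GStreamer added an extra ref for the caller.
    static Proxy objectFor(long ptr, int refAdjust) {
        Proxy cached = CACHE.get(ptr);
        if (cached != null) {
            if (refAdjust < 0) {
                cached.nativeRefs++;      // extra ref from GStreamer
                if (cached.owned) {
                    cached.unref();       // already own a ref: drop the extra
                } else {
                    cached.owned = true;  // promote to owned, keep the ref
                }
            }
            return cached;
        }
        Proxy fresh = new Proxy(refAdjust < 0);
        CACHE.put(ptr, fresh);
        return fresh;
    }
}
```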

kezhuw commented 4 years ago

@neilcsmith-net I get your point: compatibility is a guarantee that should not be broken. I learned from this issue, and I think the fundamental cause behind it is that gstreamer-java tries to hide the ref-counting that glib and gstreamer are based on. This design choice may cause problems if the Java bindings of native memory can't be collected promptly, as there may be large amounts of native memory backing these Java bindings.

  • Negative refAdjust < 0 guards from memory reused by not owned object

By the word "negative" there I meant the case where refAdjust < 0 evaluates to false.

Thanks for your replies and patience.

neilcsmith-net commented 4 years ago

This design choice may cause problems if the Java bindings of native memory can't be collected promptly, as there may be large amounts of native memory backing these Java bindings.

@kezhuw Wholeheartedly agree, but the bindings support both implicit and explicit memory management for this reason. I usually recommend explicit memory management instead of relying on the GC. This has nothing to do with the caching, though! If you find scenarios where something stops you from explicitly releasing native memory, then bring that up on the mailing list for discussion.