jwharm / java-gi

GObject-Introspection bindings generator for Java
GNU Lesser General Public License v2.1
84 stars 7 forks source link

Segfault in Cleaner #111

Closed JFronny closed 2 months ago

JFronny commented 3 months ago

In my application, I have been encountering a segfault appearing seemingly at random. After looking more closely at the JVM error that is generated, this seems to be coming from an invocation of g_type_check_instance_is_fundamentally_a happening somewhere in the Cleaner-1 thread, which is why I am opening this here. Unfortunately, I have not been able to build a simple reproducer, but if you need more information, I'll try to help. Some of the VM error logs:

jwharm commented 3 months ago

Thanks for reporting this, and for the VM logs.

When a Java-GI proxy object for a native GObject instance is garbage-collected, a Cleaner is triggered to call g_object_remove_toggle_ref to free the native instance. In g_object_remove_toggle_ref there is the following check:

g_return_if_fail (G_IS_OBJECT (object));

The G_IS_OBJECT macro calls g_type_check_instance_is_fundamentally_a and segfaults.

This is a bug in Java-GI because g_object_remove_toggle_ref must run in the Gtk main context thread, but here it is called from the GC thread:

---------------  T H R E A D  ---------------

Current thread (0x000078bcbc9d6700):  JavaThread "Cleaner-1" daemon

Stack: [0x000078bc7bf00000,0x000078bc7c000000],  sp=0x000078bc7bffe508,  free space=1017k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libgobject-2.0.so.0+0x46f91]  g_type_check_instance_is_fundamentally_a+0x11
...

This is a known issue because it also occured in Gir.Core a while ago (link) and @badcel notified me about it. I know how to fix this; it just wasn't high on my todo list until now.

As you said, this issue is hard to reproduce, so I'd appreciate if you can test the fix when it's ready (I will publish a preview release).

jwharm commented 3 months ago

@JFronny I published a 0.10.2-SNAPSHOT release that I hope will fix the issue. Can you please retest?

For snapshot dependencies, add https://s01.oss.sonatype.org/content/repositories/snapshots to the repositories in your build script.

JFronny commented 3 months ago

I will retest it next weekend

JFronny commented 3 months ago

I have retested it using 0.10.2-SNAPSHOT, but still seem to be getting segfaults.

jwharm commented 3 months ago

g_object_remove_toggle_ref still segfaults. Perhaps the address was already freed somewhere else.

Can you please run with environment variable G_MESSAGES_DEBUG=all and attach the output?

JFronny commented 3 months ago

Sure, here is the log. (I split it into two files due to the size limit)

jwharm commented 3 months ago

The segfault happens when a StringList is garbage-collected and tries to unref the native instance. The native instance is already disposed at that point, but I don't know why.

I tried to debug the issue in Inceptum. It seems to be triggered for KDropDown.kt method updateOptions, i.e. when updating an existing model.

One thing that maybe will help, is to store your StringList instances somewhere in a field that you keep alive while the "create" dialog is open. The JVM doesn't know that the StringList model is used by native code, so you have to make sure it is "kept alive", otherwise the GC sometimes destroys it too soon.

Still, that doesn't explain why the StringList instance seems to be cleaned twice...

jwharm commented 3 months ago

This should be fixed now.

The problem was that the ownership of the StringList is transferred to the DropDown constructor, which means that it eventually will drop the reference, so java-gi must increase the ref count to avoid a double free like we were seeing. This didn't work for interface instances (a ListModel in this case) due to a bug.

Furthermore, the ref count was increased after the function call (instead of before), but that's too late, so I moved the check.

I've published a new 0.10.2-SNAPSHOT version.

JFronny commented 2 months ago

I have just tried it again and can no longer reproduce this with the latest build, so it seems like the fix worked. Thanks!