wasmerio / wasmer-java

☕ WebAssembly runtime for Java
https://medium.com/wasmer/announcing-the-first-java-library-to-run-webassembly-wasmer-jni-89e319d2ac7c
MIT License

Segfaulting the JVM #58

Open helins opened 3 years ago

helins commented 3 years ago

Describe the bug

I am currently developing WASM tooling and I am using wasmer-java interactively from Clojure. Sometimes, after a while, the JVM suddenly segfaults because of Wasmer:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fe1df8c4f91, pid=27960, tid=27980
#
# JRE version: OpenJDK Runtime Environment (Zulu11.39+15-CA) (11.0.7+10) (build 11.0.7+10-LTS)
# Java VM: OpenJDK 64-Bit Server VM (11.0.7+10-LTS, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# C  [wasmer_jni2309101336629026624.lib+0x5ef91]  _$LT$hashbrown..raw..RawTable$LT$T$GT$$u20$as$u20$core..ops..drop..Drop$GT$::drop::hfb03333fcdd0f7eb+0x31
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /home/adam/projects/clj/helins/wasmeta/core.27960)
#
# An error report file with more information is saved as:
# /home/adam/projects/clj/helins/wasmeta/hs_err_pid27960.log
[thread 28012 also had an error]
#
# If you would like to submit a bug report, please visit:
#   http://www.azulsystems.com/support/
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
rlwrap: warning: clojure crashed, killed by SIGABRT (core dumped).
rlwrap itself has not crashed, but for transparency,
it will now kill itself with the same signal

And here is the full report: https://gist.github.com/helins-io/ffbb4e46eaf5b4adbfc960952ac577e9

Steps to reproduce

In an interactive session (using a Clojure REPL), I often load a WASM file, create an instance, execute a function and close that instance right away. At first nothing happens. However, after a while, I get this SIGSEGV.

Additional context

"After a while" probably means when the GC kicks in. I guess this could be a double-free error, where finalizing the instance object tries to free pointer(s) that were already freed manually when I closed the instance myself. That sounds plausible after skimming the report.

Hywan commented 3 years ago

Thank you for the detailed report! Indeed, it looks like a double-free. That's curious. Can you provide a minimal working example so that I can try to reproduce please?
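
For readers following the double-free hypothesis, here is a minimal pure-Java sketch of the suspected pattern and one common guard against it. It does not use wasmer-java and is not its actual code: `GuardedHandle`, `releaseFromFinalizer` and `FREE_COUNT` are hypothetical names, and the "free" is simulated with a counter. It only shows how two release paths (an explicit `close` and a finalizer-like path) can be made safe by letting the first caller win.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an object that owns a native pointer. */
class GuardedHandle implements AutoCloseable {
    /** Counts simulated native frees so the effect is observable. */
    static final AtomicInteger FREE_COUNT = new AtomicInteger();

    /** Flips to true the first time the resource is released. */
    private final AtomicBoolean released = new AtomicBoolean(false);

    /** Explicit release path, called by the user. */
    @Override
    public void close() {
        release();
    }

    /** Second release path, standing in for what a finalizer would do. */
    void releaseFromFinalizer() {
        release();
    }

    private void release() {
        // compareAndSet lets only the first caller through, so the
        // simulated free below can never run twice for one handle.
        if (released.compareAndSet(false, true)) {
            // Assumption: a real binding would free the native pointer here.
            FREE_COUNT.incrementAndGet();
        }
    }

    public static void main(String[] args) {
        GuardedHandle handle = new GuardedHandle();
        handle.close();                // manual close
        handle.releaseFromFinalizer(); // later "finalization" of the same object
        System.out.println("frees: " + FREE_COUNT.get()); // prints "frees: 1"
    }
}
```

Without the `released` flag, both paths would reach the free, which matches the crash pattern described above: fine at first, then a fault once the GC finalizes an already-closed object.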

helins commented 3 years ago

It's part of a bigger library I am currently writing. It's still messy, so maybe I could invite you to my private repo? Are you comfortable at all with Clojure?

But anyways, there really isn't much to it. The parts that leverage Wasmer are a very direct wrapper over the Java API. Essentially, it's exactly the same as using the Java API straight away.

Maybe the problem is due to this debug-like behavior: creating an instance, calling a function right away, and closing it right away. Could this fast cycle be problematic? In a real application you would probably hold on to the instance for a bit, which is maybe why no one had this issue before.

helins commented 3 years ago

I believe the best thing to do is to simply remove .finalize. I remember reading that it is discouraged in newer Java versions because it is unreliable, especially for native resources (e.g. you never know when finalization happens). Either the user should release native resources explicitly (as with .close), or there are better ways than .finalize to manage those resources automatically. I am not really familiar with them, but you might want to check out phantom references.
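
One standard-library option along these lines is `java.lang.ref.Cleaner` (available since Java 9), which is built on phantom references. The sketch below is only an illustration of that approach, not wasmer-java's implementation: `CleanedInstance`, `NativeState` and the `pointer` value are hypothetical, and the native free is simulated with a counter.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical wrapper that releases a native resource via Cleaner instead of finalize. */
class CleanedInstance implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    /** Counts simulated native frees so the effect is observable. */
    static final AtomicInteger FREE_COUNT = new AtomicInteger();

    /**
     * Cleanup action. It is a static nested class so that it holds no
     * reference to the CleanedInstance; otherwise the instance could
     * never become unreachable and the cleaner would never fire.
     */
    private static final class NativeState implements Runnable {
        private final long pointer; // hypothetical native address

        NativeState(long pointer) {
            this.pointer = pointer;
        }

        @Override
        public void run() {
            // Assumption: a real binding would free `pointer` here.
            FREE_COUNT.incrementAndGet();
        }
    }

    private final Cleaner.Cleanable cleanable;

    CleanedInstance(long pointer) {
        // Registers the action to run once this object becomes phantom reachable.
        this.cleanable = CLEANER.register(this, new NativeState(pointer));
    }

    @Override
    public void close() {
        // clean() runs the action at most once, so an explicit close
        // followed by a later GC-triggered cleanup frees only once.
        cleanable.clean();
    }

    public static void main(String[] args) {
        try (CleanedInstance instance = new CleanedInstance(42L)) {
            // use the instance here
        }
        System.out.println("frees: " + FREE_COUNT.get()); // prints "frees: 1"
    }
}
```

With this shape, explicit `close` stays the primary path and the cleaner acts only as a safety net for instances that were never closed.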

Hywan commented 3 years ago

I agree with you that Close() must be called manually rather than relying on the GC. I will try to reproduce it myself :-). Thanks!