Closed: jcaesar closed this issue 3 years ago
Sorry for the late reply, I missed the notification. I'm going to take a look at it.
Ah, yay! I was afraid this repository was dead from the start...
It seems this problem can even be triggered before starting the Flink job (e.g. by moving the wasmer calls to the beginning of main). Any chance it's related to the horribly outdated version of Java?
[Edit:] At least switching the image to flink:1.11.2-scala_2.11-java11 doesn't help. I guess I would really need a build with debug symbols to find out what's going on.
I finally wanted to understand a bit better what's going on, so I made myself a debug build of libwasmer_jni.so to translate the symbols in the backtrace. They just came down to:
/rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4//library/std/src/sys/unix/mod.rs:231
/rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4//library/std/src/process.rs:1773
/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/wasmer-clif-backend-0.17.0/src/signal/unix.rs:147
/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/wasmer-clif-backend-0.17.0/src/signal/unix.rs:30
This just seems to be an average signal handler, so I guess there is no relevant wasmer code running at the time the segv happens. Phew, looks like whoever wants to solve this is in for some rather fun debugging.
[Edit:] Oh, I guess the next step is to do as this comment asks.
So, I tried disabling the signal handlers to get to see the "real segv". Turns out that fixed the problem... Not sure what to make of that.
Incidentally, wasmer 1.0 doesn't have the signal handlers anymore. Wonder how this plays out when upgrading versions...
@jcaesar Memory is clonable in Wasmer 1.0, so it should be doable I believe!
Oh, I would have assumed that would create a new copy of the actual memory area. Guess I haven't really understood the semantics. Nevertheless, I was able to construct a patch that makes wasmer-java use a current-ish wasmer master. The problem I report here disappears.
I'd throw a PR, but some of the gradle tests fail. Not sure if I'll work on that. A serious adaptation of wasmer 1.0.0 would have to change the Java API anyway, I guess?
Fixed in 0.3.0. (Though testing that out was a bit annoying because 0.3.0 is somehow not properly released. empty folder? wat?)
Description
When using an instance created from a Flink job in docker, the taskmanagers (= things that execute the stream processing) die with a SEGV shortly after calling a wasm function.
Steps to reproduce
Requires a bit of setup, I've created a docker-compose based test case: https://github.com/jcaesar/wasmer-in-flink-segv You should be able to run it with
Actual behavior