tweag / sparkle

Haskell on Apache Spark.
BSD 3-Clause "New" or "Revised" License

`UnsatisfiedLinkError` for `invokeMain` #149

Closed bwbaugh closed 5 years ago

bwbaugh commented 5 years ago

When trying to run the hello app (sparkle-example-hello), I get the following error message:

Exception in thread "main" java.lang.UnsatisfiedLinkError: io.tweag.sparkle.SparkMain.invokeMain([Ljava/lang/String;)V
        at io.tweag.sparkle.SparkMain.invokeMain(Native Method)
        at io.tweag.sparkle.SparkMain.main(SparkMain.java:11)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

After many hours of debugging (as I am pretty unfamiliar with this library, JNI, gcc, and linkers), I finally found that the root cause was ld stripping the shared library dependencies (the NEEDED entries) out of the dummy shared library, libhsapp.so.

The biggest clue was seeing only one NEEDED shared library in the output of:

$ jar xf sparkle-example-hello.jar sparkle-app.zip
$ unzip sparkle-app.zip
$ readelf -d libhsapp.so

The code that generates the dummy/stub library (implemented for #138) is: https://github.com/tweag/sparkle/blob/6086b54b1a0240d53c20fac5e61d7f88eec1aeca/Sparkle.hs#L66-L68

The fix is to add --no-as-needed as a linker option. Example diff:

diff --git a/Sparkle.hs b/Sparkle.hs
index 14fd345..5dee25b 100644
--- a/Sparkle.hs
+++ b/Sparkle.hs
@@ -64,7 +64,7 @@ makeHsTopLibrary hsapp libs = withSystemTempDirectory "libhsapp" $ \d -> do
     -- relative path. "-L d -l:hsapp" doesn't work in centos 6 where the
     -- path to hsapp in the output library ends up being absolute.
     callProcessCwd d "gcc" $
-      [ "-shared", "-Wl,-z,origin", "-Wl,-rpath=$ORIGIN", "hsapp"
+      [ "-shared", "-Wl,-z,origin", "-Wl,-rpath=$ORIGIN", "-Wl,--no-as-needed", "hsapp"
       , "-o", f] ++ libs
     LBS.fromStrict <$> BS.readFile f
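
To see the effect of the flag in isolation, here is a minimal sketch that is independent of sparkle. It assumes gcc and readelf from GNU binutils are installed, and every file name in it (dep.c, libdep.so, stub.c, the stub_*.so outputs) is hypothetical. It links an empty stub against a dependency twice, once with --as-needed and once with --no-as-needed, and counts the NEEDED entries that survive. Passing --as-needed explicitly stands in for toolchains that appear to enable it by default, which is what seems to have happened on the Ubuntu setup above.

```shell
#!/bin/sh
# Sketch: show how --as-needed drops an unreferenced dependency from a stub
# shared library. All file names below are hypothetical, not sparkle's.
set -eu

# Skip quietly when the toolchain is not installed.
command -v gcc >/dev/null 2>&1 || { echo "gcc not found, skipping"; exit 0; }
command -v readelf >/dev/null 2>&1 || { echo "readelf not found, skipping"; exit 0; }

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# A dependency exporting one symbol, standing in for a real library.
echo 'int dep_answer(void) { return 42; }' > dep.c
gcc -shared -fPIC -o libdep.so dep.c

# The stub references nothing from libdep.so, like a dummy top-level library.
echo 'int stub_placeholder;' > stub.c

# Count the NEEDED entries of a library that mention libdep.so.
count_needed() {
  readelf -d "$1" | grep NEEDED | grep -c 'libdep.so' || true
}

# The flag only affects libraries that come after it on the command line.
gcc -shared -fPIC -Wl,--as-needed    -o stub_as_needed.so    stub.c ./libdep.so
gcc -shared -fPIC -Wl,--no-as-needed -o stub_no_as_needed.so stub.c ./libdep.so

dropped=$(count_needed stub_as_needed.so)
kept=$(count_needed stub_no_as_needed.so)
echo "with --as-needed:    $dropped NEEDED entries for libdep.so"
echo "with --no-as-needed: $kept NEEDED entries for libdep.so"
```

On a GNU toolchain, the first count should come out as 0 and the second as 1. That matches the symptom described above: a stub that references no symbols itself loses its dependencies unless --no-as-needed precedes them.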

In case it matters, here are the versions of the things that I think might be relevant:

$ gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ ld --version
GNU ld (GNU Binutils for Ubuntu) 2.26.1
Copyright (C) 2015 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later version.
This program has absolutely no warranty.

facundominguez commented 5 years ago

Hello @bwbaugh. We recently noticed that on Ubuntu the library libhsapp.so was missing the invokeMain symbol, but hadn't spent the time to figure out what was amiss with the linker. Thanks for sharing it!

Is there any missing piece of documentation we could write that would have helped your experience?

bwbaugh commented 5 years ago

Thank you for the fast merge! 😄

Now that this ticket exists, hopefully it’s searchable for someone encountering the same issue.

For other potential documentation, thanks for asking! I suppose covering this kind of case would be pretty difficult. Ideally these sorts of implementation details would never need to be made visible to Haskell users. Having primers on JNI, gcc, and ld also seems pretty far out of scope. 😅

Probably nothing really missing for now in the documentation. A couple of potential ideas that may help in general:

All of the above is much easier said than done. Whatever types of docs that new users would ask about as they ramp up on the library—either as users or contributors—and start needing to debug their jobs.


Now that this bug is fixed (🤗) and Hello World is working, I’m ready to start trying to write my own job. I’m actually trying to port a mrjob (that runs a Haskell executable) that was running on EMR to use Spark instead.

I’m really interested in using sparkle in an existing repo rather than trying to go the other way around. Definitely will be keeping a close eye on https://github.com/tweag/sparkle/issues/146.

In addition to the example apps, perhaps some more examples, closer to production uses if possible? For example, processing JSON line files? Maybe some best practices? I’ll be watching https://github.com/tweag/sparkle/issues/148 too as it’s pretty close to the job that I’d like to port to Spark.

Then again, maybe these beginner questions are more related to how to use Spark in general as opposed to how to use this library.

mboes commented 5 years ago

Thanks for the feedback @bwbaugh. Regarding documentation, we have a couple of blog posts in the pipeline, which we never got around to actually writing. One in particular is on the PIE and jarification tricks we use. They end up being missing pieces in the documentation corpus and we need to fix that.

Regarding #146 and friends, we've tried spinning out the jarification process from sparkle several times (see https://github.com/tweag/jarify), but ultimately I think the future lies in using a different build process that more deeply understands the interaction between two different languages. We're nearly done with the work to add Bazel support to inline-java (see https://github.com/tweag/inline-java/pull/120 from a few days ago) and we'll be doing the same with sparkle. Where inline-java will have both Cabal and Bazel support moving forward, sparkle may well only fully work in a Bazel-style build. No promises yet, but I'm reasonably confident that Bazel will actually allow us to get rid of the sparkle command altogether, as well as the logic it implements.

bwbaugh commented 5 years ago

Looking forward to those blog posts, if there is time to write them. 😄

Hadn’t heard of Bazel before, but it looks pretty cool! 😎 I was a bit worried having another software requirement would make it harder to adopt sparkle, but it looks like this tool is pretty popular.

Whatever makes it easiest to use sparkle in existing projects, if possible!


How close is Bazel support for sparkle? I noticed the build command in the README doesn’t work right now.

$ bazel build //apps/hello:sparkle-example-hello_deploy.jar
ERROR: /…/sparkle/WORKSPACE:3:1: name 'http_archive' is not defined
ERROR: /…/sparkle/WORKSPACE:9:1: name 'http_archive' is not defined
ERROR: /…/sparkle/WORKSPACE:15:1: name 'http_archive' is not defined
ERROR: /…/sparkle/WORKSPACE:22:1: name 'http_archive' is not defined
ERROR: /…/sparkle/WORKSPACE:28:1: name 'http_archive' is not defined
ERROR: Error evaluating WORKSPACE file
ERROR: error loading package '': Encountered error while reading extension file 'nixpkgs/nixpkgs.bzl': no such package '@io_tweag_rules_nixpkgs//nixpkgs': error loading package 'external': Could not load //external package
ERROR: error loading package '': Encountered error while reading extension file 'nixpkgs/nixpkgs.bzl': no such package '@io_tweag_rules_nixpkgs//nixpkgs': error loading package 'external': Could not load //external package
INFO: Elapsed time: 0.060s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)

Got a little further with the following:

diff --git a/WORKSPACE b/WORKSPACE
index 9c05730..f33259c 100644
--- a/WORKSPACE
+++ b/WORKSPACE
@@ -1,5 +1,7 @@
 workspace(name = "io_tweag_sparkle")

+load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
+
 http_archive(
   name = "io_tweag_rules_haskell",
   strip_prefix = "rules_haskell-730d42c225f008a13e48bf5e9c13010174324b8c",

but still too many errors for me to look at right now:

global variable '_haskell_common_attrs' is referenced before assignment.
…
ERROR: Skipping '//apps/hello:sparkle-example-hello_deploy.jar': error loading package 'apps/hello': Extension file 'haskell/haskell.bzl' has errors

I guess this should probably be a separate issue. 😅

$ bazel version
Build label: 0.23.1
Build target: bazel-out/k8-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Mon Mar 4 10:37:56 2019 (1551695876)
Build timestamp: 1551695876
Build timestamp as int: 1551695876

bwbaugh commented 5 years ago

PS: Does a new release for this package need to be made? Current is v0.7.4. Or will this be picked up whenever the next release is?

mboes commented 5 years ago

The Bazel support currently in the repo targets an old Bazel, so you'll need to use whatever version of Bazel is exposed by nix-shell (which is old by now). Newer rules_haskell versions don't support older Bazel versions, because Bazel is pre-1.0 and frequently makes breaking changes at this point. That doesn't prevent a great many people from using it, from Stripe to Uber to SpaceX (and of course Google, who have used it internally for 12 years).

What is in production with a few of our clients today is the current workflow based on Cabal and the sparkle command, so I recommend sticking with that for now. Just so you know, though, moving to Bazel is on the roadmap.