Strange error. Never seen it before. What does the output of

```
$ nix-shell -p nix-info --run "nix-info -m"
```

say?
- system: `"x86_64-linux"`
- host os: `Linux 3.10.0-514.26.2.el7.x86_64, Red Hat Enterprise Linux Server, 7.3 (Maipo)`
- multi-user?: `no`
- sandbox: `no`
- version: `nix-env (Nix) 1.11.16`
- channels(user1): `"nixpkgs-18.03pre125021.73a01aa028c"`
- channels(nixadmin): `"nixos-18.03pre124015.f59a0f7f1a6, nixpkgs-18.03pre124071.310ad4345bb"`
- channels(user2): `"nixpkgs-18.03pre117690.ab2cc75f78, nixos-18.03pre117327.3fe7cddc30"`
- nixpkgs: `/u/avorobiev/.nix-defexpr/channels_nixadmin/nixos`
@facundominguez any ideas?
Not really. Googling for the error shows that it is sometimes caused by overly strict permissions. Perhaps `/tmp` is mounted with `noexec`, but that's a shot in the dark.
`/tmp` is mounted with `noexec`!
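A quick way to confirm this (`findmnt` is a standard util-linux tool; the output shown is illustrative):

```
$ findmnt -no OPTIONS /tmp
rw,nosuid,nodev,noexec,relatime
```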
Adding `--driver-java-options="-Djava.io.tmpdir=..."` got me a little further:
```
18/01/11 16:35:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.io.IOException: failure to login
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:824)
...
```
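For reference, the full invocation with that flag looks roughly like this (the jar name and tmpdir path are illustrative, not taken from my actual setup):

```
$ mkdir -p $HOME/tmp
$ spark-submit --driver-java-options="-Djava.io.tmpdir=$HOME/tmp" \
    ./sparkle-example-hello.jar
```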
That's an unrelated problem. If you simplify the hello example to process a local file (rather than one on AWS S3), you should be able to work around it. I wonder why you're getting login failures, though. Maybe Amazon recently invalidated the access key. @facundominguez can you reproduce?
Re the noexec thing - @facundominguez we should add this to the FAQ. Then we can close this ticket.
@mboes I have just changed the code to use a local file and still got this exception. Here is more output:
```
Exception in thread "main" java.io.IOException: failure to login
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:824)
    at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:761)
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:634)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2430)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2430)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2430)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:295)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2509)
    at org.apache.spark.SparkContext.getOrCreate(SparkContext.scala)
    at io.tweag.inlinejava.Inline__sparkle072126eViymYvMiEGHk2u3Idxs_Control_Distributed_Spark_Context.inline__method_2(Inline__sparkle072126eViymYvMiEGHk2u3Idxs_Control_Distributed_Spark_Context.java:13)
    at io.tweag.sparkle.SparkMain.invokeMain(Native Method)
    at io.tweag.sparkle.SparkMain.main(SparkMain.java:11)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
```
It looks like it happens before it gets to the line where it needs to read the file.
Yes, when logging in to AWS. See the lines
```haskell
confSet conf "spark.hadoop.fs.s3n.awsAccessKeyId" "AKIAIKSKH5DRWT5OPMSA"
confSet conf "spark.hadoop.fs.s3n.awsSecretAccessKey" "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"
```
Remove them.
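With those two lines removed and a local input file, the hello example reduces to something like this (a minimal sketch assuming sparkle's `newSparkConf`, `getOrCreateSparkContext`, `textFile`, and `count` helpers; the file path is illustrative):

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Control.Distributed.Spark as RDD

main :: IO ()
main = do
    conf <- newSparkConf "Hello sparkle!"
    -- No spark.hadoop.fs.s3n.* credentials needed for a local file.
    sc  <- getOrCreateSparkContext conf
    -- Illustrative local path in place of the s3n:// URL.
    rdd <- textFile sc "/home/user/lorem-ipsum.txt"
    n   <- RDD.count rdd
    putStrLn $ "line count: " ++ show n
```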
I just tried the hello example and it works for me. It doesn't look like a problem with revoked AWS credentials then.
Moreover, if I remount `/tmp` with `noexec`, I can reproduce this error. And if I use @alexvorobiev's tip of passing `--driver-java-options="-Djava.io.tmpdir=..."`, it works again.
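For anyone who wants to reproduce this, the remount is a one-liner (standard mount(8) syntax; run as root and remember to restore the original options afterwards):

```
# Make /tmp noexec to trigger the failure ...
$ sudo mount -o remount,noexec /tmp
# ... and restore it once done.
$ sudo mount -o remount,exec /tmp
```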
I finally figured out what the problem was. It is our use of Active Directory for authentication (no local unix users). RHEL uses sssd to communicate with the LDAP/AD server, but `libnss_sss.so.2` is not packaged by nix, so we have to use the one from the host OS via `LD_LIBRARY_PATH`. It looks like when called from under stack, spark-submit does not get that environment variable, so it is unable to figure out my user name. Calling spark-submit directly (without `stack exec --nix --`) solved the issue, and the hello example now works for me.
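In other words, the workaround amounts to something like the following (assuming the host's `libnss_sss.so.2` lives in `/usr/lib64`, the usual location on RHEL 7; the jar name is illustrative):

```
# Make the host's sssd NSS module visible to the dynamic linker,
# then call spark-submit directly rather than via `stack exec --nix --`.
$ export LD_LIBRARY_PATH=/usr/lib64:$LD_LIBRARY_PATH
$ spark-submit ./sparkle-example-hello.jar
```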
I explained the error in the troubleshooting section of the README on master.
I am using Nix on RHEL7.