Closed · BenFradet closed this 8 years ago
Looks good to me.
Have you tested this both against a recent Spark commit, like 99b7187c2dce4c73829b9b32de80b02a053763cc, as well as against an older Spark commit from before the move of `make-distribution.sh`, like f19228eed89cf8e22a07a7ef7f37a5f6f8a3d455?
I tested the script itself, but not as part of Flintrock. Should I?
While trying to test my change, I'm getting:

```
paramiko.ssh_exception.SSHException: not a valid EC private key file
```

despite having a properly formatted .pem file.
Do you have any idea what could be causing this?
Hmm, I've never seen that error before. It seems to be ultimately coming from EC2?
Are you able to use that same private key file to log into EC2 instances outside of Flintrock?
Found my problem: the user was misconfigured.
I tested the change against today's commit: apache/spark@4eace4d384f0e12b4934019d8654b5e3886ddaef and the latest in the 1.6 branch: apache/spark@db4795a7eb1bac039e9e96237cf77e47ed76dde8
The build starts correctly. However, the Spark Core project won't compile (possibly because I'm using t2.micro instances).
Yeah, to build Spark in a reasonable amount of time you'd need at least m3.xlarge instances.
Thanks for contributing this patch and testing it out! I'll merge this in.
Hmm, actually I'm having trouble getting this to work against the latest commit of Spark. I get this error:

```
<snipped>
+ VERSION='[ERROR] Re-run Maven using the -X switch to enable full debug logging.'
```

Do you get the same error? This may be a subtle change on Spark's side that we have to handle.
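For context, here's a hedged sketch of why a Maven error line can end up in `VERSION`: at the time, `make-distribution.sh` derived the version roughly by taking the last non-INFO line of `mvn help:evaluate` output (the exact pipeline may differ between Spark versions). When the multi-threaded builder fails, a trailing `[ERROR]` line is what gets captured. The Maven output below is mocked to illustrate this; no real build runs:

```shell
#!/bin/sh
# Simulate the version-detection pipeline from make-distribution.sh, roughly:
#   VERSION=$(mvn help:evaluate -Dexpression=project.version ... | grep -v INFO | tail -n 1)

# Healthy run: the project version is the last non-INFO line of the output.
good_output='[INFO] Scanning for projects...
2.0.0-SNAPSHOT'
VERSION=$(printf '%s\n' "$good_output" | grep -v INFO | tail -n 1)
echo "$VERSION"    # prints: 2.0.0-SNAPSHOT

# Failed -T 1C run: Maven prints [ERROR] lines after the version,
# so tail -n 1 captures an error message instead of the version.
bad_output='2.0.0-SNAPSHOT
[ERROR] Re-run Maven using the -X switch to enable full debug logging.'
VERSION=$(printf '%s\n' "$bad_output" | grep -v INFO | tail -n 1)
echo "$VERSION"    # prints the [ERROR] line, matching the failure above
```

This matches the `VERSION='[ERROR] ...'` symptom above: the build failure itself happens first, and the version-extraction pipeline then scoops up the error text.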
Trying on m3.xlarge, I get the same error as you, which I didn't get on t2.micro. Weird.
I'll keep investigating and keep you posted.
After cding into the `dev` dir before calling `make-distribution.sh`, I get the following when trying to compile:

```
[info] Error occurred during initialization of VM
[info] java.lang.Error: Properties init: Could not determine current working directory.
[info]     at java.lang.System.initProperties(Native Method)
[info]     at java.lang.System.initializeSystemClass(System.java:1166)
```
Apparently the parallel build option (`-T 1C`, which asks Maven for one build thread per CPU core) is causing it to fail.
The first Maven invocation, which is:

```
/tmp/spark/build/mvn help:evaluate -X -Dexpression=project.version -T 1C -Phadoop-2.6
```

fails with:

```
[ERROR] java.util.concurrent.ExecutionException: java.lang.NullPointerException
java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.NullPointerException
    at org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder.multiThreadedProjectTaskSegmentBuild(MultiThreadedBuilder.java:170)
    at org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder.build(MultiThreadedBuilder.java:91)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
    at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:307)
    at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:193)
    at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:106)
    at org.apache.maven.cli.MavenCli.execute(MavenCli.java:863)
    at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:288)
    at org.apache.maven.cli.MavenCli.main(MavenCli.java:199)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
Caused by: java.util.concurrent.ExecutionException: java.lang.NullPointerException
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder.multiThreadedProjectTaskSegmentBuild(MultiThreadedBuilder.java:166)
    ... 16 more
Caused by: java.lang.NullPointerException
    at org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call(MultiThreadedBuilder.java:185)
    at org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call(MultiThreadedBuilder.java:181)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
```
Also, there is a warning regarding parallel execution which might be causing the failure:

```
[WARNING]
[WARNING] * Your build is requesting parallel execution, but project
[WARNING] * contains the following plugin(s) that have goals not marked
[WARNING] * as @threadSafe to support parallel building.
[WARNING] * While this /may/ work fine, please look for plugin updates
[WARNING] * and/or request plugins be made thread-safe.
[WARNING] * If reporting an issue, report it against the plugin in
[WARNING] * question, not against maven-core
[WARNING] *
[WARNING] The following goals are not marked @threadSafe in Spark Project Parent POM:
[WARNING] org.apache.maven.plugins:maven-help-plugin:2.2:evaluate
[WARNING] *****
```
Are you ok with removing it?
Btw, I tested the script as part of Flintrock with the two previously mentioned commits and it worked in both cases (having removed `-T 1C` from the 2.0 script).
I think something else is going on.
If I clone Spark locally and run `./dev/make-distribution.sh -T 1C -Phadoop-2.6`, it works fine against the latest commit. This smells like something related to the shell environment over SSH.
Interestingly, it seems that the commit that moved `make-distribution.sh` (0eea12a3d956b54bbbd73d21b296868852a04494) is not responsible for the problem we are seeing, since I was just able to launch a cluster at that commit.
I think a good next step would be to try to find the exact Spark commit that breaks this. I'll poke around more myself later this week to try to find it.
Sorry this turned into more than a simple change @BenFradet!
I'd really like to keep `-T 1C` working, since people building Spark during cluster launches will really benefit from the shorter build times. It can be the difference between a 30-minute build and a 10-minute (or even shorter) build, depending on how many cores your cluster instances have.
For me, apache/spark@4eace4d fails to build both locally and remotely with `./dev/make-distribution.sh -T 1C -Phadoop-2.6`.
I'll investigate later commits.
I found it. This is the commit that breaks `-T 1C`: https://github.com/apache/spark/commit/6ca990fb366cf68cd9d5afb433725d28f07e51a0
Source PR: https://github.com/apache/spark/pull/11178
Hmm, interesting.
Revisiting the error message you posted above @BenFradet, it looks like some project changes are interfering with the parallel build option, as you pointed out. :disappointed: That PR I linked to is probably just where this change was introduced.
So I now agree with your earlier suggestion: the simplest thing to do is to simply remove `-T 1C`.
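Concretely, that would mean dropping the flag from the invocation used earlier in this thread (a sketch; the profiles and paths are the ones quoted above, and the actual script may pass additional options):

```shell
# Before: parallel build, broken by apache/spark@6ca990f on newer commits
#   ./dev/make-distribution.sh -T 1C -Phadoop-2.6

# After: no -T 1C; slower, but builds reliably across Spark commits
./dev/make-distribution.sh -Phadoop-2.6
```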
OK, will do.
fixes #91