Closed lolaclinton closed 7 years ago
I'm guessing this has to do with S3 credentials. Is there a way to save them outside the code or configuration, so that if someone else uses my code the EMR job will use their credentials? Also, I can't find the log files for this cluster: j-37KJST2B19MM3. Which bucket should they be in? I redirected the log files as shown in your documentation, but there is no info there ...
For the credentials, you can use sparkInstanceRole := ... and make sure the role it points to (default is EMR_EC2_DefaultRole) has permission to read your S3 bucket.
For the logging, follow the setting at https://github.com/pishen/sbt-emr-spark#to-set-the-s3-logging-folder-for-emr-cluster. After the log is written to the S3 location, you may find your Spark logs in a sub-folder similar to:
j-xxxxxxxxxxxx/containers/application_xxxxxxxxxxxxx_0001/container_xxxxxxxxxxxxx_0001_01_000001/stderr.gz
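As a rough illustration of the layout above, here is a small Python sketch that assembles the expected key of the first container's stderr.gz under the S3 logging folder. The function name, its parameters, and the sample values are hypothetical; it assumes the application and container IDs follow the pattern shown above (a shared cluster timestamp, a 4-digit application sequence, attempt 01, container 000001), which may differ on other EMR or Hadoop versions.

```python
def first_container_stderr_key(log_prefix, cluster_id, cluster_timestamp, app_seq=1):
    """Build the S3 key of the first container's stderr.gz for one YARN application.

    All arguments are illustrative: log_prefix is the folder configured for
    EMR logging (without the bucket name), cluster_id looks like "j-...", and
    cluster_timestamp is the numeric part shared by the application and
    container IDs.
    """
    application = f"application_{cluster_timestamp}_{app_seq:04d}"
    # Assumption: attempt 01, container 000001, as in the path shown above.
    container = f"container_{cluster_timestamp}_{app_seq:04d}_01_000001"
    parts = [log_prefix.strip("/"), cluster_id, "containers",
             application, container, "stderr.gz"]
    # Skip an empty prefix so the key never starts with "/".
    return "/".join(p for p in parts if p)


if __name__ == "__main__":
    print(first_container_stderr_key("logs/", "j-37KJST2B19MM3", 1500000000000))
```

The resulting key can be pasted into the S3 console or passed to whatever S3 client you already use; if nothing exists at that key, listing the containers/ folder for the cluster is the safer way to find the real IDs.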
Thanks for the advice. It seems like the cluster terminates the moment it crashes. So while I can access the log I can't access the errors on the cluster itself. Is there a setting I can set to do that? I tried starting the cluster in advance but the behavior was the same, termination.
withKeepJobFlowAliveWhenNoSteps is true when you create the cluster in advance: https://github.com/pishen/sbt-emr-spark/blob/master/src/main/scala/EmrSparkPlugin.scala#L137
withActionOnFailure is ActionOnFailure.CONTINUE by default: https://github.com/pishen/sbt-emr-spark/blob/master/src/main/scala/EmrSparkPlugin.scala#L269
I'm not sure how it can still terminate automatically with these settings.
If you do nothing in your job except throw a RuntimeException to make it fail, will the cluster still terminate automatically?
Well it seems to stop now. I manually added withActionOnFailure(true). Connecting to the cluster didn't help much though. I'm seeing a Hadoop exit code 15, which is very hard to decipher. Do you have any ideas how to get more information? On Stack Overflow I'm seeing people recommend peeking at the YARN logs. Do they exist in this situation? Thanks :(
So managed to get there by logging into the master :) Seems the bug wasn't mine - had to do with this: https://github.com/aws/aws-sdk-java/issues/1094 FYI, would be wonderful if your system had a way to pull this data out easily. Glad everything is working though :)
@lolaclinton I'm not sure how you saw the error log of IllegalAccessError? If you have clear instructions on how to get the log, maybe we can figure out how to get it programmatically.
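One possible starting point for doing this programmatically, sketched in Python: once a stderr.gz file has been downloaded from the S3 logging folder, decompress it and keep only the lines that mention a Java exception or error class. This is not something the plugin provides; the function name and the regular expression are assumptions made for illustration, and the pattern will miss failures that are not reported as a class name ending in Exception or Error.

```python
import gzip
import re
from typing import List

# Matches a (possibly package-qualified) class name ending in Exception or Error,
# e.g. java.lang.IllegalAccessError or RuntimeException.
_ERROR_LINE = re.compile(r"\b[\w.$]+(?:Exception|Error)\b")


def find_error_lines(gz_bytes: bytes) -> List[str]:
    """Return the lines of a gzipped log that name an exception or error class."""
    text = gzip.decompress(gz_bytes).decode("utf-8", errors="replace")
    return [line for line in text.splitlines() if _ERROR_LINE.search(line)]


if __name__ == "__main__":
    # Stand-in for the contents of a downloaded stderr.gz file.
    sample = gzip.compress(
        b"INFO starting job\n"
        b"java.lang.IllegalAccessError: tried to access a class\n"
        b"INFO shutting down\n"
    )
    for line in find_error_lines(sample):
        print(line)
```

Filtering on the class-name suffix keeps the sketch independent of any particular log format, at the cost of dropping the stack-trace lines that follow each match.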
@lolaclinton The issue seems to be fixed by EMR 5.8.0? If you still run into a problem, feel free to tell me :)
Sorry to keep asking questions .. I tried to run my job and it crashed. I'm not sure how to debug it with this error report