robhudson opened 7 years ago
See #520 for details on how the EMR logs differ from the Spark job logs being created in the batch script.
In today's meeting we decided it would be best to update the batch.sh file to create the log files with a more deterministic name that we can use on the Python side. One idea was the job name + cluster job ID, if possible.
@maurodoglio Would the above cause any problems that you know of? Is it possible to get the cluster job ID into the batch script?
> Would the above cause any problems that you know of?

I don't think so.

> Is it possible to get the cluster job ID into the batch script?

I think so. You can probably use the AWS CLI and filter the list of running jobs by some attribute accessible from the machine (maybe the hostname?).
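Another option that avoids the AWS CLI entirely: EMR writes cluster metadata to `/mnt/var/lib/info/job-flow.json` on each node, which includes the jobflow ID. A minimal sketch of reading it from a shell script (the `JOB_FLOW_FILE` override and `get_jobflow_id` helper are illustrative, not the project's actual code):

```shell
# Sketch: resolve the EMR jobflow ID from inside the cluster.
# EMR writes cluster metadata to /mnt/var/lib/info/job-flow.json on
# each node. JOB_FLOW_FILE is overridable so the parsing can be tried
# outside EMR with a sample file (hypothetical helper, not the gist's
# exact code).
JOB_FLOW_FILE="${JOB_FLOW_FILE:-/mnt/var/lib/info/job-flow.json}"

get_jobflow_id() {
  # Pull the "jobFlowId" value out of the JSON without extra dependencies.
  sed -n 's/.*"jobFlowId" *: *"\([^"]*\)".*/\1/p' "$JOB_FLOW_FILE" | head -n1
}
```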
Here's an example of pulling out the jobflow ID from the running cluster: https://gist.github.com/robotblake/7b08526b7a411739cd4c344476dd0860
This could be inserted into the job flow steps prior to batch.sh to pass the jobflow_id along.
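Once batch.sh receives the jobflow ID, the deterministic log name discussed above could be built like this (a sketch with illustrative names, assuming the job name and jobflow ID arrive as arguments):

```shell
# Sketch: build a deterministic log filename from the job name and the
# jobflow ID, e.g. my_job-j-3ABCDEF.log, so the Python side can find it.
# Argument order is an assumption, not the project's actual interface.
log_file_for() {
  printf '%s-%s.log\n' "$1" "$2"
}
```

Called as `log_file_for "$JOB_NAME" "$JOBFLOW_ID"`, this yields the same path on both the batch and Python sides as long as both know the two inputs.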
The goal is to make it easier to associate logs with a specific run.