mozilla / telemetry-analysis-service

Telemetry Analysis Service
https://analysis.telemetry.mozilla.org/
Mozilla Public License 2.0

Record log URI in spark job runs #477

Open robhudson opened 7 years ago

robhudson commented 7 years ago

This would make it easier to associate logs with a specific run.

robhudson commented 7 years ago

See #520 for details on how the EMR logs differ from the Spark job logs created by the batch script.

In the meeting today we decided it would be best to update the batch.sh file to create the log files with a more deterministic name that we can use on the Python side. One idea was the job name + cluster job ID, if possible.

@maurodoglio Would the above cause any problems that you know of? Is it possible to get the cluster job ID into the batch script?
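
As a rough illustration of the naming idea above, here is a minimal sketch of how the Python side might derive the same log name that batch.sh would write. The function name `spark_log_name`, the `.log` suffix, and the `name.jobflow_id` layout are assumptions made for this example, not something decided in the thread.

```python
import re


def spark_log_name(job_name: str, jobflow_id: str, suffix: str = ".log") -> str:
    """Build a predictable log file name from a job name and a cluster job ID.

    Hypothetical helper: the exact layout is an assumption for illustration.
    """
    # Collapse anything outside a conservative character set into "-",
    # so the result is safe to use in a file path or an S3 key.
    safe_name = re.sub(r"[^A-Za-z0-9_.-]+", "-", job_name.strip()).strip("-")
    if not safe_name or not jobflow_id:
        raise ValueError("both job_name and jobflow_id are required")
    return f"{safe_name}.{jobflow_id}{suffix}"
```

Because the name depends only on values known to both sides, the Python code could compute the log URI for a run without listing the log directory, provided batch.sh applies the same rule.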

maurodoglio commented 7 years ago

> Would the above cause any problems that you know of?

I don't think so.

> Is it possible to get the cluster job ID into the batch script?

I think so. You can probably use the AWS CLI and filter the list of running jobs by some attribute accessible from the machine (maybe the hostname?).

robhudson commented 7 years ago

Here's an example of pulling out the jobflow ID from the running cluster: https://gist.github.com/robotblake/7b08526b7a411739cd4c344476dd0860

This could be inserted into the job flow steps before the batch.sh step, so that the jobflow_id can be passed to it.
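
For reference, one way to look up the jobflow ID on a cluster node is sketched below. It is not necessarily what the linked gist does. It assumes the node exposes EMR's instance metadata file at `/mnt/var/lib/info/job-flow.json` with a `jobFlowId` field; the helper name `read_jobflow_id` is hypothetical.

```python
import json
from pathlib import Path

# Assumed location of the EMR job flow metadata on a cluster node.
DEFAULT_INFO_PATH = "/mnt/var/lib/info/job-flow.json"


def read_jobflow_id(info_path: str = DEFAULT_INFO_PATH) -> str:
    """Return the jobflow ID recorded in the node's job flow metadata file."""
    data = json.loads(Path(info_path).read_text())
    jobflow_id = data.get("jobFlowId")
    if not jobflow_id:
        # Fail loudly rather than produce a log name without the ID.
        raise KeyError(f"no jobFlowId found in {info_path}")
    return jobflow_id
```

A step that runs before batch.sh could print this value and hand it to the script as an argument or environment variable, though how it is passed along is left open here.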