Closed chuwy closed 2 years ago
I think we had the same issue with SQL Runner. The solution there was to template the steps and store them in memory at launch, rather than reading them at run time. This might be simpler than altering the logic for launching a new cluster?
Both options look good to me; I don't have a strong preference.
Can you submit steps to a cluster before it's started?
I thought EmrEtlRunner does it?
E.g. I see steps even on clusters that failed during validation.
Got it, then it feels like the best option is:
When we run a playbook with `run-transient`, dataflow-runner first starts a cluster and submits the steps from the playbook only once the cluster is running. This can lead to race conditions when config files are synced or deployed after dataflow-runner has started but before the cluster is up. Reading the steps before launch would also prevent failures where the playbook refers to another file (via `base64File`, for example) that is not available. Right now dataflow-runner starts a cluster, sees that the file is unavailable, and terminates the cluster, whereas it could report an error without launching a cluster at all.