grailbio / reflow

A language and runtime for distributed, incremental data processing in the cloud
Apache License 2.0

run -local mode doesn't kill all spawned processes when canceled #94

Open olgabot opened 5 years ago

olgabot commented 5 years ago

Hello, I killed a reflow run -local job with Ctrl-C, but a bunch of its processes are still running, so I have to kill them manually. Is there a way to get Reflow to find and stop the processes it spawned? Warmest, Olga

mariusae commented 5 years ago

Since -local still uses Docker, the spawned processes aren't children of Reflow and thus are not killed by the operating system when Reflow is killed. One interesting thing: if you kill Reflow like this, and then restart it, you'll find that it'll just attach itself to already running Docker executions, so that they do not need to be restarted.
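For anyone who does want the leftover work gone rather than resumed, manual cleanup might look like the sketch below. It assumes that every running container on the machine belongs to the canceled Reflow run, which will not hold on a shared host. The function name stop_leftover_containers is made up for illustration, and only the standard docker ps and docker stop commands are used.

```shell
#!/bin/sh
# Sketch: stop every running Docker container on this host.
# ASSUMPTION: all running containers were started by the canceled Reflow run.
# Inspect the output of `docker ps` first if anything else uses Docker here.
stop_leftover_containers() {
    # `docker ps -q` prints one container ID per line for running containers.
    ids=$(docker ps -q)
    if [ -z "$ids" ]; then
        echo "no running containers"
        return 0
    fi
    # Word splitting on $ids is intentional: one argument per container ID.
    # shellcheck disable=SC2086
    docker stop $ids
}
```

Calling stop_leftover_containers after Ctrl-C stops the containers. Restarting Reflow afterwards would then start the work again instead of attaching to it.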

We have discussed changing -local mode to use a reflowlet (Reflow server) as well, so that when you run reflow -local, it would look for an existing Reflow server instance or start a new one. In that case, when you kill the original reflow process, the reflowlet would notice that it's gone and dispose of the running tasks.
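The idea of a server noticing that the original process is gone can be sketched generically. This is not Reflow's code or the reflowlet design. It is a hypothetical illustration, with the made-up name watch_parent, of polling a PID with kill -0 and running a cleanup command once the process disappears.

```shell
#!/bin/sh
# Hypothetical watchdog: wait for a process to exit, then run a cleanup command.
# NOT Reflow's implementation; the names here are illustrative only.
watch_parent() {
    pid=$1
    shift
    # `kill -0` sends no signal; it only tests whether the process can be signaled.
    while kill -0 "$pid" 2>/dev/null; do
        sleep 1
    done
    # The watched process is gone: run whatever cleanup command was passed in.
    "$@"
}
```

For example, watch_parent 12345 echo "parent exited" would block until PID 12345 no longer exists and then print the message. A real implementation would also have to handle PID reuse and the case where kill -0 fails only because the caller lacks permission to signal the process.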

cc: @swami-m since this is also interesting for another change he's been working on

olgabot commented 5 years ago

Ah I see, that explains some of the issues I was having! I'd Ctrl-C the reflow process, git pull the latest workflow, and then get the exact same error; only by running on a fresh EC2 instance could I get the configurations to refresh.

mariusae commented 5 years ago

You can also reset the local state by clearing /tmp/flow or by using a different local dir: reflow run -local -localdir=/tmp/xxx ...
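One way to apply the second option is to create a fresh directory for each run, so that no state from a previous run is picked up. A minimal sketch, assuming mktemp is available; the variable name localdir is illustrative, and the reflow command line is the one quoted above with its elided arguments left as they are.

```shell
# Create a new, empty directory to serve as the Reflow local dir for one run.
localdir=$(mktemp -d)
echo "using local dir: $localdir"
# Then pass it to Reflow, for example:
#   reflow run -local -localdir="$localdir" ...
```

With a fresh directory each time, Reflow finds no earlier local state there, so reattaching to executions from a previous run is presumably no longer possible.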

mariusae commented 5 years ago

This is by design currently, but we are working on changing the way local mode works so that this will not happen.