lofar-astron / factor

Facet calibration for LOFAR
http://www.astron.nl/citt/facet-doc
GNU General Public License v2.0

stopping Factor during facetimage #202

Open AHorneffer opened 7 years ago

AHorneffer commented 7 years ago

When Factor is running the facetimage pipelines, it cannot easily be stopped. Pressing <CTRL-C> on the command line only stops the current facet, which then "fails", but runfactor carries on with the next facet, and so on.

Even if one wants to continue imaging the next facet when one facet fails, there should be a way to stop Factor while it is running the facetimage pipelines.
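To illustrate the behaviour asked for here, below is a minimal sketch of a driver loop that keeps going when a facet's pipeline fails but stops the whole run on <CTRL-C>. This is not Factor's actual code: `run_facets`, `stop_on_failure` and the facet names are hypothetical, and each "pipeline" is assumed to be an external command.

```python
import subprocess
import sys


def run_facets(facet_commands, stop_on_failure=False):
    """Run one command per facet.

    Returns (finished, failed, interrupted). All names here are
    hypothetical; this is not how Factor itself is structured.
    """
    finished, failed = [], []
    for name, cmd in facet_commands:
        try:
            result = subprocess.run(cmd)
        except KeyboardInterrupt:
            # Ctrl-C reached the driver: stop the whole run instead of
            # recording a failed facet and moving on to the next one.
            return finished, failed, True
        if result.returncode == 0:
            finished.append(name)
        else:
            # A crashed pipeline only fails this facet ...
            failed.append(name)
            if stop_on_failure:
                # ... unless the user asked to stop on the first failure.
                break
    return finished, failed, False
```

With `stop_on_failure=False` a crashing command is recorded and the loop continues, whereas a keyboard interrupt always ends the loop.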

mhardcastle commented 7 years ago

ctrl-Z, kill %1 ? :)

AHorneffer commented 7 years ago

Is that different from <CTRL-C>?

mhardcastle commented 7 years ago

I think ctrl-z will suspend the parent process, and kill %1 will then kill it.

mhardcastle commented 7 years ago

(obviously mutatis mutandis if you have other suspended processes)
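For what it's worth, the suspend-then-kill sequence can be reproduced outside an interactive shell. The sketch below is only an approximation under stated assumptions: it uses SIGSTOP where the terminal would send SIGTSTP on ctrl-Z, it sends the signals explicitly rather than through the shell's job control, and `suspend_then_terminate` is a hypothetical helper name. It is POSIX only.

```python
import signal
import subprocess
import sys


def suspend_then_terminate(proc):
    """Suspend a child process, then terminate it; return its exit status."""
    # Suspend the process. Ctrl-Z in a terminal sends SIGTSTP instead,
    # which a program may catch; SIGSTOP cannot be caught.
    proc.send_signal(signal.SIGSTOP)
    # SIGTERM stays pending for as long as the process is stopped.
    proc.send_signal(signal.SIGTERM)
    # Resume the process so that the pending SIGTERM is delivered.
    proc.send_signal(signal.SIGCONT)
    return proc.wait(timeout=10)
```

A negative return value means the child was ended by that signal number. Note that this acts on a single process only; any children it started are not signalled.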

AHorneffer commented 7 years ago

Well, I just noticed that my Factor runs probably only stopped during selfcal because I had set exit_on_selfcal_failure = True.

General question: is it actually desirable to continue with the processing if one of the pipelines fails? ("fails" as in: a program in there crashes.)

soumyajitmandal commented 7 years ago

I am probably facing a similar issue (?). Apparently my job is stuck at the facetimage step (step 5 of 5). I went to the node and ran: pkill -u mandal

Still, running checkfactor shows that facet as 'processing'.

AHorneffer commented 7 years ago

Did you also kill the original runfactor processes? (The first one and its child.) If so, then checkfactor probably just didn't get notified that the pipeline processing that facet isn't running anymore. Anyhow, if you kill all the processes (and maybe check that they are indeed gone), then Factor isn't running anymore. That also means it should be safe (well, as safe as it's going to get) to restart Factor.
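One way to do the "check that they are indeed gone" part is sketched below. It assumes a POSIX system and that you know the PIDs in question; `is_running` and `stop_and_confirm` are hypothetical helper names and not part of Factor.

```python
import os
import subprocess


def is_running(pid):
    """Return True if a process with this PID currently exists (POSIX only)."""
    try:
        # Signal 0 delivers nothing; it only checks existence and permissions.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def stop_and_confirm(proc, timeout=10.0):
    """Terminate a child started with subprocess.Popen and confirm it is gone."""
    proc.terminate()  # sends SIGTERM
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # fall back to SIGKILL if SIGTERM was not enough
        proc.wait()
    return not is_running(proc.pid)
```

Keep in mind that a process which has exited but has not yet been reaped by its parent (a zombie) still counts as existing for this check.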