lofar-astron / factor

Facet calibration for LOFAR
http://www.astron.nl/citt/facet-doc
GNU General Public License v2.0

Crash at merge_amp_parmdbs2 #222

Closed tikk3r closed 6 years ago

tikk3r commented 6 years ago

Factor crashes for me at the merge_amp_parmdbs2 step (right after solve_ampphase22) with the message "merge_parmdbs_in_time failed". The logs show the following error:

Table /net/para11/data2/sweijen/LOFAR_VLBI/4C43.15_factor/tmp/facet_patch_374_5c7a0e/L427100_SB120_129_uv.dppp.ndppp_prep_target.dysco_chunk11_127A48492t_0g.merge_amp_parmdbs2 already exists (and is not a true table directory)

After the crash the temporary directory tmp is empty again, so I cannot check anything. All the other merging steps went fine up to this point. What could be going wrong?

darafferty commented 6 years ago

I wonder if, for some reason, the tmp/.../*dysco_chunk11_127A48492t_0g.merge_amp_parmdbs2 file was not deleted in the previous iteration (it should be deleted at the end of the merge_parmdbs_in_time script -- see line 86 of https://github.com/lofar-astron/factor/blob/master/factor/scripts/merge_parmdbs_in_time.py). Do you know if it already went through one iteration of the solve_ampphase22 step?

If so, and if you have write access to your Factor installation, you could try changing this line to something like: os.system('rm -rf {0}'.format(outparmdb)) and see if it helps.
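A slightly more defensive variant of that idea, as a minimal sketch only: the helper name remove_path is hypothetical and is not part of Factor, and the thread does not establish why the original deletion failed. It assumes the leftover path may be a regular file, a symlink, or a directory, tries the Python calls first, and falls back to rm -rf (the workaround suggested above) only if shutil.rmtree raises.

```python
import os
import shutil
import subprocess


def remove_path(path):
    """Remove a file, symlink, or directory tree at `path`.

    Hypothetical helper, not part of Factor. Returns True if nothing
    is left at `path` afterwards (including when it never existed).
    """
    if os.path.islink(path) or os.path.isfile(path):
        # shutil.rmtree refuses regular files and symlinks, so handle them here
        os.remove(path)
    elif os.path.isdir(path):
        try:
            shutil.rmtree(path)
        except OSError:
            # Fall back to the shell command suggested in the thread
            subprocess.call(['rm', '-rf', path])
    # lexists also reports dangling symlinks, unlike os.path.exists
    return not os.path.lexists(path)
```

Checking the return value before writing the new parmdb would at least turn a silent leftover into an explicit failure at the point where it occurs.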

tikk3r commented 6 years ago

Looking in the logs, it did go through one iteration of solve_ampphase22 already. I have write permissions to my installation, so I will try replacing that line.

darafferty commented 6 years ago

Another option that just occurred to me is that you might comment out dir_local_selfcal in your parset (assuming you specified something here, which I expect you did given the paths above -- if not, something else must be wrong). Specifying dir_local_selfcal is only useful if you have a separate, faster file system (like /dev/shm or, for us in Hamburg, a local SSD on each node) but do the main processing on a slower disk -- not sure what setup you have in Leiden, but it may be that you don't need to set this.

tikk3r commented 6 years ago

It was indeed the shutil.rmtree call that caused the problem for me. I replaced lines 86 and 88 with the rm command and now this step works.