Closed tjgalvin closed 2 weeks ago
Actually happened again in a RACS field. This was using the --rename-ms
option on the CLI. So very likely it is related to that and the problem is somewhere in the rename_ms_and_columns_for_selfcal
function and the steps in preparation for it.
Still unclear what. For this field this was the only error across all beams and rounds of self-calibration. It also happened on the last round of self-cal. Maybe related to the zip archive option? Though this was turned off for the EMU data so probably not.
Measurement set to be self-calibrated: ms=MS(path=PosixPath('/scratch3/gal16b/racs_low3_proc/proc/57475/SB57475.RACS_1336-09.beam22.round3.ms'), column='CORRECTED_DATA', beam=22, spw=None, field=None, model_column='MODEL_DATA')
02:44:43 AM
task_gaincal_applycal_ms-123
flint
Updating gaincal options with: {'solint': '480s', 'calmode': 'ap', 'round': 0, 'minsnr': 0.0, 'uvrange': '>235m', 'selectdata': True, 'gaintype': 'G', 'nspw': 4}
02:44:43 AM
task_gaincal_applycal_ms-123
flint
Detected a previous round of self-calibration.
02:44:43 AM
task_gaincal_applycal_ms-123
flint
Output MS name will be /scratch3/gal16b/racs_low3_proc/proc/57475/SB57475.RACS_1336-09.beam22.round4.ms.
02:44:43 AM
task_gaincal_applycal_ms-123
flint
Copying /scratch3/gal16b/racs_low3_proc/proc/57475/SB57475.RACS_1336-09.beam22.round3.ms to /scratch3/gal16b/racs_low3_proc/proc/57475/SB57475.RACS_1336-09.beam22.round4.ms.
02:44:43 AM
task_gaincal_applycal_ms-123
flint
/scratch3/gal16b/racs_low3_proc/proc/57475/SB57475.RACS_1336-09.beam22.round4.ms already exists. Removing it.
02:44:43 AM
task_gaincal_applycal_ms-123
flint
Removing folder /scratch3/gal16b/racs_low3_proc/proc/57475/SB57475.RACS_1336-09.beam22.round4.ms
02:44:43 AM
task_gaincal_applycal_ms-123
flint
Encountered exception during execution:
Traceback (most recent call last):
File "/scratch3/gal16b/miniforge3/envs/flint_main/lib/python3.8/site-packages/prefect/engine.py", line 2099, in orchestrate_task_run
result = await call.aresult()
File "/scratch3/gal16b/miniforge3/envs/flint_main/lib/python3.8/site-packages/prefect/_internal/concurrency/calls.py", line 327, in aresult
return await asyncio.wrap_future(self.future)
File "/scratch3/gal16b/miniforge3/envs/flint_main/lib/python3.8/site-packages/prefect/_internal/concurrency/calls.py", line 352, in _run_sync
result = self.fn(*self.args, **self.kwargs)
File "/scratch3/gal16b/packages/flint_main/flint/flint/prefect/common/imaging.py", line 248, in task_gaincal_applycal_ms
return gaincal_applycal_ms(
File "/scratch3/gal16b/packages/flint_main/flint/flint/selfcal/casa.py", line 256, in gaincal_applycal_ms
cal_ms = copy_and_clean_ms_casagain(ms=ms, round=round, rename_ms=rename_ms)
File "/scratch3/gal16b/packages/flint_main/flint/flint/selfcal/casa.py", line 88, in copy_and_clean_ms_casagain
ms = rename_ms_and_columns_for_selfcal(
File "/scratch3/gal16b/packages/flint_main/flint/flint/ms.py", line 795, in rename_ms_and_columns_for_selfcal
raise FileNotFoundError(
FileNotFoundError: /scratch3/gal16b/racs_low3_proc/proc/57475/SB57475.RACS_1336-09.beam22.round3.ms does not exists or is not a directory (hence measurement set).
02:44:43 AM
task_gaincal_applycal_ms-123
prefect.task_runs
Finished in state Failed('Task run encountered an exception FileNotFoundError: /scratch3/gal16b/racs_low3_proc/proc/57475/SB57475.RACS_1336-09.beam22.round3.ms does not exists or is not a directory (hence measurement set).')
And again for a separate field, again on the last round of self-calibration
Measurement set to be self-calibrated: ms=MS(path=PosixPath('/scratch3/gal16b/racs_low3_proc/proc/57778/SB57778.RACS_1419-09.beam29.round3.ms'), column='CORRECTED_DATA', beam=29, spw=None, field=None, model_column='MODEL_DATA')
02:45:21 AM
task_gaincal_applycal_ms-130
flint
Updating gaincal options with: {'solint': '480s', 'calmode': 'ap', 'round': 0, 'minsnr': 0.0, 'uvrange': '>235m', 'selectdata': True, 'gaintype': 'G', 'nspw': 4}
02:45:21 AM
task_gaincal_applycal_ms-130
flint
Detected a previous round of self-calibration.
02:45:21 AM
task_gaincal_applycal_ms-130
flint
Output MS name will be /scratch3/gal16b/racs_low3_proc/proc/57778/SB57778.RACS_1419-09.beam29.round4.ms.
02:45:21 AM
task_gaincal_applycal_ms-130
flint
Copying /scratch3/gal16b/racs_low3_proc/proc/57778/SB57778.RACS_1419-09.beam29.round3.ms to /scratch3/gal16b/racs_low3_proc/proc/57778/SB57778.RACS_1419-09.beam29.round4.ms.
02:45:21 AM
task_gaincal_applycal_ms-130
flint
/scratch3/gal16b/racs_low3_proc/proc/57778/SB57778.RACS_1419-09.beam29.round4.ms already exists. Removing it.
02:45:21 AM
task_gaincal_applycal_ms-130
flint
Removing folder /scratch3/gal16b/racs_low3_proc/proc/57778/SB57778.RACS_1419-09.beam29.round4.ms
02:45:21 AM
task_gaincal_applycal_ms-130
flint
Encountered exception during execution:
Traceback (most recent call last):
File "/scratch3/gal16b/miniforge3/envs/flint_main/lib/python3.8/site-packages/prefect/engine.py", line 2099, in orchestrate_task_run
result = await call.aresult()
File "/scratch3/gal16b/miniforge3/envs/flint_main/lib/python3.8/site-packages/prefect/_internal/concurrency/calls.py", line 327, in aresult
return await asyncio.wrap_future(self.future)
File "/scratch3/gal16b/miniforge3/envs/flint_main/lib/python3.8/site-packages/prefect/_internal/concurrency/calls.py", line 352, in _run_sync
result = self.fn(*self.args, **self.kwargs)
File "/scratch3/gal16b/packages/flint_main/flint/flint/prefect/common/imaging.py", line 248, in task_gaincal_applycal_ms
return gaincal_applycal_ms(
File "/scratch3/gal16b/packages/flint_main/flint/flint/selfcal/casa.py", line 256, in gaincal_applycal_ms
cal_ms = copy_and_clean_ms_casagain(ms=ms, round=round, rename_ms=rename_ms)
File "/scratch3/gal16b/packages/flint_main/flint/flint/selfcal/casa.py", line 88, in copy_and_clean_ms_casagain
ms = rename_ms_and_columns_for_selfcal(
File "/scratch3/gal16b/packages/flint_main/flint/flint/ms.py", line 795, in rename_ms_and_columns_for_selfcal
raise FileNotFoundError(
FileNotFoundError: /scratch3/gal16b/racs_low3_proc/proc/57778/SB57778.RACS_1419-09.beam29.round3.ms does not exists or is not a directory (hence measurement set).
Even stranger. It looks like it did go through the self-calibration, and then tried to do the self-calibration again?
Measurement set to be self-calibrated: ms=MS(path=PosixPath('/scratch3/gal16b/racs_low3_proc/proc/57912/SB57912.RACS_1419-04.beam06.round3.ms'), column='CORRECTED_DATA', beam=6, spw=None, field=None, model_column='MODEL_DATA')
06:21:37 AM
task_gaincal_applycal_ms-140
flint
Updating gaincal options with: {'solint': '480s', 'calmode': 'ap', 'round': 0, 'minsnr': 0.0, 'uvrange': '>235m', 'selectdata': True, 'gaintype': 'G', 'nspw': 4}
06:21:37 AM
task_gaincal_applycal_ms-140
flint
Detected a previous round of self-calibration.
06:21:37 AM
task_gaincal_applycal_ms-140
flint
Output MS name will be /scratch3/gal16b/racs_low3_proc/proc/57912/SB57912.RACS_1419-04.beam06.round4.ms.
06:21:37 AM
task_gaincal_applycal_ms-140
flint
Copying /scratch3/gal16b/racs_low3_proc/proc/57912/SB57912.RACS_1419-04.beam06.round3.ms to /scratch3/gal16b/racs_low3_proc/proc/57912/SB57912.RACS_1419-04.beam06.round4.ms.
06:21:37 AM
task_gaincal_applycal_ms-140
flint
Renaming /scratch3/gal16b/racs_low3_proc/proc/57912/SB57912.RACS_1419-04.beam06.round3.ms to target=PosixPath('/scratch3/gal16b/racs_low3_proc/proc/57912/SB57912.RACS_1419-04.beam06.round4.ms')
06:21:37 AM
task_gaincal_applycal_ms-140
flint
Removing DATA and renaming CORRECTED_DATA
06:21:38 AM
task_gaincal_applycal_ms-140
flint
Using nominated DATA column for /scratch3/gal16b/racs_low3_proc/proc/57912/SB57912.RACS_1419-04.beam06.round4.ms
06:21:38 AM
task_gaincal_applycal_ms-140
flint
Flagging NaNs and zeros in DATA.
06:21:38 AM
task_gaincal_applycal_ms-140
flint
Will flag 183652566094 NaNs, zero'd data 167040, zero'd UVW 9507.
06:21:39 AM
task_gaincal_applycal_ms-140
flint
Flagging based on extreme Stokes-V, threshold dxy_thresh=4.0
06:21:39 AM
task_gaincal_applycal_ms-140
flint
Will flag 5389158444 extreme Stokes-V.
06:21:39 AM
task_gaincal_applycal_ms-140
flint
Flags before: 14016388, Flags after: 14022212, Difference 5824
06:21:39 AM
task_gaincal_applycal_ms-140
flint
Adding updated flags column
06:21:39 AM
task_gaincal_applycal_ms-140
flint
Will create calibration table /scratch3/gal16b/racs_low3_proc/proc/57912/SB57912.RACS_1419-04.beam06.round4.caltable.
06:21:40 AM
task_gaincal_applycal_ms-140
flint
Transforming /scratch3/gal16b/racs_low3_proc/proc/57912/SB57912.RACS_1419-04.beam06.round4.ms to have 4 SPWs
06:21:40 AM
task_gaincal_applycal_ms-140
flint
Successfully created the transformed measurement set /scratch3/gal16b/racs_low3_proc/proc/57912/SB57912.RACS_1419-04.beam06.round4.ms_transform with 4 SPWs.
06:22:16 AM
task_gaincal_applycal_ms-140
flint
Removing folder /scratch3/gal16b/racs_low3_proc/proc/57912/SB57912.RACS_1419-04.beam06.round4.ms
06:22:16 AM
task_gaincal_applycal_ms-140
flint
Renaming /scratch3/gal16b/racs_low3_proc/proc/57912/SB57912.RACS_1419-04.beam06.round4.ms_transform to /scratch3/gal16b/racs_low3_proc/proc/57912/SB57912.RACS_1419-04.beam06.round4.ms.
06:22:16 AM
task_gaincal_applycal_ms-140
flint
Solutions have been solved. Applying them.
06:22:43 AM
task_gaincal_applycal_ms-140
flint
Will attempt to merge all spws using cvel.
06:22:57 AM
task_gaincal_applycal_ms-140
flint
Measurement set to be self-calibrated: ms=MS(path=PosixPath('/scratch3/gal16b/racs_low3_proc/proc/57912/SB57912.RACS_1419-04.beam06.round3.ms'), column='CORRECTED_DATA', beam=6, spw=None, field=None, model_column='MODEL_DATA')
06:26:38 AM
task_gaincal_applycal_ms-140
flint
Updating gaincal options with: {'solint': '480s', 'calmode': 'ap', 'round': 0, 'minsnr': 0.0, 'uvrange': '>235m', 'selectdata': True, 'gaintype': 'G', 'nspw': 4}
06:26:38 AM
task_gaincal_applycal_ms-140
flint
Detected a previous round of self-calibration.
06:26:38 AM
task_gaincal_applycal_ms-140
flint
Output MS name will be /scratch3/gal16b/racs_low3_proc/proc/57912/SB57912.RACS_1419-04.beam06.round4.ms.
06:26:38 AM
task_gaincal_applycal_ms-140
flint
Copying /scratch3/gal16b/racs_low3_proc/proc/57912/SB57912.RACS_1419-04.beam06.round3.ms to /scratch3/gal16b/racs_low3_proc/proc/57912/SB57912.RACS_1419-04.beam06.round4.ms.
06:26:38 AM
task_gaincal_applycal_ms-140
flint
/scratch3/gal16b/racs_low3_proc/proc/57912/SB57912.RACS_1419-04.beam06.round4.ms already exists. Removing it.
06:26:38 AM
task_gaincal_applycal_ms-140
flint
Removing folder /scratch3/gal16b/racs_low3_proc/proc/57912/SB57912.RACS_1419-04.beam06.round4.ms
06:26:38 AM
task_gaincal_applycal_ms-140
flint
Encountered exception during execution:
Traceback (most recent call last):
File "/scratch3/gal16b/miniforge3/envs/flint_main/lib/python3.8/site-packages/prefect/engine.py", line 2099, in orchestrate_task_run
result = await call.aresult()
File "/scratch3/gal16b/miniforge3/envs/flint_main/lib/python3.8/site-packages/prefect/_internal/concurrency/calls.py", line 327, in aresult
return await asyncio.wrap_future(self.future)
File "/scratch3/gal16b/miniforge3/envs/flint_main/lib/python3.8/site-packages/prefect/_internal/concurrency/calls.py", line 352, in _run_sync
result = self.fn(*self.args, **self.kwargs)
File "/scratch3/gal16b/packages/flint_main/flint/flint/prefect/common/imaging.py", line 248, in task_gaincal_applycal_ms
return gaincal_applycal_ms(
File "/scratch3/gal16b/packages/flint_main/flint/flint/selfcal/casa.py", line 256, in gaincal_applycal_ms
cal_ms = copy_and_clean_ms_casagain(ms=ms, round=round, rename_ms=rename_ms)
File "/scratch3/gal16b/packages/flint_main/flint/flint/selfcal/casa.py", line 88, in copy_and_clean_ms_casagain
ms = rename_ms_and_columns_for_selfcal(
File "/scratch3/gal16b/packages/flint_main/flint/flint/ms.py", line 795, in rename_ms_and_columns_for_selfcal
raise FileNotFoundError(
FileNotFoundError: /scratch3/gal16b/racs_low3_proc/proc/57912/SB57912.RACS_1419-04.beam06.round3.ms does not exists or is not a directory (hence measurement set).
And again. And unlike others this happened on the third self-cal round.
Measurement set to be self-calibrated: ms=MS(path=PosixPath('/scratch3/gal16b/racs_low3_proc/proc/57175/SB57175.RACS_1315-04.beam01.round1.ms'), column='CORRECTED_DATA', beam=1, spw=None, field=None, model_column='MODEL_DATA')
02:44:53 AM
task_gaincal_applycal_ms-37
flint
Updating gaincal options with: {'solint': '30s', 'calmode': 'p', 'round': 0, 'minsnr': 0.0, 'uvrange': '>235m', 'selectdata': True, 'gaintype': 'G', 'nspw': 1}
02:44:53 AM
task_gaincal_applycal_ms-37
flint
Detected a previous round of self-calibration.
02:44:53 AM
task_gaincal_applycal_ms-37
flint
Output MS name will be /scratch3/gal16b/racs_low3_proc/proc/57175/SB57175.RACS_1315-04.beam01.round2.ms.
02:44:53 AM
task_gaincal_applycal_ms-37
flint
Copying /scratch3/gal16b/racs_low3_proc/proc/57175/SB57175.RACS_1315-04.beam01.round1.ms to /scratch3/gal16b/racs_low3_proc/proc/57175/SB57175.RACS_1315-04.beam01.round2.ms.
02:44:53 AM
task_gaincal_applycal_ms-37
flint
/scratch3/gal16b/racs_low3_proc/proc/57175/SB57175.RACS_1315-04.beam01.round2.ms already exists. Removing it.
02:44:53 AM
task_gaincal_applycal_ms-37
flint
Removing folder /scratch3/gal16b/racs_low3_proc/proc/57175/SB57175.RACS_1315-04.beam01.round2.ms
02:44:53 AM
task_gaincal_applycal_ms-37
flint
Encountered exception during execution:
Traceback (most recent call last):
File "/scratch3/gal16b/miniforge3/envs/flint_main/lib/python3.8/site-packages/prefect/engine.py", line 2099, in orchestrate_task_run
result = await call.aresult()
File "/scratch3/gal16b/miniforge3/envs/flint_main/lib/python3.8/site-packages/prefect/_internal/concurrency/calls.py", line 327, in aresult
return await asyncio.wrap_future(self.future)
File "/scratch3/gal16b/miniforge3/envs/flint_main/lib/python3.8/site-packages/prefect/_internal/concurrency/calls.py", line 352, in _run_sync
result = self.fn(*self.args, **self.kwargs)
File "/scratch3/gal16b/packages/flint_main/flint/flint/prefect/common/imaging.py", line 248, in task_gaincal_applycal_ms
return gaincal_applycal_ms(
File "/scratch3/gal16b/packages/flint_main/flint/flint/selfcal/casa.py", line 256, in gaincal_applycal_ms
cal_ms = copy_and_clean_ms_casagain(ms=ms, round=round, rename_ms=rename_ms)
File "/scratch3/gal16b/packages/flint_main/flint/flint/selfcal/casa.py", line 88, in copy_and_clean_ms_casagain
ms = rename_ms_and_columns_for_selfcal(
File "/scratch3/gal16b/packages/flint_main/flint/flint/ms.py", line 795, in rename_ms_and_columns_for_selfcal
raise FileNotFoundError(
FileNotFoundError: /scratch3/gal16b/racs_low3_proc/proc/57175/SB57175.RACS_1315-04.beam01.round1.ms does not exists or is not a directory (hence measurement set).
This is looking like a file system issue and not a flint issue.
Reasonably certain that this was from lustre issues. Closing and it is not flint in origin.
Noticed that an EMU measurement set for beam14 failed to go through all four rounds of self-calibration. It seems that somehow the checks in
copy_and_clean_ms_casagain
that aim to detect whether a output MS already exists (which then automatically removes the file) picked up the original MS.I can trace the code and do not see how this function in isolation could rename -> delete -> try to rename. I am wondering whether somehow the naming creation tool is incorrect. I did find an unsaved change to the MS name creation function that might be related.
Only other consideration is some weird race / concurrent task issue. At a loss on this. Raising this issue to try to keep my thoughts together and track it.
For some ~5000 flint runs as far as I remember I have never seen this. Confusion.