tjgalvin / flint

BSD 3-Clause "New" or "Revised" License
7 stars 1 forks source link

Strange rename MS behavour #127

Closed tjgalvin closed 2 weeks ago

tjgalvin commented 2 weeks ago

Noticed that an EMU measurement set for beam14 failed to go through all four rounds of self-calibration. It seems that somehow the checks in copy_and_clean_ms_casagain that aim to detect whether a output MS already exists (which then automatically removes the file) picked up the original MS.

I can trace the code and do not see how this function in isolation could rename -> delete -> try to rename. I am wondering whether somehow the naming creation tool is incorrect. I did find an unsaved change to the MS name creation function that might be related.

Only other consideration is some weird race / concurrent task issue. At a loss on this. Raising this issue to try to keep my thoughts together and track it.

For some ~5000 flint runs as far as I remember I have never seen this. Confusion.

Measurement set to be self-calibrated: ms=MS(path=PosixPath('/scratch3/gal16b/emu_data/51818/proc_2/51818/SB51818.EMU_2006-55.beam14.round3.ms'), column='CORRECTED_DATA', beam=None, spw=None, field=None, model_column='MODEL_DATA')
02:49:32 AM
task_gaincal_applycal_ms-122
flint

Updating gaincal options with: {'solint': '600s', 'calmode': 'ap', 'round': 0, 'minsnr': 0.0, 'uvrange': '>235m', 'selectdata': True, 'gaintype': 'G', 'nspw': 4}
02:49:32 AM
task_gaincal_applycal_ms-122
flint

Detected a previous round of self-calibration. 
02:49:32 AM
task_gaincal_applycal_ms-122
flint

Output MS name will be /scratch3/gal16b/emu_data/51818/proc_2/51818/SB51818.EMU_2006-55.beam14.round4.ms.
02:49:32 AM
task_gaincal_applycal_ms-122
flint

Copying /scratch3/gal16b/emu_data/51818/proc_2/51818/SB51818.EMU_2006-55.beam14.round3.ms to /scratch3/gal16b/emu_data/51818/proc_2/51818/SB51818.EMU_2006-55.beam14.round4.ms.
02:49:32 AM
task_gaincal_applycal_ms-122
flint

/scratch3/gal16b/emu_data/51818/proc_2/51818/SB51818.EMU_2006-55.beam14.round4.ms already exists. Removing it. 
02:49:32 AM
task_gaincal_applycal_ms-122
flint

Removing folder /scratch3/gal16b/emu_data/51818/proc_2/51818/SB51818.EMU_2006-55.beam14.round4.ms
02:49:32 AM
task_gaincal_applycal_ms-122
flint

Encountered exception during execution:
Traceback (most recent call last):
  File "/scratch3/gal16b/miniforge3/envs/flint_main/lib/python3.8/site-packages/prefect/engine.py", line 2099, in orchestrate_task_run
    result = await call.aresult()
  File "/scratch3/gal16b/miniforge3/envs/flint_main/lib/python3.8/site-packages/prefect/_internal/concurrency/calls.py", line 327, in aresult
    return await asyncio.wrap_future(self.future)
  File "/scratch3/gal16b/miniforge3/envs/flint_main/lib/python3.8/site-packages/prefect/_internal/concurrency/calls.py", line 352, in _run_sync
    result = self.fn(*self.args, **self.kwargs)
  File "/scratch3/gal16b/packages/flint_main/flint/flint/prefect/common/imaging.py", line 248, in task_gaincal_applycal_ms
    return gaincal_applycal_ms(
  File "/scratch3/gal16b/packages/flint_main/flint/flint/selfcal/casa.py", line 256, in gaincal_applycal_ms
    cal_ms = copy_and_clean_ms_casagain(ms=ms, round=round, rename_ms=rename_ms)
  File "/scratch3/gal16b/packages/flint_main/flint/flint/selfcal/casa.py", line 88, in copy_and_clean_ms_casagain
    ms = rename_ms_and_columns_for_selfcal(
  File "/scratch3/gal16b/packages/flint_main/flint/flint/ms.py", line 795, in rename_ms_and_columns_for_selfcal
    raise FileNotFoundError(
FileNotFoundError: /scratch3/gal16b/emu_data/51818/proc_2/51818/SB51818.EMU_2006-55.beam14.round3.ms does not exists or is not a directory (hence measurement set).
tjgalvin commented 2 weeks ago

Actually happened again in a RACS field. This was using the --rename-ms option on the CLI. So very likely it is related to that and the problem is somewhere in the rename_ms_and_columns_for_selfcal function and the steps in preparation for it.

Still unclear what. For this field this was the only error across all beams and rounds of self-calibration. It also happened on the last round of self-cal. Maybe related to the zip archive option? Though this was turned off for the EMU data so probably not.

Measurement set to be self-calibrated: ms=MS(path=PosixPath('/scratch3/gal16b/racs_low3_proc/proc/57475/SB57475.RACS_1336-09.beam22.round3.ms'), column='CORRECTED_DATA', beam=22, spw=None, field=None, model_column='MODEL_DATA')
02:44:43 AM
task_gaincal_applycal_ms-123
flint

Updating gaincal options with: {'solint': '480s', 'calmode': 'ap', 'round': 0, 'minsnr': 0.0, 'uvrange': '>235m', 'selectdata': True, 'gaintype': 'G', 'nspw': 4}
02:44:43 AM
task_gaincal_applycal_ms-123
flint

Detected a previous round of self-calibration. 
02:44:43 AM
task_gaincal_applycal_ms-123
flint

Output MS name will be /scratch3/gal16b/racs_low3_proc/proc/57475/SB57475.RACS_1336-09.beam22.round4.ms.
02:44:43 AM
task_gaincal_applycal_ms-123
flint

Copying /scratch3/gal16b/racs_low3_proc/proc/57475/SB57475.RACS_1336-09.beam22.round3.ms to /scratch3/gal16b/racs_low3_proc/proc/57475/SB57475.RACS_1336-09.beam22.round4.ms.
02:44:43 AM
task_gaincal_applycal_ms-123
flint

/scratch3/gal16b/racs_low3_proc/proc/57475/SB57475.RACS_1336-09.beam22.round4.ms already exists. Removing it. 
02:44:43 AM
task_gaincal_applycal_ms-123
flint

Removing folder /scratch3/gal16b/racs_low3_proc/proc/57475/SB57475.RACS_1336-09.beam22.round4.ms
02:44:43 AM
task_gaincal_applycal_ms-123
flint

Encountered exception during execution:
Traceback (most recent call last):
  File "/scratch3/gal16b/miniforge3/envs/flint_main/lib/python3.8/site-packages/prefect/engine.py", line 2099, in orchestrate_task_run
    result = await call.aresult()
  File "/scratch3/gal16b/miniforge3/envs/flint_main/lib/python3.8/site-packages/prefect/_internal/concurrency/calls.py", line 327, in aresult
    return await asyncio.wrap_future(self.future)
  File "/scratch3/gal16b/miniforge3/envs/flint_main/lib/python3.8/site-packages/prefect/_internal/concurrency/calls.py", line 352, in _run_sync
    result = self.fn(*self.args, **self.kwargs)
  File "/scratch3/gal16b/packages/flint_main/flint/flint/prefect/common/imaging.py", line 248, in task_gaincal_applycal_ms
    return gaincal_applycal_ms(
  File "/scratch3/gal16b/packages/flint_main/flint/flint/selfcal/casa.py", line 256, in gaincal_applycal_ms
    cal_ms = copy_and_clean_ms_casagain(ms=ms, round=round, rename_ms=rename_ms)
  File "/scratch3/gal16b/packages/flint_main/flint/flint/selfcal/casa.py", line 88, in copy_and_clean_ms_casagain
    ms = rename_ms_and_columns_for_selfcal(
  File "/scratch3/gal16b/packages/flint_main/flint/flint/ms.py", line 795, in rename_ms_and_columns_for_selfcal
    raise FileNotFoundError(
FileNotFoundError: /scratch3/gal16b/racs_low3_proc/proc/57475/SB57475.RACS_1336-09.beam22.round3.ms does not exists or is not a directory (hence measurement set).
02:44:43 AM
task_gaincal_applycal_ms-123
prefect.task_runs
Finished in state Failed('Task run encountered an exception FileNotFoundError: /scratch3/gal16b/racs_low3_proc/proc/57475/SB57475.RACS_1336-09.beam22.round3.ms does not exists or is not a directory (hence measurement set).')
tjgalvin commented 2 weeks ago

And again for a separate field, again on the last round of self-calibration

Measurement set to be self-calibrated: ms=MS(path=PosixPath('/scratch3/gal16b/racs_low3_proc/proc/57778/SB57778.RACS_1419-09.beam29.round3.ms'), column='CORRECTED_DATA', beam=29, spw=None, field=None, model_column='MODEL_DATA')
02:45:21 AM
task_gaincal_applycal_ms-130
flint

Updating gaincal options with: {'solint': '480s', 'calmode': 'ap', 'round': 0, 'minsnr': 0.0, 'uvrange': '>235m', 'selectdata': True, 'gaintype': 'G', 'nspw': 4}
02:45:21 AM
task_gaincal_applycal_ms-130
flint

Detected a previous round of self-calibration. 
02:45:21 AM
task_gaincal_applycal_ms-130
flint

Output MS name will be /scratch3/gal16b/racs_low3_proc/proc/57778/SB57778.RACS_1419-09.beam29.round4.ms.
02:45:21 AM
task_gaincal_applycal_ms-130
flint

Copying /scratch3/gal16b/racs_low3_proc/proc/57778/SB57778.RACS_1419-09.beam29.round3.ms to /scratch3/gal16b/racs_low3_proc/proc/57778/SB57778.RACS_1419-09.beam29.round4.ms.
02:45:21 AM
task_gaincal_applycal_ms-130
flint

/scratch3/gal16b/racs_low3_proc/proc/57778/SB57778.RACS_1419-09.beam29.round4.ms already exists. Removing it. 
02:45:21 AM
task_gaincal_applycal_ms-130
flint

Removing folder /scratch3/gal16b/racs_low3_proc/proc/57778/SB57778.RACS_1419-09.beam29.round4.ms
02:45:21 AM
task_gaincal_applycal_ms-130
flint

Encountered exception during execution:
Traceback (most recent call last):
  File "/scratch3/gal16b/miniforge3/envs/flint_main/lib/python3.8/site-packages/prefect/engine.py", line 2099, in orchestrate_task_run
    result = await call.aresult()
  File "/scratch3/gal16b/miniforge3/envs/flint_main/lib/python3.8/site-packages/prefect/_internal/concurrency/calls.py", line 327, in aresult
    return await asyncio.wrap_future(self.future)
  File "/scratch3/gal16b/miniforge3/envs/flint_main/lib/python3.8/site-packages/prefect/_internal/concurrency/calls.py", line 352, in _run_sync
    result = self.fn(*self.args, **self.kwargs)
  File "/scratch3/gal16b/packages/flint_main/flint/flint/prefect/common/imaging.py", line 248, in task_gaincal_applycal_ms
    return gaincal_applycal_ms(
  File "/scratch3/gal16b/packages/flint_main/flint/flint/selfcal/casa.py", line 256, in gaincal_applycal_ms
    cal_ms = copy_and_clean_ms_casagain(ms=ms, round=round, rename_ms=rename_ms)
  File "/scratch3/gal16b/packages/flint_main/flint/flint/selfcal/casa.py", line 88, in copy_and_clean_ms_casagain
    ms = rename_ms_and_columns_for_selfcal(
  File "/scratch3/gal16b/packages/flint_main/flint/flint/ms.py", line 795, in rename_ms_and_columns_for_selfcal
    raise FileNotFoundError(
FileNotFoundError: /scratch3/gal16b/racs_low3_proc/proc/57778/SB57778.RACS_1419-09.beam29.round3.ms does not exists or is not a directory (hence measurement set).
tjgalvin commented 2 weeks ago

Even stranger. It looks like it did go through the self-calibration, and then tried to do the self-calibration again?

Measurement set to be self-calibrated: ms=MS(path=PosixPath('/scratch3/gal16b/racs_low3_proc/proc/57912/SB57912.RACS_1419-04.beam06.round3.ms'), column='CORRECTED_DATA', beam=6, spw=None, field=None, model_column='MODEL_DATA')
06:21:37 AM
task_gaincal_applycal_ms-140
flint

Updating gaincal options with: {'solint': '480s', 'calmode': 'ap', 'round': 0, 'minsnr': 0.0, 'uvrange': '>235m', 'selectdata': True, 'gaintype': 'G', 'nspw': 4}
06:21:37 AM
task_gaincal_applycal_ms-140
flint

Detected a previous round of self-calibration. 
06:21:37 AM
task_gaincal_applycal_ms-140
flint

Output MS name will be /scratch3/gal16b/racs_low3_proc/proc/57912/SB57912.RACS_1419-04.beam06.round4.ms.
06:21:37 AM
task_gaincal_applycal_ms-140
flint

Copying /scratch3/gal16b/racs_low3_proc/proc/57912/SB57912.RACS_1419-04.beam06.round3.ms to /scratch3/gal16b/racs_low3_proc/proc/57912/SB57912.RACS_1419-04.beam06.round4.ms.
06:21:37 AM
task_gaincal_applycal_ms-140
flint

Renaming /scratch3/gal16b/racs_low3_proc/proc/57912/SB57912.RACS_1419-04.beam06.round3.ms to target=PosixPath('/scratch3/gal16b/racs_low3_proc/proc/57912/SB57912.RACS_1419-04.beam06.round4.ms')
06:21:37 AM
task_gaincal_applycal_ms-140
flint

Removing DATA and renaming CORRECTED_DATA
06:21:38 AM
task_gaincal_applycal_ms-140
flint

Using nominated DATA column for /scratch3/gal16b/racs_low3_proc/proc/57912/SB57912.RACS_1419-04.beam06.round4.ms
06:21:38 AM
task_gaincal_applycal_ms-140
flint

Flagging NaNs and zeros in DATA.
06:21:38 AM
task_gaincal_applycal_ms-140
flint

Will flag 183652566094 NaNs, zero'd data 167040, zero'd UVW 9507. 
06:21:39 AM
task_gaincal_applycal_ms-140
flint

Flagging based on extreme Stokes-V, threshold dxy_thresh=4.0
06:21:39 AM
task_gaincal_applycal_ms-140
flint

Will flag 5389158444 extreme Stokes-V.
06:21:39 AM
task_gaincal_applycal_ms-140
flint

Flags before: 14016388, Flags after: 14022212, Difference 5824
06:21:39 AM
task_gaincal_applycal_ms-140
flint

Adding updated flags column
06:21:39 AM
task_gaincal_applycal_ms-140
flint

Will create calibration table /scratch3/gal16b/racs_low3_proc/proc/57912/SB57912.RACS_1419-04.beam06.round4.caltable.
06:21:40 AM
task_gaincal_applycal_ms-140
flint

Transforming /scratch3/gal16b/racs_low3_proc/proc/57912/SB57912.RACS_1419-04.beam06.round4.ms to have 4 SPWs
06:21:40 AM
task_gaincal_applycal_ms-140
flint

Successfully created the transformed measurement set /scratch3/gal16b/racs_low3_proc/proc/57912/SB57912.RACS_1419-04.beam06.round4.ms_transform with 4 SPWs.
06:22:16 AM
task_gaincal_applycal_ms-140
flint

Removing folder /scratch3/gal16b/racs_low3_proc/proc/57912/SB57912.RACS_1419-04.beam06.round4.ms
06:22:16 AM
task_gaincal_applycal_ms-140
flint

Renaming /scratch3/gal16b/racs_low3_proc/proc/57912/SB57912.RACS_1419-04.beam06.round4.ms_transform to /scratch3/gal16b/racs_low3_proc/proc/57912/SB57912.RACS_1419-04.beam06.round4.ms.
06:22:16 AM
task_gaincal_applycal_ms-140
flint

Solutions have been solved. Applying them. 
06:22:43 AM
task_gaincal_applycal_ms-140
flint

Will attempt to merge all spws using cvel.
06:22:57 AM
task_gaincal_applycal_ms-140
flint

Measurement set to be self-calibrated: ms=MS(path=PosixPath('/scratch3/gal16b/racs_low3_proc/proc/57912/SB57912.RACS_1419-04.beam06.round3.ms'), column='CORRECTED_DATA', beam=6, spw=None, field=None, model_column='MODEL_DATA')
06:26:38 AM
task_gaincal_applycal_ms-140
flint

Updating gaincal options with: {'solint': '480s', 'calmode': 'ap', 'round': 0, 'minsnr': 0.0, 'uvrange': '>235m', 'selectdata': True, 'gaintype': 'G', 'nspw': 4}
06:26:38 AM
task_gaincal_applycal_ms-140
flint

Detected a previous round of self-calibration. 
06:26:38 AM
task_gaincal_applycal_ms-140
flint

Output MS name will be /scratch3/gal16b/racs_low3_proc/proc/57912/SB57912.RACS_1419-04.beam06.round4.ms.
06:26:38 AM
task_gaincal_applycal_ms-140
flint

Copying /scratch3/gal16b/racs_low3_proc/proc/57912/SB57912.RACS_1419-04.beam06.round3.ms to /scratch3/gal16b/racs_low3_proc/proc/57912/SB57912.RACS_1419-04.beam06.round4.ms.
06:26:38 AM
task_gaincal_applycal_ms-140
flint

/scratch3/gal16b/racs_low3_proc/proc/57912/SB57912.RACS_1419-04.beam06.round4.ms already exists. Removing it. 
06:26:38 AM
task_gaincal_applycal_ms-140
flint

Removing folder /scratch3/gal16b/racs_low3_proc/proc/57912/SB57912.RACS_1419-04.beam06.round4.ms
06:26:38 AM
task_gaincal_applycal_ms-140
flint

Encountered exception during execution:
Traceback (most recent call last):
  File "/scratch3/gal16b/miniforge3/envs/flint_main/lib/python3.8/site-packages/prefect/engine.py", line 2099, in orchestrate_task_run
    result = await call.aresult()
  File "/scratch3/gal16b/miniforge3/envs/flint_main/lib/python3.8/site-packages/prefect/_internal/concurrency/calls.py", line 327, in aresult
    return await asyncio.wrap_future(self.future)
  File "/scratch3/gal16b/miniforge3/envs/flint_main/lib/python3.8/site-packages/prefect/_internal/concurrency/calls.py", line 352, in _run_sync
    result = self.fn(*self.args, **self.kwargs)
  File "/scratch3/gal16b/packages/flint_main/flint/flint/prefect/common/imaging.py", line 248, in task_gaincal_applycal_ms
    return gaincal_applycal_ms(
  File "/scratch3/gal16b/packages/flint_main/flint/flint/selfcal/casa.py", line 256, in gaincal_applycal_ms
    cal_ms = copy_and_clean_ms_casagain(ms=ms, round=round, rename_ms=rename_ms)
  File "/scratch3/gal16b/packages/flint_main/flint/flint/selfcal/casa.py", line 88, in copy_and_clean_ms_casagain
    ms = rename_ms_and_columns_for_selfcal(
  File "/scratch3/gal16b/packages/flint_main/flint/flint/ms.py", line 795, in rename_ms_and_columns_for_selfcal
    raise FileNotFoundError(
FileNotFoundError: /scratch3/gal16b/racs_low3_proc/proc/57912/SB57912.RACS_1419-04.beam06.round3.ms does not exists or is not a directory (hence measurement set).
tjgalvin commented 2 weeks ago

And again. And unlike others this happened on the third self-cal round.

Measurement set to be self-calibrated: ms=MS(path=PosixPath('/scratch3/gal16b/racs_low3_proc/proc/57175/SB57175.RACS_1315-04.beam01.round1.ms'), column='CORRECTED_DATA', beam=1, spw=None, field=None, model_column='MODEL_DATA')
02:44:53 AM
task_gaincal_applycal_ms-37
flint

Updating gaincal options with: {'solint': '30s', 'calmode': 'p', 'round': 0, 'minsnr': 0.0, 'uvrange': '>235m', 'selectdata': True, 'gaintype': 'G', 'nspw': 1}
02:44:53 AM
task_gaincal_applycal_ms-37
flint

Detected a previous round of self-calibration. 
02:44:53 AM
task_gaincal_applycal_ms-37
flint

Output MS name will be /scratch3/gal16b/racs_low3_proc/proc/57175/SB57175.RACS_1315-04.beam01.round2.ms.
02:44:53 AM
task_gaincal_applycal_ms-37
flint

Copying /scratch3/gal16b/racs_low3_proc/proc/57175/SB57175.RACS_1315-04.beam01.round1.ms to /scratch3/gal16b/racs_low3_proc/proc/57175/SB57175.RACS_1315-04.beam01.round2.ms.
02:44:53 AM
task_gaincal_applycal_ms-37
flint

/scratch3/gal16b/racs_low3_proc/proc/57175/SB57175.RACS_1315-04.beam01.round2.ms already exists. Removing it. 
02:44:53 AM
task_gaincal_applycal_ms-37
flint

Removing folder /scratch3/gal16b/racs_low3_proc/proc/57175/SB57175.RACS_1315-04.beam01.round2.ms
02:44:53 AM
task_gaincal_applycal_ms-37
flint

Encountered exception during execution:
Traceback (most recent call last):
  File "/scratch3/gal16b/miniforge3/envs/flint_main/lib/python3.8/site-packages/prefect/engine.py", line 2099, in orchestrate_task_run
    result = await call.aresult()
  File "/scratch3/gal16b/miniforge3/envs/flint_main/lib/python3.8/site-packages/prefect/_internal/concurrency/calls.py", line 327, in aresult
    return await asyncio.wrap_future(self.future)
  File "/scratch3/gal16b/miniforge3/envs/flint_main/lib/python3.8/site-packages/prefect/_internal/concurrency/calls.py", line 352, in _run_sync
    result = self.fn(*self.args, **self.kwargs)
  File "/scratch3/gal16b/packages/flint_main/flint/flint/prefect/common/imaging.py", line 248, in task_gaincal_applycal_ms
    return gaincal_applycal_ms(
  File "/scratch3/gal16b/packages/flint_main/flint/flint/selfcal/casa.py", line 256, in gaincal_applycal_ms
    cal_ms = copy_and_clean_ms_casagain(ms=ms, round=round, rename_ms=rename_ms)
  File "/scratch3/gal16b/packages/flint_main/flint/flint/selfcal/casa.py", line 88, in copy_and_clean_ms_casagain
    ms = rename_ms_and_columns_for_selfcal(
  File "/scratch3/gal16b/packages/flint_main/flint/flint/ms.py", line 795, in rename_ms_and_columns_for_selfcal
    raise FileNotFoundError(
FileNotFoundError: /scratch3/gal16b/racs_low3_proc/proc/57175/SB57175.RACS_1315-04.beam01.round1.ms does not exists or is not a directory (hence measurement set).
tjgalvin commented 2 weeks ago

This is looking like a file system issue and not a flint issue.

tjgalvin commented 2 weeks ago

Reasonably certain that this was from lustre issues. Closing and it is not flint in origin.