Describe the bug
When cryoswath runs under a job scheduler (here Slurm), the termination signal sent when the job reaches its time limit goes unhandled. This may be a broader issue: I am not sure which signals are received. A keyboard interrupt is handled correctly.
To Reproduce
Difficult to reproduce; it may depend on the OS and the scheduler. Run l3.build_dataset() for a region whose l1b data still needs to be processed (this caused the latest uncaught termination) on a scheduling system that imposes a time limit too short to finish the l1b-to-l2 processing (for example, let it run for only 10 min).
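A rough way to approximate the scheduler's behaviour without a cluster is sketched below. It assumes the scheduler first sends SIGTERM and, after a grace period, SIGKILL. The helper name `run_with_deadline` and the stand-in workload are hypothetical; replace the command with a script that calls l3.build_dataset(). It only signals the single child process, so it may not match how a scheduler signals all tasks of a job step.

```python
import signal
import subprocess
import sys


def run_with_deadline(cmd, deadline_s, grace_s):
    """Run cmd; send SIGTERM after deadline_s, then SIGKILL after grace_s more.

    Returns the child's return code (negative signal number if it was
    terminated by a signal on POSIX systems).
    """
    proc = subprocess.Popen(cmd)
    try:
        # Normal case: the job finishes before the deadline.
        return proc.wait(timeout=deadline_s)
    except subprocess.TimeoutExpired:
        # Deadline reached: ask the job to terminate.
        proc.send_signal(signal.SIGTERM)
    try:
        # Give the job a grace period to clean up.
        return proc.wait(timeout=grace_s)
    except subprocess.TimeoutExpired:
        # Grace period over: force termination.
        proc.kill()
        return proc.wait()


if __name__ == "__main__":
    # Hypothetical stand-in workload; swap in the real processing script.
    cmd = [sys.executable, "-c", "import time; time.sleep(60)"]
    print("return code:", run_with_deadline(cmd, deadline_s=1, grace_s=5))
```

If the job handled SIGTERM and cleaned up, no partially written files should remain after the helper returns.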
Expected behavior
Any function should exit gracefully if it has the chance: stop all writing processes, remove partially processed files, etc.
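One possible direction, as a minimal sketch and not how cryoswath currently works: since a keyboard interrupt is already handled, SIGTERM could be converted into a Python exception so that the same try/finally cleanup runs. This assumes the scheduler sends SIGTERM before the final kill. The names `TerminationRequested`, `install_sigterm_handler` and `process_with_cleanup`, and the `.part` suffix, are hypothetical. With multiple worker processes, each worker would probably need the handler as well.

```python
import os
import signal


class TerminationRequested(Exception):
    """Raised in the main thread when SIGTERM arrives (hypothetical name)."""


def _raise_on_sigterm(signum, frame):
    # Turn the signal into an exception so try/finally blocks get to run.
    raise TerminationRequested(f"received signal {signum}")


def install_sigterm_handler():
    """Register the handler; must be called from the main thread."""
    signal.signal(signal.SIGTERM, _raise_on_sigterm)


def process_with_cleanup(out_path, work):
    """Write to a temporary file and only rename it when work() completes.

    If work() is interrupted, the partially written file is removed and
    out_path is never created.
    """
    tmp_path = out_path + ".part"  # hypothetical naming convention
    try:
        with open(tmp_path, "w") as f:
            work(f)
        # Rename only after a complete write.
        os.replace(tmp_path, out_path)
    finally:
        # After a successful rename the temporary file no longer exists.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

Writing to a temporary name and renaming at the end means an interrupted run leaves no file that looks complete, even if the process is killed before the cleanup finishes.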
Traceback
getting 2022-08-15 09:44:06
/scratch/jmhaacker/2023__cryoswath/scripts/../cryoswath/misc.py:602: UserWarning: Dropping 584 glaciers < 1 km² from RGI o1 region.
warnings.warn(f"Dropping {sum(small_glacier_mask)} glaciers < 1 km² from RGI o1 region.")
getting 2022-08-16 20:22:18
/scratch/jmhaacker/2023__cryoswath/scripts/../cryoswath/misc.py:602: UserWarning: Dropping 584 glaciers < 1 km² from RGI o1 region.
warnings.warn(f"Dropping {sum(small_glacier_mask)} glaciers < 1 km² from RGI o1 region.")
/home/jmhaacker/miniconda3/envs/cryoswath/lib/python3.12/site-packages/xarray/core/computation.py:822: RuntimeWarning: divide by zero encountered in log10
result_data = func(*input_data)
/scratch/jmhaacker/2023__cryoswath/scripts/../cryoswath/misc.py:602: UserWarning: Dropping 584 glaciers < 1 km² from RGI o1 region.
warnings.warn(f"Dropping {sum(small_glacier_mask)} glaciers < 1 km² from RGI o1 region.")
/home/jmhaacker/miniconda3/envs/cryoswath/lib/python3.12/site-packages/xarray/core/computation.py:822: RuntimeWarning: divide by zero encountered in log10
result_data = func(*input_data)
srun: Force Terminated job 3805231
srun: Job step aborted: Waiting up to 122 seconds for job step to finish.
slurmstepd: error: *** STEP 3805231.0 ON cmp034 CANCELLED AT 2024-06-05T18:03:46 DUE TO TIME LIMIT ***
saving 2022-08-08 10:40:33
getting 2022-08-17 09:42:13
/home/jmhaacker/miniconda3/envs/cryoswath/lib/python3.12/site-packages/xarray/core/computation.py:822: RuntimeWarning: divide by zero encountered in log10
result_data = func(*input_data)
/scratch/jmhaacker/2023__cryoswath/scripts/../cryoswath/misc.py:602: UserWarning: Dropping 584 glaciers < 1 km² from RGI o1 region.
warnings.warn(f"Dropping {sum(small_glacier_mask)} glaciers < 1 km² from RGI o1 region.")
/home/jmhaacker/miniconda3/envs/cryoswath/lib/python3.12/site-packages/xarray/core/computation.py:822: RuntimeWarning: divide by zero encountered in log10
result_data = func(*input_data)
/home/jmhaacker/miniconda3/envs/cryoswath/lib/python3.12/site-packages/xarray/core/computation.py:822: RuntimeWarning: divide by zero encountered in log10
result_data = func(*input_data)
saving 2022-08-09 09:50:36
[excluding some 50 irrelevant lines of similar info and warnings as above and below]
getting 2022-08-25 19:24:35
/home/jmhaacker/miniconda3/envs/cryoswath/lib/python3.12/site-packages/xarray/core/computation.py:822: RuntimeWarning: divide by zero encountered in log10
result_data = func(*input_data)
/scratch/jmhaacker/2023__cryoswath/scripts/../cryoswath/misc.py:602: UserWarning: Dropping 584 glaciers < 1 km² from RGI o1 region.
warnings.warn(f"Dropping {sum(small_glacier_mask)} glaciers < 1 km² from RGI o1 region.")
saving 2022-08-12 20:26:00
getting 2022-08-26 08:44:09
/scratch/jmhaacker/2023__cryoswath/scripts/../cryoswath/misc.py:602: UserWarning: Dropping 584 glaciers < 1 km² from RGI o1 region.
warnings.warn(f"Dropping {sum(small_glacier_mask)} glaciers < 1 km² from RGI o1 region.")
saving 2022-08-16 20:22:18
getting 2022-08-26 08:44:48
/home/jmhaacker/miniconda3/envs/cryoswath/lib/python3.12/site-packages/xarray/core/computation.py:822: RuntimeWarning: divide by zero encountered in log10
result_data = func(*input_data)
/scratch/jmhaacker/2023__cryoswath/scripts/../cryoswath/misc.py:602: UserWarning: Dropping 584 glaciers < 1 km² from RGI o1 region.
warnings.warn(f"Dropping {sum(small_glacier_mask)} glaciers < 1 km² from RGI o1 region.")
saving 2022-08-18 20:20:19
getting 2022-08-26 20:12:43
saving 2022-08-22 20:16:56
getting 2022-08-27 09:32:53
/scratch/jmhaacker/2023__cryoswath/scripts/../cryoswath/misc.py:602: UserWarning: Dropping 584 glaciers < 1 km² from RGI o1 region.
warnings.warn(f"Dropping {sum(small_glacier_mask)} glaciers < 1 km² from RGI o1 region.")
/home/jmhaacker/miniconda3/envs/cryoswath/lib/python3.12/site-packages/xarray/core/computation.py:822: RuntimeWarning: divide by zero encountered in log10
result_data = func(*input_data)
srun: error: cmp034: task 0: Killed
srun: launch/slurm: _step_signal: Terminating StepId=3805231.0
Environment
cryoswath at 4a7a70c
16 CPUs/processes
region 07-01 Svalbard