j-haacker / cryoswath

Swath process CryoSat-2 to glacier elevation change maps and geodetic mass loss time series for mountain glaciers and ice caps.
MIT License

signals from OS go unnoticed #23

Open j-haacker opened 5 months ago

j-haacker commented 5 months ago

Describe the bug When cryoswath runs under a job scheduler, the signals sent to terminate the job at the time limit go unnoticed. This may be a broader issue. I am not sure which signals are received (a keyboard interrupt works).
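A possible explanation for why the keyboard interrupt works while other signals do not: a fresh CPython process installs a handler for SIGINT that raises KeyboardInterrupt, but leaves SIGTERM at the OS default, which ends the process without running any Python cleanup code. The small diagnostic below is a sketch for checking what is actually installed in a given process; `describe_handler` is a hypothetical helper, not part of cryoswath.

```python
import signal


def describe_handler(signum):
    """Return a short description of the handler installed for signum."""
    handler = signal.getsignal(signum)
    if handler is None:
        # The handler was installed by non-Python code.
        return "installed outside Python"
    if handler == signal.SIG_DFL:
        return "default OS action"
    if handler == signal.SIG_IGN:
        return "ignored"
    # Anything else is a Python callable that runs when the signal arrives.
    return f"Python handler: {getattr(handler, '__name__', repr(handler))}"


if __name__ == "__main__":
    for sig in (signal.SIGINT, signal.SIGTERM):
        print(sig.name, "->", describe_handler(sig))
```

Running this inside the job, and inside a worker process, would show whether SIGTERM is handled, ignored, or left at the default there.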

To Reproduce Difficult to reproduce; it may depend on the OS and the scheduler. Run l3.build_dataset() for a region whose l1b data still needs to be processed (this caused the latest uncaught termination) on a scheduling system with a time limit too short to finish the l1b to l2 processing (maybe just let it run for 10 min).

Expected behavior Any function should exit gracefully if it has the chance: stop all writing processes, remove partially processed files, etc.
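One way the expected behavior could be approached is sketched below. It assumes the termination request arrives as SIGTERM in the main process. The names `TerminationRequested`, `install_termination_handlers`, and `atomic_output` are hypothetical and do not exist in cryoswath. The handler turns the signal into an exception so that `finally`/`except` blocks run, and the context manager writes to a temporary `.part` file that is only renamed on success.

```python
import os
import signal
import tempfile
import time
from contextlib import contextmanager


class TerminationRequested(Exception):
    """Raised in the main thread when a termination signal arrives."""


def _raise_termination(signum, frame):
    raise TerminationRequested(f"received {signal.Signals(signum).name}")


def install_termination_handlers(signums=(signal.SIGTERM,)):
    """Convert the given signals into an exception (main thread only)."""
    for signum in signums:
        signal.signal(signum, _raise_termination)


@contextmanager
def atomic_output(final_path):
    """Yield a temporary path; rename it to final_path only on success."""
    tmp_path = final_path + ".part"
    try:
        yield tmp_path
        os.replace(tmp_path, final_path)
    except BaseException:
        # Covers TerminationRequested and KeyboardInterrupt alike.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


if __name__ == "__main__":
    install_termination_handlers()
    target = os.path.join(tempfile.mkdtemp(), "result.txt")
    try:
        with atomic_output(target) as tmp:
            with open(tmp, "w") as f:
                f.write("partially processed")
            signal.raise_signal(signal.SIGTERM)  # simulate the scheduler
            time.sleep(0.5)  # fallback: let the handler run if not yet called
    except TerminationRequested as err:
        print("stopped cleanly:", err)
    print("partial file left behind:", os.path.exists(target + ".part"))
```

This only covers the process that installs the handler. With 16 worker processes, each worker would likely need its own handling, or the parent would have to shut the workers down. Cleanup also has to finish before the scheduler escalates to SIGKILL. Slurm's `--signal` option can request an earlier warning signal, if the cluster permits it.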

Traceback

getting 2022-08-15 09:44:06                                                                                                                                                                                                                                                               
/scratch/jmhaacker/2023__cryoswath/scripts/../cryoswath/misc.py:602: UserWarning: Dropping 584 glaciers < 1 km² from RGI o1 region.                                                                                                                                                       
  warnings.warn(f"Dropping {sum(small_glacier_mask)} glaciers < 1 km² from RGI o1 region.")                                                                                                                                                                                               
getting 2022-08-16 20:22:18                                                                                                                                                                                                                                                               
/scratch/jmhaacker/2023__cryoswath/scripts/../cryoswath/misc.py:602: UserWarning: Dropping 584 glaciers < 1 km² from RGI o1 region.                                                                                                                                                       
  warnings.warn(f"Dropping {sum(small_glacier_mask)} glaciers < 1 km² from RGI o1 region.")                                                                                                                                                                                               
/home/jmhaacker/miniconda3/envs/cryoswath/lib/python3.12/site-packages/xarray/core/computation.py:822: RuntimeWarning: divide by zero encountered in log10                                                                                                                                
  result_data = func(*input_data)                                                                                                                                                                                                                                                         
/scratch/jmhaacker/2023__cryoswath/scripts/../cryoswath/misc.py:602: UserWarning: Dropping 584 glaciers < 1 km² from RGI o1 region.                                                                                                                                                       
  warnings.warn(f"Dropping {sum(small_glacier_mask)} glaciers < 1 km² from RGI o1 region.")                                                                                                                                                                                               
/home/jmhaacker/miniconda3/envs/cryoswath/lib/python3.12/site-packages/xarray/core/computation.py:822: RuntimeWarning: divide by zero encountered in log10                                                                                                                                
  result_data = func(*input_data)                                                                                                                                                                                                                                                         
srun: Force Terminated job 3805231                                                                                                                                                                                                                                                        
srun: Job step aborted: Waiting up to 122 seconds for job step to finish.                                                                                                                                                                                                                 
slurmstepd: error: *** STEP 3805231.0 ON cmp034 CANCELLED AT 2024-06-05T18:03:46 DUE TO TIME LIMIT ***                                                                                                                                                                                    
saving 2022-08-08 10:40:33                                                                                                                                                                                                                                                                
getting 2022-08-17 09:42:13                                                                                                                                                                                                                                                               
/home/jmhaacker/miniconda3/envs/cryoswath/lib/python3.12/site-packages/xarray/core/computation.py:822: RuntimeWarning: divide by zero encountered in log10                                                                                                                                
  result_data = func(*input_data)                                                                                                                                                                                                                                                         
/scratch/jmhaacker/2023__cryoswath/scripts/../cryoswath/misc.py:602: UserWarning: Dropping 584 glaciers < 1 km² from RGI o1 region.                                                                                                                                                       
  warnings.warn(f"Dropping {sum(small_glacier_mask)} glaciers < 1 km² from RGI o1 region.")                                                                                                                                                                                               
/home/jmhaacker/miniconda3/envs/cryoswath/lib/python3.12/site-packages/xarray/core/computation.py:822: RuntimeWarning: divide by zero encountered in log10                                                                                                                                
  result_data = func(*input_data)                                                                                                                                                                                                                                                         
/home/jmhaacker/miniconda3/envs/cryoswath/lib/python3.12/site-packages/xarray/core/computation.py:822: RuntimeWarning: divide by zero encountered in log10                                                                                                                                
  result_data = func(*input_data)                                                                                                                                                                                                                                                         
saving 2022-08-09 09:50:36     

[some 50 lines of similar output and warnings omitted]

getting 2022-08-25 19:24:35
/home/jmhaacker/miniconda3/envs/cryoswath/lib/python3.12/site-packages/xarray/core/computation.py:822: RuntimeWarning: divide by zero encountered in log10                                                                                                                               
  result_data = func(*input_data)
/scratch/jmhaacker/2023__cryoswath/scripts/../cryoswath/misc.py:602: UserWarning: Dropping 584 glaciers < 1 km² from RGI o1 region.
  warnings.warn(f"Dropping {sum(small_glacier_mask)} glaciers < 1 km² from RGI o1 region.")
saving 2022-08-12 20:26:00
getting 2022-08-26 08:44:09
/scratch/jmhaacker/2023__cryoswath/scripts/../cryoswath/misc.py:602: UserWarning: Dropping 584 glaciers < 1 km² from RGI o1 region.
  warnings.warn(f"Dropping {sum(small_glacier_mask)} glaciers < 1 km² from RGI o1 region.")
saving 2022-08-16 20:22:18
getting 2022-08-26 08:44:48
/home/jmhaacker/miniconda3/envs/cryoswath/lib/python3.12/site-packages/xarray/core/computation.py:822: RuntimeWarning: divide by zero encountered in log10                                                                                                                               
  result_data = func(*input_data)
/scratch/jmhaacker/2023__cryoswath/scripts/../cryoswath/misc.py:602: UserWarning: Dropping 584 glaciers < 1 km² from RGI o1 region.
  warnings.warn(f"Dropping {sum(small_glacier_mask)} glaciers < 1 km² from RGI o1 region.")
saving 2022-08-18 20:20:19
getting 2022-08-26 20:12:43
saving 2022-08-22 20:16:56
getting 2022-08-27 09:32:53
/scratch/jmhaacker/2023__cryoswath/scripts/../cryoswath/misc.py:602: UserWarning: Dropping 584 glaciers < 1 km² from RGI o1 region.
  warnings.warn(f"Dropping {sum(small_glacier_mask)} glaciers < 1 km² from RGI o1 region.")
/home/jmhaacker/miniconda3/envs/cryoswath/lib/python3.12/site-packages/xarray/core/computation.py:822: RuntimeWarning: divide by zero encountered in log10                                                                                                                               
  result_data = func(*input_data)
srun: error: cmp034: task 0: Killed
srun: launch/slurm: _step_signal: Terminating StepId=3805231.0

Environment cryoswath at 4a7a70c, 16 CPUs/processes, region 07-01 Svalbard