pytroll / pytroll-collectors

Collector modules for Pytroll
GNU General Public License v3.0

gatherer stops gathering if uncaught exception raised in pyorbital #79

Open gerritholl opened 3 years ago


When an uncaught exception is raised in pyorbital (for example, due to https://github.com/pytroll/pyorbital/issues/74), the gatherer stops gathering. The last sign of life in my gatherer log file is:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/site-packages/pytroll_collectors/trigger.py", line 397, in run
    self.process(msg)
  File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/site-packages/pytroll_collectors/trigger.py", line 111, in add_file
    self._do(pathname)
  File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/site-packages/pytroll_collectors/trigger.py", line 107, in _do
    Trigger._do(self, mda)
  File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/site-packages/pytroll_collectors/trigger.py", line 86, in _do
    res = collector(metadata.copy())
  File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/site-packages/pytroll_collectors/region_collector.py", line 65, in __call__
    return self.collect(granule_metadata)
  File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/site-packages/pytroll_collectors/region_collector.py", line 147, in collect
    granule_pass = Pass(platform, start_time, end_time,
  File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/site-packages/trollsched/satpass.py", line 176, in __init__
    self.orb = orbital.Orbital(satellite, line1=tle1, line2=tle2)
  File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/site-packages/pyorbital/orbital.py", line 164, in __init__
    self.tle = tlefile.read(satellite, tle_file=tle_file,
  File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/site-packages/pyorbital/tlefile.py", line 106, in read
    return Tle(platform, tle_file=tle_file, line1=line1, line2=line2)
  File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/site-packages/pyorbital/tlefile.py", line 154, in __init__
    self._read_tle()
  File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/site-packages/pyorbital/tlefile.py", line 200, in _read_tle
    urls = (max(glob.glob(os.environ["TLES"]),
ValueError: max() arg is an empty sequence
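The last frame shows where the chain breaks: if the glob pattern in the `TLES` environment variable matches no files, `glob.glob` returns an empty list, and calling `max()` on an empty sequence raises `ValueError`. The sketch below reproduces that failure mode with the standard library only and shows one way to avoid it (`max(..., default=None)`). The helper `latest_match` and the pattern are hypothetical illustrations, not pyorbital's actual code; the real fix is tracked in the linked pyorbital issue.

```python
import glob
import os


def latest_match(pattern):
    """Return the most recently modified file matching *pattern*, or None.

    Hypothetical helper: max() on an empty list raises ValueError,
    so default=None turns "no matches" into an explicit None.
    """
    matches = glob.glob(pattern)
    return max(matches, key=os.path.getmtime, default=None)


# Reproduce the failure mode from the traceback with a pattern that matches nothing.
pattern = "/nonexistent/tle/*.txt"  # hypothetical path
try:
    max(glob.glob(pattern))
except ValueError as err:
    print("unguarded max() failed:", err)

print("guarded result:", latest_match(pattern))  # -> guarded result: None
```

A `None` result still has to be handled by the caller, but it can then fail with a clear message (for example "no TLE files match $TLES") instead of an opaque `ValueError`.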

Despite the exception, the daemon is still running, as shown by supervisorctl:

$ supervisorctl -c supervisord.conf status
pytroll-polar:pytroll-aapp-runner                     RUNNING   pid 19658, uptime 11 days, 7:59:32
pytroll-polar:pytroll-cat                             RUNNING   pid 14107, uptime 28 days, 1:45:12
pytroll-polar:pytroll-gatherer-metopa                 RUNNING   pid 14080, uptime 24 days, 6:16:04
pytroll-polar:pytroll-gatherer-metopb                 RUNNING   pid 14229, uptime 24 days, 6:16:00
pytroll-polar:pytroll-gatherer-metopc                 RUNNING   pid 14353, uptime 24 days, 6:15:57
pytroll-polar:pytroll-nameserver                      RUNNING   pid 14096, uptime 28 days, 1:45:12
pytroll-polar:pytroll-trollflow2                      RUNNING   pid 398, uptime 15 days, 0:35:45
pytroll-polar:pytroll-trollstalker-metopa-direkt      RUNNING   pid 26102, uptime 28 days, 0:58:37
pytroll-polar:pytroll-trollstalker-metopa-eumetcast   RUNNING   pid 26265, uptime 28 days, 0:58:28
pytroll-polar:pytroll-trollstalker-metopb-direkt      RUNNING   pid 26111, uptime 28 days, 0:58:34
pytroll-polar:pytroll-trollstalker-metopb-eumetcast   RUNNING   pid 26451, uptime 28 days, 0:58:25
pytroll-polar:pytroll-trollstalker-metopc-direkt      RUNNING   pid 26223, uptime 28 days, 0:58:31
pytroll-polar:pytroll-trollstalker-metopc-eumetcast   RUNNING   pid 26479, uptime 28 days, 0:58:22
pytroll-polar:pytroll-trollstalker-noaa               RUNNING   pid 14103, uptime 28 days, 1:45:12

This is the worst of both worlds: the exception kills only the worker thread (`Thread-1` in the traceback), so the process keeps running and supervisord does not restart it, but it no longer does anything, so production has stopped.
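The behaviour can be reproduced with the standard library alone: an uncaught exception in a non-main thread is reported by `threading.excepthook` (the default hook prints the "Exception in thread ..." traceback), the thread ends, and the main thread carries on. The sketch below, assuming Python 3.8+ (where `threading.excepthook` was added), replaces the hook so the death is logged and recorded in a flag. The names `worker`, `worker_died` and `report_and_flag` are hypothetical and not part of pytroll-collectors.

```python
import logging
import threading

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("gatherer-sketch")

# Set when any worker thread dies from an uncaught exception.
worker_died = threading.Event()


def report_and_flag(args):
    """Replacement for threading.excepthook: log the traceback, then flag the death."""
    name = args.thread.name if args.thread is not None else "unknown thread"
    logger.error("Uncaught exception in %s", name,
                 exc_info=(args.exc_type, args.exc_value, args.exc_traceback))
    worker_died.set()


def worker():
    # Stand-in for the gatherer's message loop hitting an unexpected error.
    raise ValueError("max() arg is an empty sequence")


threading.excepthook = report_and_flag

thread = threading.Thread(target=worker, name="worker")
thread.start()
thread.join()

# Only the worker has died; the main thread (and so the process) keeps going.
print("worker alive:", thread.is_alive())    # -> worker alive: False
print("worker died:", worker_died.is_set())  # -> worker died: True
```

In a real daemon the main loop could check `worker_died` and call `sys.exit(1)`, so that supervisord sees the process exit and, depending on the program's `autorestart` and `exitcodes` settings, restarts it instead of leaving a silent zombie.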

The gatherer should catch and log exceptions raised downstream (such as by pyorbital), then try to resume gathering if at all possible.
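A minimal sketch of the catch-log-resume idea, assuming a run loop that takes one message at a time from a queue and hands it to a `process` callable (as the `run` → `process(msg)` frames in the traceback suggest). `ResilientLoop` and its attributes are hypothetical stand-ins, not the actual `trigger.py` code.

```python
import logging
import queue

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("gatherer-sketch")


class ResilientLoop:
    """Hypothetical stand-in for a trigger's run loop, not pytroll-collectors' code."""

    def __init__(self, process):
        self.process = process       # callable handling one message
        self.messages = queue.Queue()
        self.failures = 0

    def run(self):
        while True:
            msg = self.messages.get()
            if msg is None:          # sentinel: stop the loop
                break
            try:
                self.process(msg)
            except Exception:
                # Log the full traceback, count the failure, keep gathering.
                self.failures += 1
                logger.exception("Failed to process %r; continuing", msg)


handled = []


def process(msg):
    # Simulate one granule triggering the pyorbital error.
    if msg == "bad granule":
        raise ValueError("max() arg is an empty sequence")
    handled.append(msg)


loop = ResilientLoop(process)
for msg in ["granule 1", "bad granule", "granule 2", None]:
    loop.messages.put(msg)
loop.run()
print(handled, loop.failures)  # -> ['granule 1', 'granule 2'] 1
```

Catching `Exception` rather than `BaseException` lets `KeyboardInterrupt` and `SystemExit` still stop the loop. If the error is persistent (for example, no TLE files at all), every granule will fail and be logged; a counter like `failures` could feed a threshold after which the process exits deliberately so the supervisor can restart it.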