pytroll / pytroll-collectors

Collector modules for Pytroll
GNU General Public License v3.0
3 stars 18 forks source link

geographic gatherer stopped after IndexError exception #100

Open TAlonglong opened 2 years ago

TAlonglong commented 2 years ago

geographic gatherer stopped working, but did not exit with a value supervisor recognized to trigger a restart. This cause supervisord not to restart the gatherer. Supervisor reported:

2022-03-21 21:38:51,889 INFO exited: gatherer-winds (exit status 0; expected)

The log from geographic gatherer

Exception in thread Thread-7:
Traceback (most recent call last):
  File "/software/miniconda/envs/pytroll/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/software/miniconda/envs/pytroll/lib/python3.9/site-packages/pytroll_collectors/triggers/_base.py", line 227, in run
    self.publish_collection(next_timeout[0].finish_without_reset())
  File "/software/miniconda/envs/pytroll/lib/python3.9/site-packages/pytroll_collectors/triggers/_base.py", line 91, in publish_collection
    subject = self._get_topic(metadata[0])
IndexError: list index out of range
[ERROR: 2022-03-21 21:38:40 : pytroll_collectors.geographic_gatherer] Something went wrong
Traceback (most recent call last):
  File "/software/miniconda/envs/pytroll/lib/python3.9/site-packages/pytroll_collectors/geographic_gatherer.py", line 201, in run
    raise RuntimeError
RuntimeError
[INFO: 2022-03-21 21:38:40 : pytroll_collectors.geographic_gatherer] Ending publication the gathering of granules...
[INFO: 2022-03-21 21:38:40 : gatherer] GeographicGatherer has stopped.
Setting timezone to UTC

I guess the metadata does not have an element, but I don't understand why

https://github.com/pytroll/pytroll-collectors/blob/d85bcf98455f1effde259debea9f790e7fa25ba2/pytroll_collectors/triggers/_base.py#L89-L92

pnuu commented 2 years ago

What's the incoming message triggering this? And the configuration. I have a feeling I've seen something like this before.

TAlonglong commented 2 years ago

config

[DEFAULT]
regions = eurol

[ears-metop-ascat]
regions=ears_high_res
pattern = ascat_{start_time:%Y%m%d_%H%M%S}_{platform_name:s}_{orbit_number:s}_ear_o_coa_{ears_station:s}
format = EPS
type = binary
sensor = ascat
timeliness = 30
level = 0
variant = EARS
publish_topic = /CAT/EARS/METOP-ASCAT/BUFR
service=move_it_metop-ascat-ears-coa
topics=/metop/ascat/ears/coa/l2_bufr
duration=600
publish_message_after_each_reception=True

[ears-metop-ascat-global-057]
regions=ears_high_res
pattern = ascat_{start_time:%Y%m%d_%H%M%S}_{platform_name:s}_{orbit_number:s}_eps_o_057_ovw.l2_bufr
format = EPS
type = binary
sensor = ascat
timeliness = 120
level = 0
variant = EARS
publish_topic = /CAT/GLOBAL/METOP-ASCAT-057/BUFR
service=move_it_metop-ascat-global-057
topics=/metop/ascat/global/057/l2_bufr
duration=600
publish_message_after_each_reception=True

[global-metop-amv-l2d]
regions=amv_collection
pattern = W_XX-EUMETSAT-Darmstadt,SOUNDING+SATELLITE,{platform_name:s}+AVHRR_C_EUMP_{start_time:%Y%m%d%H%M%S}_{orbit_number:5s}_eps_o_amv_l2d.bin
format=BUFR
type=binary
data_processing_level=2
sensor=avhrr
#timeliness in minutes
timeliness=120
duration=180
variant=GLOBAL
publish_topic=/GLOBAL/METOP/AMV/2d
service=move_it_global-metop-amv-l2d
topics=/CHECK/GLOBAL/SEGMENTS/METOP/AMV/2d
publish_message_after_each_reception=True
TAlonglong commented 2 years ago

The last message printed before the error:

[DEBUG: 2022-03-21 21:38:39 : pytroll_collectors.triggers._base] Waiting 4583.930227 seconds until timeout
[DEBUG: 2022-03-21 21:38:39 : pytroll_collectors.triggers._base] mda: {'sensor': 'avhrr', 'platform_name': 'Metop-B', 'start_time': datetime.datetime(2022, 3, 21, 20, 52, 3), 'orbit_number': '49329', 'uri': 'ssh://satellite-wind-processing-b/data/pytroll/global-metop-amv-l2d/W_XX-EUMETSAT-Darmstadt,SOUNDING+SATELLITE,METOPB+AVHRR_C_EUMP_20220321205203_49329_eps_o_amv_l2d.bin', 'uid': 'W_XX-EUMETSAT-Darmstadt,SOUNDING+SATELLITE,METOPB+AVHRR_C_EUMP_20220321205203_49329_eps_o_amv_l2d.bin', 'origin': '157.249.16.182:9062', 'end_time': datetime.datetime(2022, 3, 21, 20, 55, 3)}
[DEBUG: 2022-03-21 21:38:39 : pytroll_collectors.region_collector] Adding area ID amv_collection to metadata for Metop-B
[INFO: 2022-03-21 21:38:39 : pytroll_collectors.region_collector] Added expected granule Metop-B (2022-03-21 20:52:03) to area amv_collection
[INFO: 2022-03-21 21:38:39 : pytroll_collectors.region_collector] Collection finished for Metop-B area amv_collection
[DEBUG: 2022-03-21 21:38:39 : pytroll_collectors.triggers._base] Composing topic.
[DEBUG: 2022-03-21 21:38:39 : pytroll_collectors.triggers._base] Is last file added: False
[INFO: 2022-03-21 21:38:39 : pytroll_collectors.triggers._base] sending pytroll://GLOBAL/METOP/AMV/2d collection ubuntu@satellite-wind-processing-b 2022-03-21T21:38:39.079863 v1.01 application/json {"sensor": "avhrr", "platform_name": "Metop-B", "start_time": "2022-03-21T20:13:03", "orbit_number": "49329", "origin": "157.249.16.188:9062", "end_time": "2022-03-21T20:55:03", "collection_area_id": "amv_collection", "collection": [{"start_time": "2022-03-21T20:13:03", "end_time": "2022-03-21T20:16:03", "uri": "ssh://satellite-wind-processing-b/data/pytroll/global-metop-amv-l2d/W_XX-EUMETSAT-Darmstadt,SOUNDING+SATELLITE,METOPB+AVHRR_C_EUMP_20220321201303_49329_eps_o_amv_l2d.bin", "uid": "W_XX-EUMETSAT-Darmstadt,SOUNDING+SATELLITE,METOPB+AVHRR_C_EUMP_20220321201303_49329_eps_o_amv_l2d.bin"}, {"start_time": "2022-03-21T20:16:03", "end_time": "2022-03-21T20:19:03", "uri": "ssh://satellite-wind-processing-b/data/pytroll/global-metop-amv-l2d/W_XX-EUMETSAT-Darmstadt,SOUNDING+SATELLITE,METOPB+AVHRR_C_EUMP_20220321201603_49329_eps_o_amv_l2d.bin", "uid": "W_XX-EUMETSAT-Darmstadt,SOUNDING+SATELLITE,METOPB+AVHRR_C_EUMP_20220321201603_49329_eps_o_amv_l2d.bin"}, {"start_time": "2022-03-21T20:19:03", "end_time": "2022-03-21T20:22:03", "uri": "ssh://satellite-wind-processing-b/data/pytroll/global-metop-amv-l2d/W_XX-EUMETSAT-Darmstadt,SOUNDING+SATELLITE,METOPB+AVHRR_C_EUMP_20220321201903_49329_eps_o_amv_l2d.bin", "uid": "W_XX-EUMETSAT-Darmstadt,SOUNDING+SATELLITE,METOPB+AVHRR_C_EUMP_20220321201903_49329_eps_o_amv_l2d.bin"}, {"start_time": "2022-03-21T20:22:03", "end_time": "2022-03-21T20:25:03", "uri": "ssh://satellite-wind-processing-b/data/pytroll/global-metop-amv-l2d/W_XX-EUMETSAT-Darmstadt,SOUNDING+SATELLITE,METOPB+AVHRR_C_EUMP_20220321202203_49329_eps_o_amv_l2d.bin", "uid": "W_XX-EUMETSAT-Darmstadt,SOUNDING+SATELLITE,METOPB+AVHRR_C_EUMP_20220321202203_49329_eps_o_amv_l2d.bin"}, {"start_time": "2022-03-21T20:25:03", "end_time": "2022-03-21T20:28:03", "uri": "ssh://satellite-wind-processing-b/data/pytroll/global-metop-amv-l2d/W_XX-EUMETSAT-Darmstadt,SOUNDING+SATELLITE,METOPB+AVHRR_C_EUMP_20220321202503_49329_eps_o_amv_l2d.bin", "uid": "W_XX-EUMETSAT-Darmstadt,SOUNDING+SATELLITE,METOPB+AVHRR_C_EUMP_20220321202503_49329_eps_o_amv_l2d.bin"}, {"start_time": "2022-03-21T20:28:03", "end_time": "2022-03-21T20:31:03", "uri": "ssh://satellite-wind-processing-b/data/pytroll/global-metop-amv-l2d/W_XX-EUMETSAT-Darmstadt,SOUNDING+SATELLITE,METOPB+AVHRR_C_EUMP_20220321202803_49329_eps_o_amv_l2d.bin", "uid": "W_XX-EUMETSAT-Darmstadt,SOUNDING+SATELLITE,METOPB+AVHRR_C_EUMP_20220321202803_49329_eps_o_amv_l2d.bin"}, {"start_time": "2022-03-21T20:31:03", "end_time": "2022-03-21T20:34:03", "uri": "ssh://satellite-wind-processing-b/data/pytroll/global-metop-amv-l2d/W_XX-EUMETSAT-Darmstadt,SOUNDING+SATELLITE,METOPB+AVHRR_C_EUMP_20220321203103_49329_eps_o_amv_l2d.bin", "uid": "W_XX-EUMETSAT-Darmstadt,SOUNDING+SATELLITE,METOPB+AVHRR_C_EUMP_20220321203103_49329_eps_o_amv_l2d.bin"}, {"start_time": "2022-03-21T20:34:03", "end_time": "2022-03-21T20:37:03", "uri": "ssh://satellite-wind-processing-b/data/pytroll/global-metop-amv-l2d/W_XX-EUMETSAT-Darmstadt,SOUNDING+SATELLITE,METOPB+AVHRR_C_EUMP_20220321203403_49329_eps_o_amv_l2d.bin", "uid": "W_XX-EUMETSAT-Darmstadt,SOUNDING+SATELLITE,METOPB+AVHRR_C_EUMP_20220321203403_49329_eps_o_amv_l2d.bin"}, {"start_time": "2022-03-21T20:37:03", "end_time": "2022-03-21T20:40:03", "uri": "ssh://satellite-wind-processing-b/data/pytroll/global-metop-amv-l2d/W_XX-EUMETSAT-Darmstadt,SOUNDING+SATELLITE,METOPB+AVHRR_C_EUMP_20220321203703_49329_eps_o_amv_l2d.bin", "uid": "W_XX-EUMETSAT-Darmstadt,SOUNDING+SATELLITE,METOPB+AVHRR_C_EUMP_20220321203703_49329_eps_o_amv_l2d.bin"}, {"start_time": "2022-03-21T20:40:03", "end_time": "2022-03-21T20:43:03", "uri": "ssh://satellite-wind-processing-b/data/pytroll/global-metop-amv-l2d/W_XX-EUMETSAT-Darmstadt,SOUNDING+SATELLITE,METOPB+AVHRR_C_EUMP_20220321204003_49329_eps_o_amv_l2d.bin", "uid": "W_XX-EUMETSAT-Darmstadt,SOUNDING+SATELLITE,METOPB+AVHRR_C_EUMP_20220321204003_49329_eps_o_amv_l2d.bin"}, {"start_time": "2022-03-21T20:43:03", "end_time": "2022-03-21T20:46:03", "uri": "ssh://satellite-wind-processing-b/data/pytroll/global-metop-amv-l2d/W_XX-EUMETSAT-Darmstadt,SOUNDING+SATELLITE,METOPB+AVHRR_C_EUMP_20220321204303_49329_eps_o_amv_l2d.bin", "uid": "W_XX-EUMETSAT-Darmstadt,SOUNDING+SATELLITE,METOPB+AVHRR_C_EUMP_20220321204303_49329_eps_o_amv_l2d.bin"}, {"start_time": "2022-03-21T20:46:03", "end_time": "2022-03-21T20:49:03", "uri": "ssh://satellite-wind-processing-b/data/pytroll/global-metop-amv-l2d/W_XX-EUMETSAT-Darmstadt,SOUNDING+SATELLITE,METOPB+AVHRR_C_EUMP_20220321204603_49329_eps_o_amv_l2d.bin", "uid": "W_XX-EUMETSAT-Darmstadt,SOUNDING+SATELLITE,METOPB+AVHRR_C_EUMP_20220321204603_49329_eps_o_amv_l2d.bin"}, {"start_time": "2022-03-21T20:49:03", "end_time": "2022-03-21T20:52:03", "uri": "ssh://satellite-wind-processing-b/data/pytroll/global-metop-amv-l2d/W_XX-EUMETSAT-Darmstadt,SOUNDING+SATELLITE,METOPB+AVHRR_C_EUMP_20220321204903_49329_eps_o_amv_l2d.bin", "uid": "W_XX-EUMETSAT-Darmstadt,SOUNDING+SATELLITE,METOPB+AVHRR_C_EUMP_20220321204903_49329_eps_o_amv_l2d.bin"}, {"start_time": "2022-03-21T20:52:03", "end_time": "2022-03-21T20:55:03", "uri": "ssh://satellite-wind-processing-b/data/pytroll/global-metop-amv-l2d/W_XX-EUMETSAT-Darmstadt,SOUNDING+SATELLITE,METOPB+AVHRR_C_EUMP_20220321205203_49329_eps_o_amv_l2d.bin", "uid": "W_XX-EUMETSAT-Darmstadt,SOUNDING+SATELLITE,METOPB+AVHRR_C_EUMP_20220321205203_49329_eps_o_amv_l2d.bin"}]}

It's a bit tricky to follow the log. I'm not sure if this is a mix of two gatherer configs. But it looks like it gets a file, add granule. This completes the gathering, but at the same time it says the last segment is not added. and then sends the message. And then the error comes.

pnuu commented 2 years ago

If the log messages are hard to follow, try to follow the code :sweat_smile:

In short, I have, at least yet, no good ideas why pytroll_collectors.region_collector.RegionCollector.granules is an empty list when it's being accessed on the publishing step.

pnuu commented 2 years ago

I think the above incoming message isn't what triggered the crash. The completed collection were published correctly (the area and file naming match). I'd think the crashing one is the previous incoming message, from which the Waiting for ... and Is last file added log messages are from.

TAlonglong commented 2 years ago

OK, so here is the log with the message just before that. So it looks like this is not included and then we might have a race condition still handling of one segment when the next one comes in and is added, triggers complete and sends the message. All this when the previous message is still under handeling. Then the message is cleaned and when it tries to do some handeling then it is already removed?

Just a guess

[DEBUG: 2022-03-21 21:38:36 : pytroll_collectors.triggers._base] mda: {'sensor': 'avhrr', 'platform_name': 'Metop-B', 'start_time': datetime.datetime(2022, 3, 21, 20, 55, 3), 'orbit_number': '49329', 'uri': 'ssh://satellite-wind-processing-b/data/pytroll/global-metop-amv-l2d/W_XX-EUMETSAT-Darmstadt,SOUNDING+SATELLITE,METOPB+AVHRR_C_EUMP_20220321205503_49329_eps_o_amv_l2d.bin', 'uid': 'W_XX-EUMETSAT-Darmstadt,SOUNDING+SATELLITE,METOPB+AVHRR_C_EUMP_20220321205503_49329_eps_o_amv_l2d.bin', 'origin': '157.249.16.188:9062', 'end_time': datetime.datetime(2022, 3, 21, 20, 58, 3)}
[DEBUG: 2022-03-21 21:38:36 : pytroll_collectors.region_collector] Adding area ID amv_collection to metadata for Metop-B
[DEBUG: 2022-03-21 21:38:36 : pytroll_collectors.region_collector] Platform name Metop-B and sensor avhrr: Start and end times = 20220321 20:55:03 20220321 20:58:03
[DEBUG: 2022-03-21 21:38:36 : trollsched.satpass] kwargs: {'instrument': 'avhrr'}
[DEBUG: 2022-03-21 21:38:36 : trollsched.satpass] instrument: avhrr
[DEBUG: 2022-03-21 21:38:36 : pyorbital.tlefile] Reading TLE from /data/pytroll/tle/tle_db/tle-latest-pytroll.txt
[DEBUG: 2022-03-21 21:38:36 : trollsched.boundary] Instrument = avhrr
[DEBUG: 2022-03-21 21:38:39 : pytroll_collectors.triggers._base] Waiting 4583.930227 seconds until timeout
[DEBUG: 2022-03-21 21:38:39 : pytroll_collectors.triggers._base] mda: {'sensor': 'avhrr', 'platform_name': 'Metop-B', 'start_time': datetime.datetime(2022, 3, 21, 20, 52, 3), 'orbit_number': '49329', 'uri': 'ssh://satellite-wind-processing-b/data/pytroll/global-metop-amv-l2d/W_XX-EUMETSAT-Darmstadt,SOUNDING+SATELLITE,METOPB+AVHRR_C_EUMP_20220321205203_49329_eps_o_amv_l2d.bin', 'uid': 'W_XX-EUMETSAT-Darmstadt,SOUNDING+SATELLITE,METOPB+AVHRR_C_EUMP_20220321205203_49329_eps_o_amv_l2d.bin', 'origin': '157.249.16.182:9062', 'end_time': datetime.datetime(2022, 3, 21, 20, 55, 3)}
[DEBUG: 2022-03-21 21:38:39 : pytroll_collectors.region_collector] Adding area ID amv_collection to metadata for Metop-B
[INFO: 2022-03-21 21:38:39 : pytroll_collectors.region_collector] Added expected granule Metop-B (2022-03-21 20:52:03) to area amv_collection
[INFO: 2022-03-21 21:38:39 : pytroll_collectors.region_collector] Collection finished for Metop-B area amv_collection
[DEBUG: 2022-03-21 21:38:39 : pytroll_collectors.triggers._base] Composing topic.
[DEBUG: 2022-03-21 21:38:39 : pytroll_collectors.triggers._base] Is last file added: False
pnuu commented 2 years ago

Could very well be. Were there any on-going collections for the other configured sections? The Waiting 4583.930227 seconds until timeout seems a bit odd being terminated so far ahead of time, but I guess that is becouse timeliness in the config has been set to 120 minutes, which is more than one full orbit. How large is the collection area?

TAlonglong commented 2 years ago

Hm, looks very big. I don't remember why this is so huge. Centerd around the north pole as far as can understanmd from the config

amv_collection:
  description: Norway - Area for amv collection
  projection:
    proj: stere
    lat_0: 90
    lon_0: 0
    lat_ts: 60
    ellps: WGS84
  shape:
    height: 3776
    width: 3776
  area_extent:
    lower_left_xy: [-7552173.63, -7552173.63]
    upper_right_xy: [7552173.63, 7552173.63]
pnuu commented 2 years ago

That is indeed a huge area:

from pyproj import Proj

prj = Proj({"proj": "stere", "lat_0": 90, "lon_0": 0, "lat_ts": 60, "ellps": "WGS84"})
print(prj(-7552173.63, -7552173.63, inverse=True))

shows

(-45.0, 6.044472281577602)

So the area goes 6° north of Equator covering most of the northern hemisphere. The description is a bit optimistic for the Norwegian territory :sweat_smile:

gerritholl commented 2 years ago

Related?

https://github.com/pytroll/pytroll-collectors/issues/79 https://github.com/pytroll/pyresample/issues/329

TAlonglong commented 2 years ago

79 no, thats just the glob of the TLE environment variable that is empty. The other I have no idea.

gerritholl commented 2 years ago

Both the TLE environment variable being empty and #329 lead to the geographic gatherer stopping, in case of #329 due to a strange area.

gerritholl commented 2 years ago

Ah, no, never mind. That was trollflow2, not the gatherer, which silently stops in case of 329.