pytroll / pytroll-collectors

Collector modules for Pytroll
GNU General Public License v3.0

segment_gatherer publishes incorrect end_time when using group_by_minutes #112

Closed gerritholl closed 2 years ago

gerritholl commented 2 years ago

When using group_by_minutes, the segment gatherer publishes an incorrect end time. It publishes the end_time of the first segment in the slot, rather than the latest end_time among all segments in the slot (the maximum of all end times).

In the following MCVE, the segment gatherer is configured with group_by_minutes: 10 and receives three segments. The first segment covers 13:00:00-13:01:00, the second 13:01:00-13:02:00, and the third 13:02:00-13:03:00. They all fit in the slot starting at 13:00:00. The correct start and end time for the slot would be 13:00:00-13:03:00, but the segment gatherer publishes a message with start and end times 13:00:00-13:01:00 instead.

MCVE:

import datetime
from pytroll_collectors.segments import SegmentGatherer
from posttroll.message import Message

sg = SegmentGatherer({
    "patterns": {
        "oak": {
            "pattern": "oak-s{start_time:%Y%m%d%H%M%S}-e{end_time:%Y%m%d%H%M%S}-s{segment}.tree",
            "critical_files": None,
            "wanted_files": ":001-003",
            "all_files": ":001-003",
            "is_critical_set": False,
            "variable_tags": ["start_time", "end_time"]}},
    "timeliness": 10,
    "group_by_minutes": 10,
    "time_name": "start_time"})

messages = [Message(
    rawstr=f"pytroll://tree/oak file pytroll@forest 1980-01-01T13:0{i:d}:00.000000 v1.01 application/json "
           f'{{"platform_name": "forest", "start_time": "1980-01-01T13:0{i:d}:00", "end_time": '
           f'"1980-01-01T13:0{i+1:d}:00", "uri": "/data/oak-s19800101130{i:d}00-e19800101130{i+1:d}00-'
           f's00{i:d}.tree", "uid": "oak-s19800101130{i:d}00-e19800101130{i+1:d}00-s00{i:d}.tree", '
           '"sensor": "Thaumetopoea processionea"}')
        for i in range(3)]
for msg in messages:
    sg.process(msg)
print(sg.slots["1980-01-01 13:00:00"].output_metadata["start_time"])
print(sg.slots["1980-01-01 13:00:00"].output_metadata["end_time"])

Expected output:

1980-01-01 13:00:00
1980-01-01 13:03:00

Actual output:

1980-01-01 13:00:00
1980-01-01 13:01:00

This occurs with the latest pytroll-collectors main.
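
For reference, the expected slot times can be sketched independently of pytroll-collectors. This is only an illustration of the behaviour described above, not the library's implementation: the helper names floor_to_slot and slot_time_range are hypothetical, and the assumption that a slot begins at the start time rounded down to a multiple of group_by_minutes is made for the example only.

```python
import datetime


def floor_to_slot(t, minutes):
    """Round t down to a multiple of `minutes` (assumed slot grouping)."""
    return t.replace(minute=t.minute - t.minute % minutes,
                     second=0, microsecond=0)


def slot_time_range(segments):
    """Return (earliest start, latest end) over (start, end) pairs."""
    starts, ends = zip(*segments)
    return min(starts), max(ends)


# The three one-minute segments from the MCVE: 13:00-13:01, 13:01-13:02, 13:02-13:03.
base = datetime.datetime(1980, 1, 1, 13, 0, 0)
segments = [(base + datetime.timedelta(minutes=i),
             base + datetime.timedelta(minutes=i + 1))
            for i in range(3)]

# With a 10-minute grouping, all three start times fall into the 13:00:00 slot.
slot_starts = {floor_to_slot(start, 10) for start, _ in segments}

start, end = slot_time_range(segments)
print(start)  # 1980-01-01 13:00:00
print(end)    # 1980-01-01 13:03:00
```

Taking the maximum over all end times, instead of keeping the end_time of whichever segment arrived first, gives the range 13:00:00-13:03:00 listed under "Expected output".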