pytroll / pytroll-collectors

Collector modules for Pytroll
GNU General Public License v3.0
3 stars 19 forks source link

Possible bug: A send without critical files received #155

Open k3a opened 2 weeks ago

k3a commented 2 weeks ago

I have been using segment_gatherer with ini config style and it has been working fine but sometimes I noticed it would send a dataset with just one file, especially after a restart. I have a list of critical files defined.

My config:

[msg]
pattern = H-000-{orig_platform_name:4s}__-{orig_platform_name:4s}________-{channel_name:_<9s}-{segment:_<9s}-{start_time:%Y%m%d%H%M}-{compression_suffix}
# Segments critical to production
critical_files = :PRO,:EPI,VIS006:000001-000008,VIS008:000001-000008,IR_016:000001-000008
# These segments we want to have, but it's still ok if they are missed
wanted_files = :PRO,:EPI,VIS006:000001-000008,VIS008:000001-000008,IR_016:000001-000008,HRV:000001-000024
# All possible segments
all_files = :PRO,:EPI,VIS006:000001-000008,VIS008:000001-000008,IR_016:000001-000008,IR_039:000001-000008,WV_062:000001-000008,WV_073:000001-000008,IR_087:000001-000008,IR_097:000001-000008,IR_108:000001-000008,IR_120:000001-000008,IR_134:000001-000008,HRV:000001-000024
topics = /input/meteosat
publish_topic = /dataset/meteosat
timeliness = 900
# Name of a time field in the pattern above
time_name = start_time
# Comma separated list of tag names in the pattern that vary between different segments of the same time slot
variable_tags = compression_suffix

By doing some debugging I found out that ini_to_dict conversion code sets is_critical_set to False:

https://github.com/pytroll/pytroll-collectors/blob/acc2631f894fc2361f1ca4a83166fedcb2db5b10/pytroll_collectors/segments.py#L973

Is that really correct? In YAML it is possible to mark a set as critical or not and I can eventually migrate my config to it. But shouldn't all legacy ini sets be critical by default? 🤔

mraspaud commented 1 week ago

@k3a thanks for reporting this issue! @pnuu do you have an opinon on this?

pnuu commented 1 week ago

I think I saw this in the past, too. Currently I only have the YAML variants in use, so have forgotten about it.

I think the is_critical_set shouldn't be set to False or True if the option is configured in the config. If it is not, then I'm leaning to setting it to False by default so that there are at least some messages sent.

k3a commented 1 week ago

Are you sure?

The documentation defines:

critical_files

    Describes the files that must be unconditionally present. If timeout is reached
    and one or more critical files are missing, no message is published and
    all further processing ceases. 

If I understand the concept correctly, if someone wants "at least some messages sent" then they can simply not define critical_files option or am I missing something?

In YAML each set can have this flag is_critical_set specified separately like in this example https://github.com/pytroll/pytroll-collectors/blob/acc2631f894fc2361f1ca4a83166fedcb2db5b10/examples/segment_gatherer_msg_and_iodc.yaml_template#L13 but in INI this is not possible so it seems strange to me that it should be False by default. Isn't this making critical_files in INI not working and simply useless?

Of course, I can migrate to YAML, no big deal but I am confused.

mraspaud commented 1 week ago

The Ini format support was the first we implemented, but we later moved to yaml. So IMO the ini support should be deprecated. My impression is that we never bothered implementing the critical set for ini, but if you need it, feel free to provide a pull request!

gerritholl commented 1 week ago

I wrote that part of the documentation based on my understanding of until-then undocumented behaviour. It is possible that I misunderstood and that the documentation is incorrect.