Open pnuu opened 1 year ago
I didn't know there existed standardised message types with defined data structures. Is this defined/documented and/or enforced/tested anywhere?
> Should there be new message type like library (file -> dataset -> collection -> library 😜) or something that has a list named library with collections with datasets inside?
For what it's worth, in one software package I know the seven dimensions are called Library, Vitrine, Shelf, Book, Page, Row, Column :-)
On a more serious note, if we do use standardised names and a collection collects all granules or segments belonging to a single scene, then "multicollection" would be I think quite clear in its purpose.
> I didn't know there existed standardised message types with defined data structures. Is this defined/documented and/or enforced/tested anywhere?
I doubt it's documented anywhere. I was thinking the same earlier today. But the above is most of what we have in use in posttroll-based packages. The `file` message type is the most common. Segment gatherer uses `dataset` if it receives files and `collection` if it receives datasets. Geographic collector always publishes `collection` messages. There are some other types, at least in Trollmoves (`ack`, `push`, `error`, `pong`, `err`, and `unknown` show up in a quick grep), for internal communications.
> On a more serious note, if we do use standardised names and a collection collects all granules or segments belonging to a single scene, then "multicollection" would be I think quite clear in its purpose.
I like that, the data are most likely passed to `MultiScene` in Satpy, so that'd match.
I'm thinking that the difference between `collection` and `multicollection` is not really obvious, while `temporal_collection` is more explicit...
Thanks, I'll think about the naming. I've started with `multicollection` also for the internals, but changing that isn't too complicated.
To create multi-temporal datasets, data need to be collected and published for multiple time slots.
As an example, https://github.com/pytroll/satpy/pull/2488 needs three distinct datasets:
The time-shift between the datasets can be anything, for example 15/30/60 minutes. It can even be irregular if used for polar satellite data or emphasis is needed on one direction or the other.
There are other envisioned needs for this kind of collection/publishing, so the feature needs to be kept as flexible as possible.
Messages
Currently we have the following message types for publishing data:

- `file`: plain JSON without nested lists or dictionaries, everything at the "top level" of the message
- `dataset`: combined metadata (start/end times, platform, and such) at the top level, and a list named `dataset` of dictionaries having the URI and UID of individual files
- `collection`: same as above, but there is a list named `collection` with dictionaries of individual start/end times and `dataset`s

The `collection` message type could be used for the collection of multi-temporal data that is described here, but how to distinguish it from the existing usage? Should there be a new message type like `library` (file -> dataset -> collection -> library 😜) or something that has a list named `library` with `collection`s with `dataset`s inside?

Configuration
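For reference while reading the configuration idea, the nesting of the message types listed under Messages might look roughly like the sketch below. Only the list names `dataset`, `collection` and `library` and the URI/UID pair come from the description above. Every other key, every value, and the `message_kind` helper are hypothetical, and the `library` layout is just one reading of the open question, not an agreed format.

```python
# Illustrative payloads only: key names such as "start_time" and
# "platform_name", and all values, are assumptions made for this sketch.
file_msg = {
    "uri": "/data/segment_1.nc",
    "uid": "segment_1.nc",
    "start_time": "2023-06-01T12:00:00",
    "platform_name": "example-sat",
}

dataset_msg = {
    "start_time": "2023-06-01T12:00:00",
    "end_time": "2023-06-01T12:15:00",
    "platform_name": "example-sat",
    # list named "dataset" with the URI and UID of each file
    "dataset": [
        {"uri": "/data/segment_1.nc", "uid": "segment_1.nc"},
        {"uri": "/data/segment_2.nc", "uid": "segment_2.nc"},
    ],
}

collection_msg = {
    "platform_name": "example-sat",
    # list named "collection"; each entry has its own times and a dataset
    "collection": [
        {
            "start_time": "2023-06-01T12:00:00",
            "end_time": "2023-06-01T12:15:00",
            "dataset": dataset_msg["dataset"],
        },
    ],
}

# One possible layout for the proposed extra level (hypothetical)
library_msg = {
    "platform_name": "example-sat",
    "library": [{"collection": collection_msg["collection"]}],
}


def message_kind(data):
    """Guess the message type from the outermost list name in the payload."""
    for name in ("library", "collection", "dataset"):
        if name in data:
            return name
    return "file"
```

With a distinct top-level list name, a receiver can tell the multi-temporal payload apart from an existing `collection` without inspecting the nesting depth, which is the distinction the question above is after.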
This is the first crude idea of how to configure which data are published together. The publishing would be triggered after each data collection has terminated.
The min/max ages are relative to the start time of the currently completed collection. Just having the `0/0` combination would equal the current behaviour of publishing the latest completed set. Not all the criteria may be met: just after a restart, for example, we might not have the earlier slots collected.
Internals
Currently the completed `Slot`s are deleted. We need to add a new check that looks at the `published_slots` config (and `timeliness`?) to determine which slots are not needed anymore. As the keys in the `self.slots` dictionary are the nominal or start times (possibly rounded, depending on config) of the slots as strings, comparison is quite easy.
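A minimal sketch of how the min/max age selection and the clean-up check could work together, under several assumptions: `published_slots` is taken to be a list of `min_age`/`max_age` pairs in minutes, the slot keys are `str(datetime)` values of equal width (so that string comparison matches time order), and the dictionary holds only completed slots. The function names and the config layout are hypothetical and not the actual implementation.

```python
from datetime import datetime, timedelta

# Hypothetical config: ages in minutes relative to the current start time.
# A single {"min_age": 0, "max_age": 0} entry selects only the latest slot.
published_slots = [
    {"min_age": 0, "max_age": 0},
    {"min_age": 25, "max_age": 35},
    {"min_age": 55, "max_age": 65},
]


def select_slots(slots, current_start, published_slots):
    """Pick one slot key per config entry, or None if any entry is unmatched."""
    selected = []
    for entry in published_slots:
        newest = str(current_start - timedelta(minutes=entry["min_age"]))
        oldest = str(current_start - timedelta(minutes=entry["max_age"]))
        # Keys are equal-width time strings, so plain comparison is enough
        matches = sorted(key for key in slots if oldest <= key <= newest)
        if not matches:
            return None  # e.g. just after a restart
        selected.append(matches[-1])  # newest slot inside the age window
    return selected


def prune_slots(slots, current_start, published_slots):
    """Delete slots older than the largest configured max_age."""
    max_age = max(entry["max_age"] for entry in published_slots)
    cutoff = str(current_start - timedelta(minutes=max_age))
    for key in [key for key in slots if key < cutoff]:
        del slots[key]


current = datetime(2023, 6, 1, 12, 0)
slots = {str(current - timedelta(minutes=age)): "slot" for age in (0, 30, 60, 90)}
to_publish = select_slots(slots, current, published_slots)
prune_slots(slots, current, published_slots)
```

In this example the 90-minute-old slot is removed, because no configured age window can reach it any more, while the three slots selected for publishing are kept.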