pnuu opened 1 year ago
Merging #142 (ad7cbbe) into main (5f154a1) will decrease coverage by 0.68%. The diff coverage is 97.90%.

```diff
@@            Coverage Diff             @@
##             main     #142      +/-   ##
==========================================
- Coverage   91.64%   90.96%   -0.68%
==========================================
  Files          27       29       +2
  Lines        4115     4547     +432
==========================================
+ Hits         3771     4136     +365
- Misses        344      411      +67
```

| Flag | Coverage Δ | |
|---|---|---|
| unittests | 90.96% <97.90%> (-0.68%) | :arrow_down: |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | |
|---|---|---|
| pytroll_collectors/segments.py | 93.14% <96.66%> (+0.64%) | :arrow_up: |
| pytroll_collectors/tests/test_segments.py | 100.00% <100.00%> (ø) | |
For reference, here are some message data structures I found in my production logs.
For the segment gatherer I only found this structure:
```python
dataset = {
    "start_time": "2023-05-25T10:50:00",
    "platform_name": "Meteosat-11",
    "sensor": ["seviri"],
    "dataset": [
        {
            "uri": "/data/oper/seviri/rss/level1.5/H-000-MSG4__-MSG4_RSS____-_________-PRO______-202305251050-__",
            "uid": "H-000-MSG4__-MSG4_RSS____-_________-PRO______-202305251050-__"
        },
        ...
        {
            "uri": "/data/oper/seviri/rss/level1.5/H-000-MSG4__-MSG4_RSS____-_________-EPI______-202305251050-__",
            "uid": "H-000-MSG4__-MSG4_RSS____-_________-EPI______-202305251050-__"
        }
    ],
}
```
The message type is `dataset`. The same structure is also present when collecting e.g. VIIRS channel segments.
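For illustration, a `dataset` payload like the one above can be consumed with a few lines of plain Python. This is just a sketch; the helper name `dataset_uris` is hypothetical and not part of pytroll-collectors, and the paths are shortened stand-ins for the real segment paths.

```python
def dataset_uris(msg_data):
    """Return the URIs of all segments in a ``dataset`` message payload."""
    return [item["uri"] for item in msg_data["dataset"]]


# Example payload, shortened from the structure shown above.
dataset = {
    "start_time": "2023-05-25T10:50:00",
    "platform_name": "Meteosat-11",
    "sensor": ["seviri"],
    "dataset": [
        {"uri": "/data/pro-segment", "uid": "pro-segment"},
        {"uri": "/data/epi-segment", "uid": "epi-segment"},
    ],
}
print(dataset_uris(dataset))  # ['/data/pro-segment', '/data/epi-segment']
```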
For the simple case of single-segment AVHRR data, the geographic gatherer returns `collection` messages such as:
```python
collection = {
    "sensor": "avhrr",
    "platform_name": "Metop-C",
    "start_time": "2023-05-25T06:24:00",
    "end_time": "2023-05-25T06:33:00",
    "collection": [
        {
            "start_time": "2023-05-25T06:24:00",
            "end_time": "2023-05-25T06:25:00",
            "uri": "/data/oper/avhrr/ears/level0/AVHR_HRP_00_M03_20230525062400Z_20230525062500Z_N_O_20230525062820Z",
            "uid": "AVHR_HRP_00_M03_20230525062400Z_20230525062500Z_N_O_20230525062820Z"
        },
        ...
        {
            "start_time": "2023-05-25T06:32:00",
            "end_time": "2023-05-25T06:33:00",
            "uri": "/data/oper/avhrr/ears/level0/AVHR_HRP_00_M03_20230525063200Z_20230525063300Z_N_O_20230525063403Z",
            "uid": "AVHR_HRP_00_M03_20230525063200Z_20230525063300Z_N_O_20230525063403Z"
        }
    ]
}
```
For compact VIIRS data with two channel segments for a single time, the `collection` consists of `dataset`s:
```python
collection_of_datasets = {
    "start_time": "2023-05-11T01:40:54.200000",
    "end_time": "2023-05-11T01:50:51.500000",
    "platform_name": "NOAA-20",
    "sensor": ["viirs"],
    "collection": [
        {
            "dataset": [
                {
                    "uri": "/data/oper/viirs/ears/level1b/SVDNBC_j01_d20230511_t0140542_e0142187_b28372_c20230511015204000213_eum_ops.h5",
                    "uid": "SVDNBC_j01_d20230511_t0140542_e0142187_b28372_c20230511015204000213_eum_ops.h5"
                },
                {
                    "uri": "/data/oper/viirs/ears/level1b/SVMC_j01_d20230511_t0140542_e0142187_b28372_c20230511015212000170_eum_ops.h5",
                    "uid": "SVMC_j01_d20230511_t0140542_e0142187_b28372_c20230511015212000170_eum_ops.h5"
                }
            ],
            "start_time": "2023-05-11T01:40:54.200000",
            "end_time": "2023-05-11T01:42:18.700000"
        },
        ...
        {
            "dataset": [
                {
                    "uri": "/data/oper/viirs/ears/level1b/SVDNBC_j01_d20230511_t0149270_e0150515_b28372_c20230511015839000126_eum_ops.h5",
                    "uid": "SVDNBC_j01_d20230511_t0149270_e0150515_b28372_c20230511015839000126_eum_ops.h5"
                },
                {
                    "uri": "/data/oper/viirs/ears/level1b/SVMC_j01_d20230511_t0149270_e0150515_b28372_c20230511015848000237_eum_ops.h5",
                    "uid": "SVMC_j01_d20230511_t0149270_e0150515_b28372_c20230511015848000237_eum_ops.h5"
                }
            ],
            "start_time": "2023-05-11T01:49:27",
            "end_time": "2023-05-11T01:50:51.500000"
        }
    ]
}
```
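Since downstream code may receive either flavour, a small normalising helper can flatten both collection shapes shown above: AVHRR-style items that carry `uri`/`uid` directly, and compact-VIIRS-style items that nest a `dataset` list. This is a sketch, not pytroll-collectors API; the function name and the shortened example payloads are hypothetical.

```python
def collection_uids(msg_data):
    """Flatten a ``collection`` message into a list of segment UIDs.

    Handles both plain items (single-segment granules) and items that
    nest a ``dataset`` list (collection of datasets).
    """
    uids = []
    for item in msg_data["collection"]:
        if "dataset" in item:
            # Collection of datasets (e.g. compact VIIRS): go one level deeper.
            uids.extend(seg["uid"] for seg in item["dataset"])
        else:
            # Plain collection (e.g. single-segment AVHRR granules).
            uids.append(item["uid"])
    return uids


# Shortened examples of the two shapes.
avhrr = {"collection": [{"uid": "granule-1", "uri": "/data/granule-1"}]}
viirs = {"collection": [{"dataset": [{"uid": "svdnb-1", "uri": "/data/svdnb-1"},
                                     {"uid": "svm-1", "uri": "/data/svm-1"}]}]}
print(collection_uids(avhrr))  # ['granule-1']
print(collection_uids(viirs))  # ['svdnb-1', 'svm-1']
```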
The `multicollection` message type could look something like this:
```python
multicollection = {
    "start_times": ["2023-05-25T10:50:00", ..., "2023-05-25T11:50:00"],
    "end_times": [],
    "platform_name": "Meteosat-11",
    "sensor": ["seviri"],
    "multicollection": [
        {
            "start_time": "2023-05-25T10:50:00",
            "platform_name": "Meteosat-11",
            "sensor": ["seviri"],
            "dataset": [
                {
                    "uri": "/data/oper/seviri/rss/level1.5/H-000-MSG4__-MSG4_RSS____-_________-PRO______-202305251050-__",
                    "uid": "H-000-MSG4__-MSG4_RSS____-_________-PRO______-202305251050-__"
                },
                ...
                {
                    "uri": "/data/oper/seviri/rss/level1.5/H-000-MSG4__-MSG4_RSS____-_________-EPI______-202305251050-__",
                    "uid": "H-000-MSG4__-MSG4_RSS____-_________-EPI______-202305251050-__"
                }
            ],
        },
        ...
        {
            "start_time": "2023-05-25T11:50:00",
            "platform_name": "Meteosat-11",
            "sensor": ["seviri"],
            "dataset": [
                {
                    "uri": "/data/oper/seviri/rss/level1.5/H-000-MSG4__-MSG4_RSS____-_________-PRO______-202305251150-__",
                    "uid": "H-000-MSG4__-MSG4_RSS____-_________-PRO______-202305251150-__"
                },
                ...
                {
                    "uri": "/data/oper/seviri/rss/level1.5/H-000-MSG4__-MSG4_RSS____-_________-EPI______-202305251150-__",
                    "uid": "H-000-MSG4__-MSG4_RSS____-_________-EPI______-202305251150-__"
                }
            ],
        }
    ]
}
```
The top-level `start_times` and `end_times` lists might help later with sorting, data selection, or similar. With my chosen path of using segment gatherer internals in the code, it is not possible to collect data from different streams. If the collection happened in a separate process listening to multiple segment or geographic gatherers, we could get `multicollection`s like this:
```python
multicollection_2 = {
    "start_times": ["2023-05-25T10:50:00", ..., "2023-05-06T21:52:10.300000"],
    "end_times": [None, ..., "2023-05-06T21:53:34.800000"],
    "platform_names": ["Meteosat-11", ..., "NOAA-20"],
    "sensors": ["seviri", ..., "viirs"],
    "multicollection": [
        {
            "start_time": "2023-05-25T10:50:00",
            "platform_name": "Meteosat-11",
            "sensor": ["seviri"],
            "dataset": [
                {
                    "uri": "/data/oper/seviri/rss/level1.5/H-000-MSG4__-MSG4_RSS____-_________-PRO______-202305251050-__",
                    "uid": "H-000-MSG4__-MSG4_RSS____-_________-PRO______-202305251050-__"
                },
                ...
                {
                    "uri": "/data/oper/seviri/rss/level1.5/H-000-MSG4__-MSG4_RSS____-_________-EPI______-202305251050-__",
                    "uid": "H-000-MSG4__-MSG4_RSS____-_________-EPI______-202305251050-__"
                }
            ],
        },
        ...
        {
            "start_time": "2023-05-06T21:52:10.300000",
            "end_time": "2023-05-06T21:53:34.800000",
            "platform_name": "NOAA-20",
            "sensor": ["viirs"],
            "dataset": [
                {
                    "uri": "/data/oper/viirs/ears/level1b/SVDNBC_j01_d20230506_t2152103_e2153348_b28312_c20230506220612000459_eum_ops.h5",
                    "uid": "SVDNBC_j01_d20230506_t2152103_e2153348_b28312_c20230506220612000459_eum_ops.h5"
                },
                ...
                {
                    "uri": "/data/oper/viirs/ears/level1b/SVMC_j01_d20230506_t2152103_e2153348_b28312_c20230506220623000658_eum_ops.h5",
                    "uid": "SVMC_j01_d20230506_t2152103_e2153348_b28312_c20230506220623000658_eum_ops.h5"
                }
            ],
        }
    ]
}
```
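The top-level lists could then be derived mechanically from the member scenes. A minimal sketch, assuming the field names of the proposed structure above; the helper itself is hypothetical, and scenes without an `end_time` yield `None`:

```python
def multicollection_metadata(scenes):
    """Build the proposed top-level metadata lists from member scenes."""
    return {
        "start_times": [s.get("start_time") for s in scenes],
        "end_times": [s.get("end_time") for s in scenes],
        "platform_names": [s.get("platform_name") for s in scenes],
        # Each scene carries a one-element sensor list; take its first entry.
        "sensors": [s.get("sensor", [None])[0] for s in scenes],
    }


# Shortened member scenes following the structure above.
scenes = [
    {"start_time": "2023-05-25T10:50:00", "platform_name": "Meteosat-11",
     "sensor": ["seviri"], "dataset": []},
    {"start_time": "2023-05-06T21:52:10.300000",
     "end_time": "2023-05-06T21:53:34.800000",
     "platform_name": "NOAA-20", "sensor": ["viirs"], "dataset": []},
]
meta = multicollection_metadata(scenes)
print(meta["end_times"])  # [None, '2023-05-06T21:53:34.800000']
```

Keeping these lists aligned index-for-index with the `multicollection` members would also make later sorting or filtering a matter of zipping them together.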
This structure could be used for collecting geo ring data, for example, which could then be processed with Satpy `MultiScene` in one go. Now that I think of it, this would need completely different logic compared to the initial purpose of this PR (for example, publishing multiple scenes with the same time), so I'll go with the former.
As @mraspaud said in https://github.com/pytroll/pytroll-collectors/issues/140#issuecomment-1560735188, I'll rename this collection type to `temporal_collection` and start building the metadata collection.
This PR adds a way to collect metadata for multiple different configurable time slots and publish them in a single message.
Closes #140