Closed. gerritholl closed this 3 years ago.
Merging #112 (e8daef1) into main (874427e) will increase coverage by 0.46%. The diff coverage is 98.65%.
@@            Coverage Diff             @@
##             main     #112      +/-   ##
==========================================
+ Coverage   94.63%   95.10%   +0.46%
==========================================
  Files           9        9
  Lines        1977     2125     +148
==========================================
+ Hits         1871     2021     +150
+ Misses        106      104       -2
| Flag | Coverage Δ | |
|---|---|---|
| unittests | 95.10% <98.65%> (+0.46%) | :arrow_up: |

Flags with carried forward coverage won't be shown.
| Impacted Files | Coverage Δ | |
|---|---|---|
| trollflow2/plugins/__init__.py | 91.82% <97.26%> (+1.74%) | :arrow_up: |
| trollflow2/tests/test_trollflow2.py | 99.39% <100.00%> (+0.05%) | :arrow_up: |
| trollflow2/launcher.py | 89.24% <0.00%> (+0.03%) | :arrow_up: |
| trollflow2/tests/test_launcher.py | 98.66% <0.00%> (+0.04%) | :arrow_up: |
Continue to review the full report at Codecov. Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 874427e...e8daef1.
On hold, considering the question of the optimal approach here. As implemented at present, data are calculated twice, which almost doubles the total time to generate products. This is not acceptable.
Alternatives:

1. Filter inside `save_datasets`. Advantage: broadly applicable (the validity check covers not only AVHRR 3A/3B, but any bad/flagged data), and I've already partly implemented it. Disadvantage: channel data that aren't there are still resampled, which is a waste of time.
2. A `covers()` plugin to calculate coverage per group of products sharing a start/end time.

The second alternative seems cleaner, but is also more work. The first alternative would also cover cases with excessive bad data other than absent channels, but I don't know if that use case is needed.
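In spirit, the first alternative amounts to something like the sketch below. This is only an illustration, not the actual implementation; the function name and threshold are hypothetical.

```python
# Hypothetical sketch of the first alternative: reject a product when too
# few of its pixels contain valid data.  Names and the threshold are
# illustrative, not the real trollflow2 API.
import numpy as np


def enough_valid_data(data_array, min_valid_fraction=0.05):
    """Return True if at least ``min_valid_fraction`` of the pixels are finite."""
    valid_fraction = float(np.isfinite(data_array).mean())
    return valid_fraction >= min_valid_fraction
```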
I think I need a combination of both. Although https://github.com/pytroll/satpy/pull/1653 works now and could be used in a new or adapted trollflow2 plugin, there are remaining issues. First, the start and end times are still valid only for the original (set of) files read; they are not updated after resampling, and updating them after resampling would be somewhat involved. Secondly, some data files (such as `hrpt_noaa19_20210423_1411_62891.l1b`) lie about the presence of 3A/3B, so even with this satpy PR the data are claimed to be there all the time (according to the 3a/3b bit), but the data contents are empty/fill values.
A solution could be a combined approach. First I check how present a certain channel is, using the new, updated `start_time` and `end_time` attributes. Then, after or inside the `save_datasets` plugin, I reject products with too much invalid data.
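The first step could look roughly like this. It assumes the per-dataset `start_time`/`end_time` attributes from pytroll/satpy#1653 are available; the function name is hypothetical and the real check would need more care.

```python
# Hypothetical sketch: estimate what fraction of the scene a channel is
# present for, by comparing the channel's own start/end times with the
# scene's.  Assumes per-dataset start_time/end_time (pytroll/satpy#1653).
def channel_presence_fraction(scene, channel):
    scene_duration = (scene.end_time - scene.start_time).total_seconds()
    if scene_duration <= 0:
        return 0.0
    attrs = scene[channel].attrs
    chan_duration = (attrs["end_time"] - attrs["start_time"]).total_seconds()
    return chan_duration / scene_duration
```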
Despite the passing unit tests, this is not yet functional. I don't understand how the unit tests pass, but in practice the new log message reveals "Found 0 unique start/end time pair(s) in scene". Needs work, next week.
I've given up on trying to use product-specific `start_time` and `end_time`, as it is too complicated and an insufficient solution. Currently, `covers()` can be run before loading any datasets. A product-specific `covers()` would need to run after loading datasets, because it needs the `start_time` and `end_time` for each dataset. That means the plugin would have to be a separate function in addition to `covers()`. But the `start_time` and `end_time` are not updated when resampling, and some products may not have a `start_time` and `end_time` at all. And even if I could get it to work, it wouldn't remove the need for content filtering, because the data source isn't always truthful about which channel (3A or 3B) is active.
So, back to the content filtering solution. In `check_valid()` I'm now replacing the dataset as stored in the product list by the result of its `.persist()` method, which should hopefully prevent double calculation. Nevertheless, it's still much slower to use `check_valid()` than not to use it: for one example file (`20210503135536_NOAA_19.hmf`), it takes 01:02 to produce 36 files when skipping `check_valid()` and 02:35 when using `check_valid()`. I don't understand why.
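For reference, the persist-based idea is roughly the following minimal sketch. The job/product-list handling is simplified, and the signature, the `"resampled_scenes"` key, and the threshold are assumptions rather than the actual plugin code.

```python
# Minimal sketch of the persist() idea in check_valid(): compute the
# validity statistics once, keep the computed values in memory, and store
# the persisted array back so that saving does not recompute the dask graph.
# The job structure and names here are assumptions, not the real plugin.
import numpy as np


def check_valid(job, min_valid_fraction=0.05):  # hypothetical signature
    scenes = job["resampled_scenes"]  # assumed location of the resampled scenes
    for area, scn in scenes.items():
        for dataid in list(scn.keys()):
            persisted = scn[dataid].persist()  # compute once, keep in memory
            scn[dataid] = persisted            # reuse the persisted data when saving
            valid_fraction = float(np.isfinite(persisted).mean())
            if valid_fraction < min_valid_fraction:
                del scn[dataid]  # drop datasets with too little valid data
```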
With the latest commit, producing 36 files for `20210503135536_NOAA_19.hmf` takes 01:21. Without `check_valid()` it takes 01:02. With the old `check_valid()` from earlier commits it takes 02:35.
I don't know how to solve the CodeFactor and codecov/project issues. Any hints?
One solution might be to start having each plugin in a separate file
> One solution might be to start having each plugin in a separate file

Good idea (I don't like functional code in `__init__.py` anyway). Can we postpone that to another PR?
After the last three commits the tests still seem to pass, but I'd like to let it run for a bit on a test system to make sure it's still doing what I expect.
Plugin to check validity of channel data before continuing processing. This will help us when AVHRR switches 3A/3B so we don't generate empty channels and products.
`flake8 trollflow2`