Closed — dnerini closed this pull request 3 years ago.
Spontaneous questions and responses:
* should we use "precipitation" to name the main variable? Can we make it more general, so as to make it clear that other variables can be used (e.g. cloud cover)?
At this point I believe that "precipitation" is a good choice, but I agree that we may have other cases, such as "rain_rate".
* we should revise the metadata to better align them to CF standards and other libraries used for radar data processing (e.g. wradlib).
Yes ... it is better to follow CF standards, as xarray uses them as well.
But I would prefer to keep key metadata in the structure as parameters of the variable, because metadata that are initially easy to derive from the data may be lost after a number of transformations, or may become hard to derive at all: for example, if a radar field is full of non-zero values, how can the zero_value be derived?
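To illustrate the concern about metadata getting lost, here is a minimal sketch (using a made-up radar field, not the actual pysteps importer output) showing that xarray drops attributes during arithmetic unless `keep_attrs` is enabled:

```python
import numpy as np
import xarray as xr

# Hypothetical radar field with pysteps-style metadata stored as attributes.
precip = xr.DataArray(
    np.random.rand(4, 4),
    dims=("y", "x"),
    attrs={"unit": "mm/h", "zerovalue": 0.0},
)

# By default, xarray drops attributes during arithmetic, so derived
# metadata such as "zerovalue" would be lost after a transformation.
assert not (precip * 2).attrs

# Enabling keep_attrs preserves the metadata through operations.
with xr.set_options(keep_attrs=True):
    transformed = 10 * np.log10(precip + 0.01)
assert transformed.attrs["zerovalue"] == 0.0
```

This supports the argument for carrying key metadata explicitly in the data structure rather than re-deriving it after each transformation.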
OK, for the moment I decided to return a DataArray instead of a Dataset (hence ignoring the quality field, which we never used anyway). This way, we remain agnostic about which variable we are reading (it could be anything).
```
<xarray.DataArray (y: 640, x: 710)>
array([[nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       ...,
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan]])
Coordinates:
  * x        (x) float64 2.555e+05 2.565e+05 2.575e+05 ... 9.635e+05 9.645e+05
  * y        (y) float64 -1.595e+05 -1.585e+05 ... 4.785e+05 4.795e+05
Attributes:
    unit:         mm
    accutime:     5.0
    transform:    None
    zerovalue:    0.0
    threshold:    0.0009628129986471908
    zr_a:         316.0
    zr_b:         1.5
    institution:  MeteoSwiss
    projection:   +proj=somerc +lon_0=7.43958333333333 +lat_0=46.95240555555...
```
I was thinking that it would make more sense if the DataArray included a singleton "t" dimension with the timestamp of the radar image, what do you think?
My personal preference is to use a Dataset for the source input, as it stores multiple variables together, like the 'precipitation' and 'projection' information. The variable of interest can always be selected from the Dataset before passing it into pysteps routines.
Also, "time" should be a coordinate, as that is what really differentiates one radar field from another (assuming that all of them share common metadata across time steps) ... using 'time' as a coordinate also makes it easy to open and concatenate multiple radar files with xarray.open_mfdataset.
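A minimal sketch of the time-coordinate idea (the `import_field` helper is hypothetical, standing in for an actual importer):

```python
import numpy as np
import pandas as pd
import xarray as xr

def import_field(timestamp):
    """Hypothetical importer returning one radar scan with a singleton
    'time' coordinate attached."""
    field = xr.DataArray(np.random.rand(4, 4), dims=("y", "x"))
    return field.expand_dims(time=[pd.Timestamp(timestamp)])

# Concatenating along 'time' mimics what xarray.open_mfdataset does when
# combining multiple radar files that share common metadata.
series = xr.concat(
    [import_field(t) for t in ["2021-06-01 12:00", "2021-06-01 12:05"]],
    dim="time",
)
assert series.dims == ("time", "y", "x")
```

With a proper time coordinate in place, selecting a single field is then just `series.sel(time="2021-06-01 12:00")`.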
I see your point concerning the Dataset, @cvelascof, and this could easily be implemented for the importers. I just question the need for Datasets when we mostly work with single variables (radar precipitation). One can always build a new Dataset when needed (adding NWP data, for example).
And do we need an extra variable for the projection? I've seen it done before for NetCDF files, but we could try to simplify this by adopting EPSG codes, at which point it would be enough to add a `crs=EPSG:xxxx` attribute. What do you think?
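As a sketch of this idea (the attribute name `crs` and the values are illustrative; EPSG:21781 is the Swiss CH1903 / LV03 grid used in the MeteoSwiss example above):

```python
import numpy as np
import xarray as xr

# Instead of a separate "projection" variable, store the CRS as a plain
# attribute holding an EPSG code (illustrative values throughout).
precip = xr.DataArray(
    np.zeros((640, 710)),
    dims=("y", "x"),
    attrs={"unit": "mm", "crs": "EPSG:21781"},
)

# Downstream code can reconstruct the full projection when needed, e.g.
# with pyproj.CRS.from_user_input(precip.attrs["crs"]).
assert precip.attrs["crs"].startswith("EPSG:")
```

One caveat: not every projection has an EPSG code, so the attribute would need to allow a full PROJ or WKT string as a fallback.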
Also, most of our methods work on single arrays, so the migration to xarray would be easier with DataArrays. And we shouldn't make any assumptions about the names of the variables.
One possible solution could be to return a Dataset when importing, but then the user would have to pass the variable of interest to a given method, for example:

```python
ds = pysteps.io.read_timeseries()  # or xr.open_mfdataset()
precip = ds.precip.pysteps.to_rainrate()
motion = pysteps.motion.dense_lucaskanade(precip)
```
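The `ds.precip.pysteps.to_rainrate()` call above presupposes a pysteps accessor registered on xarray objects. A minimal sketch of how such an accessor could be wired up with xarray's official extension mechanism (the conversion logic here is a toy placeholder, not the actual pysteps implementation):

```python
import numpy as np
import xarray as xr

@xr.register_dataarray_accessor("pysteps")
class PystepsAccessor:
    """Hypothetical accessor exposing pysteps conversions on DataArrays."""

    def __init__(self, da):
        self._da = da

    def to_rainrate(self):
        # Toy conversion: assume the data are accumulations in mm over
        # "accutime" minutes (read from attrs) and convert to mm/h.
        accutime = self._da.attrs.get("accutime", 5.0)
        out = self._da * (60.0 / accutime)
        out.attrs = {**self._da.attrs, "unit": "mm/h"}
        return out

precip = xr.DataArray(
    np.ones((4, 4)), dims=("y", "x"), attrs={"unit": "mm", "accutime": 5.0}
)
rainrate = precip.pysteps.to_rainrate()
assert rainrate.attrs["unit"] == "mm/h"
```

The nice property of the accessor pattern is that it works identically on DataArrays pulled out of a Dataset, so both proposals above remain compatible with it.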
That could work, @dnerini (thanks for all the work, by the way)! We just shouldn't forget to update the examples in that case (to avoid confusion). @cvelascof, does the projection need to be a variable? I.e., can't it just be an attribute, as in @dnerini's example?
I find it hard to judge whether we need to move to DataArrays or Datasets. As @dnerini pointed out, we generally work with single arrays. What I can imagine, with our move towards blending and perhaps later machine-learning techniques, is that at some point we will want to involve more variables than just precipitation. If the information originates from one dataset, e.g. an NWP model, it may be a bit redundant to import it into multiple DataArrays that contain similar grid, time, and metadata information. However, either way will work, and I think we just have to decide on what we see as the clearer and cleaner method. I personally have no preference.
This PR is by no means definitive, but should instead be seen as a first step to integrate the xarray data model into pysteps. I suggest we merge it pretty soon and advance in other modules, knowing that we will have to come back to the io module anyway to make more adjustments before v2 is ready.
Great work @dnerini! I agree that we should merge it soon and advance in other modules.
I pushed two commits that add a temporary "legacy" option to allow the importers and readers to behave as in version 1. Also, the tests that are still written for v1 were updated to use this new legacy option. The tests that couldn't be easily fixed were manually skipped and tagged with a TODO message. This legacy option will allow us to maintain the testing functionality throughout the migration to xarray.
Ah very nice @aperezhortal , thanks! Looks like tests are now failing because of missing dependencies. I'll cherry-pick commit b4c165ade1f31fb7567457a6bd95686702d8f70a to fix it as soon as possible (should have some time this afternoon).
Merging #219 (15d5b5b) into pysteps-v2 (c8ebc2d) will decrease coverage by 8.33%. The diff coverage is 96.55%.
```diff
@@             Coverage Diff              @@
##           pysteps-v2     #219    +/-  ##
==============================================
- Coverage       79.84%   71.51%   -8.34%
==============================================
  Files             137      139       +2
  Lines            9929     9988      +59
==============================================
- Hits             7928     7143     -785
- Misses           2001     2845     +844
```
| Flag | Coverage Δ | |
|---|---|---|
| unit_tests | 71.51% <96.55%> (-8.34%) | :arrow_down: |
Flags with carried forward coverage won't be shown.
| Impacted Files | Coverage Δ | |
|---|---|---|
| pysteps/io/readers.py | 81.25% <72.72%> (+7.05%) | :arrow_up: |
| pysteps/tests/helpers.py | 87.50% <87.50%> (-6.62%) | :arrow_down: |
| pysteps/decorators.py | 99.28% <100.00%> (+0.14%) | :arrow_up: |
| pysteps/io/importers.py | 71.77% <100.00%> (+0.12%) | :arrow_up: |
| pysteps/tests/test_cascade.py | 47.22% <100.00%> (-52.78%) | :arrow_down: |
| pysteps/tests/test_datasets.py | 45.45% <100.00%> (-26.64%) | :arrow_down: |
| pysteps/tests/test_detcatscores.py | 94.73% <100.00%> (-5.27%) | :arrow_down: |
| pysteps/tests/test_exporters.py | 31.91% <100.00%> (-68.09%) | :arrow_down: |
| pysteps/tests/test_io_archive.py | 100.00% <100.00%> (ø) | |
| pysteps/tests/test_io_bom_rf3.py | 100.00% <100.00%> (ø) | |
| ... and 35 more | | |
Continue to review the full report at Codecov.
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data.
Powered by Codecov. Last update c8ebc2d...15d5b5b.
Include new xarray-based data model into the pysteps.io module (See #12).
The imported data are converted into an xarray Dataset by means of a decorator.
See below for an example using MeteoSwiss data.
The same for a BOM file:
Edit: limited the scope of this PR to the io module only.