Open gerritholl opened 2 years ago
@gerritholl pin me when this is merged or mostly done so that I can test it on some of my use cases. In meantime, I am drafting a PR refactoring the CFWriter internals to then create:
scn.to_xarray_dataset()
)ZarrWriter
I will wait your PR before proceeding ....
Side note:
If the Satpy Scene is on a SwathDefinition and we use the CFWriter, no grid_mapping
attribute nor crs
coords information is attached to netCDF. Maybe you can enforce the correct behavior in the context of this PR.
Going further away, when resampling a Scene from AreaDef to a SwathDef, the grid_mapping
attribute should also be updated ...
@ghiggi I am putting the work for this PR on hold. For what I need to achieve, I can and will use https://github.com/pytroll/satpy/pull/2236 instead. Therefore, fixing https://github.com/pytroll/satpy/issues/2226 for the equirectangular projection is no longer a priority from my side.
I see that this PR seems stalled. If someone starts working on this again later, I want to note that the CF convention (CF-1.10) clearly states that:
_The crs_wkt attribute is intended to act as a supplement to other single-property CF grid mapping attributes (as described in Appendix F); it is not intended to replace those attributes. If data producers omit the single-property grid mapping attributes in favour of the crs_wkt attribute, software which cannot interpret crs_wkt will be unable to use the grid_mapping information. Therefore the CRS should be described as thoroughly as possible with the single-property grid mapping attributes as well as by crswkt.
Thus, if we should load a CRS from wkt only, it should not be done silently by from_cf() or load_cf_area(). I think in the case described above, the user should actively (e.g. using a keyword) bypass decoding the single-property CF attributes, and rather use the WKT encoding.
@TomLav I disagree with this.
If data producers omit the single-property grid mapping attributes in favour of the crs_wkt attribute
This is specific to data producers, not consumers. We are consumers in this case. If the WKT does not match the single-property CF attributes then that is a fault of the data producer or of the CF format as a whole. Edit: We are allowed to just use the WKT attribute as consumers.
I think the above point is pretty clear, but I'd also like to add that I strongly disagree with CF's choice of making these separate CF attributes and for making crs_wkt optional and supplemental. WKT is fully descriptive and the CF attributes are not. The CF attributes (and the CF standard) do not support every projection and the standard as a whole has to be updated to support new projections when they come along. Conversations in the xarray ecosystem and the OGC Zarr world have started leaning towards the idea of being CF-compatible-ish because of this and only using the CRS WKT attribute in their zarr standards.
Hei @djhoese, I am not unaware of the history and tensions between the different communities around the CF attributes and the WKT definitions. Maybe these things will converge with time.
I was not saying that our from_cf() should bark and not load the crs from WKT, but I do think that this should not happen silently either. I do not know if this should be controlled via a keyword or a warning, but I think there is value for a routine called from_cf() to have the option to strictly follow the CF convention. We can e.g. have a from_cf(strict=True|False).
This being said pyresample load_cf_area() only checks if the :grid_mapping_name attribute is present and has a valid value, it then diverts to pyproj.from_cf() the decoding of the crs attributes and/or wkt. If pyproj adopts a loose approach, pyresample will inherit it silently.
You also state that you disagree with a quote that I took from the CF convention, which is fine, but there is little point discussing the CF convention here.
You also state that you disagree with a quote that I took from the CF convention, which is fine, but there is little point discussing the CF convention here.
No, sorry. I disagree with your interpretation of the quote. The standard is saying that we can load from WKT without looking at the single-property attributes (as consumers of the data, not producers).
but I do think that this should not happen silently either.
Why not? The CF convention gives us the crs_wkt
attribute as an option. We can use it if it is there. There is no need to warn the user if we use it or don't use it. It is there to be used (and is more accurate).
to have the option to strictly follow the CF convention
We are following the CF convention by being a consumer of the data and using the crs_wkt
attribute. There is nothing against the CF convention by doing this.
I see. We disagree on the interpretation of the CF convention.
You understand from the convention that files where the CRS is only specified with wkt are CF compliant.
I understand the convention that the CRS must first be defined in their single-attribute form, then we can duplicate and enrich with WKT.
For reference, my interpretation stems from the sentence: _The crswkt attribute is intended to act as a supplement to other single-property CF grid mapping attributes (as described in Appendix F); it is not intended to replace those attributes. What follows with data producers does not change my interpretation of this first sentence.
I don't think we can settle this from Pytroll, this must have been discussed somewhere at CF, I will give this a look.
For the record: if it is so that CF allows a crs to only be defined by its WKT then of course from_cf() should silently do its job and load this. If CF does not allow this, I still think we should help the user and load the CRS, but we should also give the user the control to use non-strict checks. Do you agree?
You understand from the convention that files where the CRS is only specified with wkt are CF compliant.
I understand the convention that the CRS must first be defined in their single-attribute form, then we can duplicate and enrich with WKT.
Nope. I understand that PRODUCERS must always include the single-property attributes. The CRS WKT is supplemental.
BUT there is nothing saying you/we/pyresample, as CONSUMERS of the data, should not use the CRS WKT. If it is there, we can use it. The WKT will never be less accurate than the CF attributes. So if the CRS WKT and single-property attributes are both available, we should always use/prefer the WKT...as CONSUMERS.
For later reference, it seems that CF (as of CF-1.10) will not accept a crs variable defined only by its wkt description.
If the text of the convention is possibly ambiguous (see discussion above), the conformance document is clear (*):
_The grid mapping variables must have the grid_mapping_name attribute. The legal values for the grid_mappingname attribute are contained in Appendix F.
Which still leaves us all latitude as to what pyresample should do in such cases. As I stated above I believe it should try to find its way around and produce the area_def. But I wanted to clarify (at least for my own sake) what was CF-compliant or not(*).
I see that pyproj.CRS.to_cf() happily loads the CRS whether its definition uses only the WKT description (case B below), or both the CF-attribute and WKT descriptions (case A), or only the CF-attribute description (case C). The current behaviour of pyproj seems to be to load the CRS from crs_wkt if present (A and B) and decode the CF attributes only if there are no crs_wkt (case C). I think this is the behaviour you want for pyresample, @djhoese, so all good.
from copy import copy
from pyproj import CRS
a = area_def.crs.to_cf()
b = {'crs_wkt': a['crs_wkt']}
c = copy(a)
c.pop('crs_wkt')
crsA = CRS.from_cf(a)
crsB = CRS.from_cf(b)
crsC = CRS.from_cf(c)
print(crsA)
print(crsB)
print(crsC)
Thus, it should be easy for pyresample load_cf_area() to work smoothly in all 3 cases, we just need to decide how to warn/inform the user that we found a non-compliancy but marched on (in case B).
Turning this raise
into a warning might be all we need to do.
(*) of course the conformance document could be wrong and not be aligned with CF convention. But what counts to me is that the conformance checkers will implement the conformance document.
As mentioned on slack from our discussion yesterday and today, after looking at the pyresample CF code it is clear pyresample is doing a lot more validation of CF standards than I thought it was. In those cases it makes sense to have a strict
keyword argument (I'd prefer a default of False
I think) that essentially "guesses" at whatever information it needs (which variable is the x coordinate, which one the y, etc). I still didn't agree with the idea that the WKT should not be used as the primary source of CRS information if it exists. Meaning the single-property attributes would only be depended on if the crs_wkt
didn't exist. I don't think this is wrong in a CF-sense and I think it makes more sense as WKT will never be less accurate than the CF attributes. It is just an unfortunate situation that the CF standard has put us in. So in summary a strict
keyword argument that when True checks if there is only a crs_wkt
and raises an error would be OK with me. If False
then pyresample just tries its best to get the CRS information. In general we will depend on pyproj to parse the CF grid mapping variable and produce whatever the best CRS is, but we can have additional checks for CF validation when strict=True
.
I would be ok with changing that raise
to a warning.
Load an AreaDefinition from a CF NetCDF file when the grid mapping variable is encoding only the Coordinate Reference System (CRS) in Well Known Text (WKT), but lacks other metadata.
Work in progress.
git diff origin/main **/*py | flake8 --diff