Closed: zxdawn closed this issue 3 years ago.
Just curious, why are you specifically loading `longitude` and `latitude`?
Could you print out the `latitude` DataArray? I'm curious what the coordinates, attributes, and dimensions are before trying to save it.
@djhoese Because `longitude` and `latitude` are original variables saved in the TROPOMI L2 data, we need to save them for later usage.
Here's some information about `s5p['latitude']` and `s5p['surface_albedo']`:
<xarray.DataArray 'latitude' (y: 389, x: 450)>
dask.array<where, shape=(389, 450), dtype=float32, chunksize=(389, 450), chunktype=numpy.ndarray>
Coordinates:
time datetime64[ns] 2019-07-25
latitude (y, x) float32 dask.array<chunksize=(389, 450), meta=np.ndarray>
longitude (y, x) float32 dask.array<chunksize=(389, 450), meta=np.ndarray>
Dimensions without coordinates: y, x
Attributes:
start_time: 2019-07-25 04:27:43
end_time: 2019-07-25 06:09:12
long_name: pixel center latitude
units: degrees_north
standard_name: latitude
valid_min: -90.0
valid_max: 90.0
bounds: /PRODUCT/SUPPORT_DATA/GEOLOCATIONS/latitude_bounds
name: latitude
file_type: tropomi_l2
file_key: PRODUCT/latitude
coordinates: ('longitude', 'latitude')
modifiers: ()
platform_shortname: S5P
sensor: tropomi
_satpy_id: DataID(name='latitude', modifiers=())
ancillary_variables: []
<xarray.DataArray 'surface_albedo' (y: 389, x: 450)>
dask.array<where, shape=(389, 450), dtype=float32, chunksize=(389, 450), chunktype=numpy.ndarray>
Coordinates:
crs object +proj=latlong +datum=WGS84 +ellps=WGS84 +type=crs
Dimensions without coordinates: y, x
Attributes:
start_time: 2019-07-25 04:27:43
end_time: 2019-07-25 06:09:12
units: 1
standard_name: surface_albedo
long_name: Surface albedo in the cloud product
coordinates: ('longitude', 'latitude')
radiation_wavelength: 758.0
name: surface_albedo
file_key: PRODUCT/SUPPORT_DATA/INPUT_DATA/surface_albedo
file_type: tropomi_l2
modifiers: ()
platform_shortname: S5P
sensor: tropomi
area: Shape: (389, 450)\nLons: <xarray.DataArray 'longit...
_satpy_id: DataID(name='surface_albedo', modifiers=())
ancillary_variables: []
If you remove the `coordinates` from the `.attrs` of the lon/lats before saving, does it give you a different error?

del scn['longitude'].attrs['coordinates']
del scn['latitude'].attrs['coordinates']

Same error if I add `del s5p['latitude'].attrs['coordinates']`.
What if you remove the `coordinates` attribute from all loaded DataArrays? There may be a chance that you have to remove it from the SwathDefinition, but let's ignore that for now.
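That suggestion could be sketched like this, with the Scene mocked as a plain dict of DataArrays so the snippet is self-contained (the attribute values mimic what the reader puts in `.attrs`):

```python
import numpy as np
import xarray as xr

# Stand-in for a satpy Scene: a plain dict of DataArrays that each
# carry the 'coordinates' attribute set by the reader YAML.
scn = {
    name: xr.DataArray(
        np.zeros((2, 2)), dims=('y', 'x'),
        attrs={'coordinates': ('longitude', 'latitude')})
    for name in ('latitude', 'longitude', 'surface_albedo')
}

# Remove the 'coordinates' attribute from every loaded DataArray
# before handing them to the CF writer.
for data in scn.values():
    data.attrs.pop('coordinates', None)
```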
My guess, after looking at the code, is that the CF writer is looking at `.attrs['coordinates']` to determine what coordinates to associate with each DataArray. This is causing 'longitude' and 'latitude' to be assigned as coordinates when the dimension names are actually 'y' and 'x', and that makes xarray unhappy. Since the error only happens when you explicitly ask for longitude and latitude to be loaded, there must be some check of "does this coordinate exist in the variables I know about?" that ignores the coordinate if it doesn't exist.
That said, you don't need to explicitly add the longitude and latitude variables to what you load because the CF writer will add them automatically from the SwathDefinition (I think).
Thanks for the explanation, David.
This method works well: skipping the manual saving of lon/lat and getting the lon and lat by `lon, lat = s5p['surface_albedo'].attrs['area'].get_lonlats()`.
It looks reasonable to me. Shall we close this issue?
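That workaround can be sketched as follows. `get_lonlats()` is the method on the pyresample SwathDefinition stored in `.attrs['area']`; here it is mocked with a tiny stand-in class so the snippet runs without pyresample or real data:

```python
import numpy as np
import xarray as xr

class FakeSwath:
    """Stand-in for pyresample's SwathDefinition (only get_lonlats)."""
    def get_lonlats(self):
        lons = np.linspace(100.0, 101.0, 4).reshape(2, 2)
        lats = np.linspace(30.0, 31.0, 4).reshape(2, 2)
        return lons, lats

albedo = xr.DataArray(np.zeros((2, 2)), dims=('y', 'x'),
                      attrs={'area': FakeSwath()})

# Instead of loading 'longitude'/'latitude' explicitly, pull them
# back out of the area definition attached to any loaded dataset.
lon, lat = albedo.attrs['area'].get_lonlats()
```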
I'd still like to know why the longitude and latitude datasets don't work. It seems like a bug in the CF writer.
@ninahakansson @sfinkens might have ideas
I guess this has something to do with the additional `time` dimension of the lat/lon coordinates. For example (I'm trying to reproduce the behaviour of the `link_coords` method here), this works well:
In [20]: a = xr.DataArray([[1,2], [3,4]], dims=('y', 'x'), coords={'x': [1,2], 'y': [3,4]})
In [21]: lon = a.copy()
In [22]: a['lon'] = lon
In [23]: a
Out[23]:
<xarray.DataArray (y: 2, x: 2)>
array([[1, 2],
[3, 4]])
Coordinates:
* x (x) int64 1 2
* y (y) int64 3 4
lon (y, x) int64 1 2 3 4
But with an additional time dimension it doesn't:
In [24]: lon = lon.expand_dims('time')
In [25]: a['lon'] = lon
...
ValueError: cannot add coordinates with new dimensions to a DataArray
But why are the lon/lat coordinates getting a `time` dimension added to them? Or is that just a side effect of how xarray makes coordinates apply to all variables?
Oh, sorry. I thought the lats returned by the reader had an extra time dimension. But I misread the snippet above. It's just a scalar time coordinate, no extra dimension.
Aha, the CF writer is adding a time dimension if the dataset has a `time` coordinate: https://github.com/pytroll/satpy/blob/master/satpy/writers/cf_writer.py#L539 . I don't know why. But I think dropping the `time` coordinate from the lats/lons

lons = lons.drop('time')

should work around the problem for now.
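A sketch of that workaround (note that `drop` is deprecated in newer xarray in favour of `drop_vars`, which is used here):

```python
import numpy as np
import xarray as xr

# Lons as returned by the reader: a 2D array with an attached
# scalar 'time' coordinate (no extra dimension).
lons = xr.DataArray(np.zeros((2, 2)), dims=('y', 'x'),
                    coords={'time': np.datetime64('2019-07-25')})
assert 'time' in lons.coords

# Drop the scalar time coordinate before handing the lons to the
# CF writer, so it has no reason to add a time dimension.
lons = lons.drop_vars('time')
```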
The `link_coords` method could be fixed by dropping any scalar coordinates [`set(lon.coords.keys()).difference(lon.dims)`] before using them.
> The link_coords method could be fixed by dropping any scalar coordinates [set(lon.coords.keys()).difference(lon.dims)] before using them.
Maybe not just scalar but any coordinates that aren't also dimensions? Right?
@djhoese Yes, any coordinates that aren't also dimensions could lead to the error.
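A minimal sketch of that fix, reusing the toy arrays from the earlier IPython session and dropping every coordinate that is not also a dimension before linking:

```python
import numpy as np
import xarray as xr

a = xr.DataArray([[1, 2], [3, 4]], dims=('y', 'x'),
                 coords={'x': [1, 2], 'y': [3, 4]})
# lon with an extra scalar 'time' coordinate, as in the reader output.
lon = a.copy().assign_coords(time=np.datetime64('2019-07-25'))

# Coordinates that are not also dimensions ({'time'} here):
extra = set(lon.coords.keys()).difference(lon.dims)

# Drop them before linking, so only dimension coordinates remain.
a['lon'] = lon.drop_vars(extra)
```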
BTW, I'm curious whether the `time` dimension is useful for `MultiScene`? If we drop the `time` dimension, will the `MultiScene` still work?
The `MultiScene` does not depend on or use a `time` dimension at all. It only depends on the order of the Scenes you provide it. It has the possibility to create an xarray Dataset with a time dimension, where it uses `start_time` if no time-based dimension is available (if I remember correctly).
If I delete the `time` coordinates, then the `time_bnds` isn't available. However, as discussed in #1246, we don't need `time_bnds` at all.
Is it correct to create a PR to delete the time coordinates for TROPOMI data?
I had a quick test and found that we need to drop two things to make `save_datasets()` work well:

1) the `time` coordinates of all DataArrays
2) the `coordinates` attributes of `longitude` and `latitude`

If this is OK, I will create a PR.
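The two drops together could be sketched as a small helper (hypothetical, not satpy API) applied to each DataArray before `save_datasets()`:

```python
import numpy as np
import xarray as xr

def prepare_for_cf(data):
    """Hypothetical helper: drop the scalar 'time' coordinate and the
    'coordinates' attribute so the CF writer does not trip over them."""
    if 'time' in data.coords:
        data = data.drop_vars('time')
    data.attrs.pop('coordinates', None)
    return data

# A lat array shaped like the reader output: 2D, scalar time
# coordinate, and a 'coordinates' entry in .attrs.
lat = xr.DataArray(np.zeros((2, 2)), dims=('y', 'x'),
                   coords={'time': np.datetime64('2019-07-25')},
                   attrs={'coordinates': ('longitude', 'latitude')})
lat = prepare_for_cf(lat)
```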
Is it possible to rename things so they work better? Otherwise update CF writer to handle these? I'd prefer not to remove information from the reader unless it's redundant. Someone could be using it.
Here are some testing steps.
First, I add these lines to the reader:

# drop the time coordinate from the longitude/latitude variables
if ds_id['name'] in ['longitude', 'latitude']:
    data = data.drop_vars(['time'])
And I got this error:
File "test_save.py", line 77, in <module>
s5p.save_datasets(filename='./output/test.nc')
File "/public/home/zhangxin/new/miniconda3/envs/s5p_tobac/lib/python3.8/site-packages/satpy/scene.py", line 1065, in save_datasets
return writer.save_datasets(dataarrays, compute=compute, **save_kwargs)
File "/public/home/zhangxin/new/miniconda3/envs/s5p_tobac/lib/python3.8/site-packages/satpy/writers/cf_writer.py", line 723, in save_datasets
res = dataset.to_netcdf(filename, engine=engine, group=group_name, mode='a', encoding=encoding,
File "/public/home/zhangxin/new/miniconda3/envs/s5p_tobac/lib/python3.8/site-packages/xarray/core/dataset.py", line 1689, in to_netcdf
return to_netcdf(
File "/public/home/zhangxin/new/miniconda3/envs/s5p_tobac/lib/python3.8/site-packages/xarray/backends/api.py", line 1107, in to_netcdf
dump_to_store(
File "/public/home/zhangxin/new/miniconda3/envs/s5p_tobac/lib/python3.8/site-packages/xarray/backends/api.py", line 1142, in dump_to_store
variables, attrs = conventions.encode_dataset_coordinates(dataset)
File "/public/home/zhangxin/new/miniconda3/envs/s5p_tobac/lib/python3.8/site-packages/xarray/conventions.py", line 806, in encode_dataset_coordinates
return _encode_coordinates(
File "/public/home/zhangxin/new/miniconda3/envs/s5p_tobac/lib/python3.8/site-packages/xarray/conventions.py", line 765, in _encode_coordinates
written_coords.update(attrs["coordinates"].split())
AttributeError: 'list' object has no attribute 'split'
Next, I comment out the `coordinates` entries in the reader YAML:
datasets:
latitude:
name: 'latitude'
file_type: tropomi_l2
file_key: 'PRODUCT/latitude'
#coordinates: [longitude, latitude]
standard_name: latitude
longitude:
name: 'longitude'
file_type: tropomi_l2
file_key: 'PRODUCT/longitude'
#coordinates: [longitude, latitude]
standard_name: longitude
Then `save_datasets()` works well, and the NetCDF file looks like this:
netcdf test {
dimensions:
y = 389 ;
x = 450 ;
variables:
float latitude(y, x) ;
.........
float longitude(y, x) ;
................
float cloud_fraction_crb_nitrogendioxide_window(y, x) ;
.............
cloud_fraction_crb_nitrogendioxide_window:coordinates = "longitude latitude" ;
.................
If I don't change the YAML and reader, and only add one print line above https://github.com/pytroll/satpy/blob/f2489726e0b64fbd4fe17a2ad36eaa2f1453f167/satpy/writers/cf_writer.py#L254

print(dataset[coord])

then `save_datasets()` works again with some warnings:
/public/home/zhangxin/new/miniconda3/envs/s5p_tobac/lib/python3.8/site-packages/satpy/writers/cf_writer.py:260: UserWarning: Coordinate "longitude" referenced by dataset cloud_fraction_crb_nitrogendioxide_window does not exist, dropping reference.
warnings.warn('Coordinate "{}" referenced by dataset {} does not exist, dropping reference.'.format(
/public/home/zhangxin/new/miniconda3/envs/s5p_tobac/lib/python3.8/site-packages/satpy/writers/cf_writer.py:260: UserWarning: Coordinate "latitude" referenced by dataset cloud_fraction_crb_nitrogendioxide_window does not exist, dropping reference.
warnings.warn('Coordinate "{}" referenced by dataset {} does not exist, dropping reference.'.format(
The NetCDF file looks like this:
netcdf test {
dimensions:
time = 1 ;
y = 389 ;
x = 450 ;
bnds_1d = 2 ;
variables:
int64 time(time) ;
time:bounds = "time_bnds" ;
time:standard_name = "time" ;
time:units = "days since 2019-07-25" ;
time:calendar = "proleptic_gregorian" ;
float cloud_fraction_crb_nitrogendioxide_window(y, x) ;
.............
float longitude(y, x, time) ;
............
float latitude(time, y, x) ;
...........
double time_bnds(time, bnds_1d) ;
I suppose the time dimension should be dropped, but I don't know the correct way to add that to the CF writer ...
In that last test with the `print`, are you saying it suddenly works after printing the coordinates?
The summary of my opinion on how to fix this is: If there is something semi-standard in the tropomi data (like a time dimension) that fails when using the CF writer, then the CF writer should be changed to handle it.
Describe the bug

I got this error, which is similar to #1031, when I tried to save the loaded scene of TROPOMI NO2 L2 data to a NetCDF file.

To Reproduce

This issue happens when some specific variables like `nitrogendioxide_stratospheric_column`, `surface_albedo` and `surface_pressure` etc. are included in the `vnames` together with `longitude` or `latitude`. Note that this issue doesn't exist before #1357 was merged.

Environment Info: