Closed zxdawn closed 4 years ago
I've tried to convert scene
to dataset
and save manually:
ds = s5p.to_xarray_dataset()
comp = dict(zlib=True, complevel=9)
encoding = {var: comp for var in ds.data_vars}
ds.load().to_netcdf(path='./output/test.nc',
mode='w',
engine='netcdf4',
group='S5P',
encoding=encoding)
But, I got this error:
File "/yin_raid/xin/miniconda3/envs/s5p/lib/python3.7/site-packages/xarray/core/dataset.py", line 1554, in to_netcdf
invalid_netcdf=invalid_netcdf,
File "/yin_raid/xin/miniconda3/envs/s5p/lib/python3.7/site-packages/xarray/backends/api.py", line 1040, in to_netcdf
_validate_attrs(dataset)
File "/yin_raid/xin/miniconda3/envs/s5p/lib/python3.7/site-packages/xarray/backends/api.py", line 219, in _validate_attrs
check_attr(k, v)
File "/yin_raid/xin/miniconda3/envs/s5p/lib/python3.7/site-packages/xarray/backends/api.py", line 209, in check_attr
"files".format(value)
TypeError: Invalid value for attr: 2019-07-25 04:27:43 must be a number, a string, an ndarray or a list/tuple of numbers/strings for serialization to netCDF files
It seems xarray
doesn't support saving the datetime attrs created by satpy.
If you read the error message in your original post it says that your areas are not the same for the DataArrays you are saving. Are these different resolutions? Could you print each of these DataArrays out and paste them here? What happens if you resample then try to use the cf writer?
The second error you were getting is exactly why the cf
writer exists. Satpy puts things in the .attrs
that xarray doesn't want to convert (mainly datetime
objects and AreaDefinitions).
@djhoese
This is the result of print(s5p)
:
<xarray.DataArray 'nitrogendioxide_tropospheric_column' (y: 389, x: 450)>
dask.array<where, shape=(389, 450), dtype=float32, chunksize=(389, 450), chunktype=numpy.ndarray>
Coordinates:
time datetime64[ns] 2019-07-25
* x (x) int32 0 1 2 3 4 5 6 7 8 ... 442 443 444 445 446 447 448 449
* y (y) int64 1540 1541 1542 1543 1544 ... 1924 1925 1926 1927 1928
latitude (y, x) float32 dask.array<chunksize=(389, 450), meta=np.ndarray>
longitude (y, x) float32 dask.array<chunksize=(389, 450), meta=np.ndarray>
crs object +proj=latlong +datum=WGS84 +ellps=WGS84 +type=crs
Attributes:
start_time: 2019-07-25 04:27:43
end_time: 2019-07-25 06:09:12
ancillary_variables: [<xarray.DataArray...
long_name: Tropospheric verti...
file_type: tropomi_l2
multiplication_factor_to_convert_to_molecules_percm2: 6.02214e+19
modifiers: ()
units: mol m-2
coordinates: ('longitude', 'lat...
platform_shortname: S5P
name: nitrogendioxide_tr...
standard_name: troposphere_mole_c...
file_key: PRODUCT/nitrogendi...
resolution: None
sensor: TROPOMI
area: Shape: (389, 450)\...
calibration: None
polarization: None
level: None
<xarray.DataArray 'cloud_pressure_crb' (y: 389, x: 450)>
dask.array<where, shape=(389, 450), dtype=float32, chunksize=(389, 450), chunktype=numpy.ndarray>
Coordinates:
crs object +proj=latlong +datum=WGS84 +ellps=WGS84 +type=crs
Dimensions without coordinates: y, x
Attributes:
start_time: 2019-07-25 04:27:43
end_time: 2019-07-25 06:09:12
long_name: Cloud optical centroid pressure
file_type: tropomi_l2
modifiers: ()
source:
units: Pa
proposed_standard_name: air_pressure_at_cloud_optical_centroid
coordinates: ('longitude', 'latitude')
platform_shortname: S5P
name: cloud_pressure_crb
file_key: PRODUCT/SUPPORT_DATA/INPUT_DATA/cloud_pressure_crb
resolution: None
sensor: TROPOMI
area: Shape: (389, 450)\nLons: <xarray.DataArray 'long...
calibration: None
polarization: None
level: None
ancillary_variables: []
However, if I print the datas
in assert_xy_unique, the nitrogendioxide_tropospheric_column
has different dims and shape:
{'cloud_pressure_crb': <xarray.DataArray 'cloud_pressure_crb' (y: 389, x: 450)>
dask.array<copy, shape=(389, 450), dtype=float32, chunksize=(389, 450), chunktype=numpy.ndarray>
Coordinates:
longitude (y, x) float32 dask.array<chunksize=(389, 450), meta=np.ndarray>
latitude (y, x) float32 dask.array<chunksize=(389, 450), meta=np.ndarray>
Dimensions without coordinates: y, x
Attributes:
coordinates: ['longitude', 'latitude']
end_time: 2019-07-25 06:09:12
file_key: PRODUCT/SUPPORT_DATA/INPUT_DATA/cloud_pressure_crb
file_type: tropomi_l2
long_name: Cloud optical centroid pressure
modifiers: []
platform_shortname: S5P
proposed_standard_name: air_pressure_at_cloud_optical_centroid
sensor: TROPOMI
source:
start_time: 2019-07-25 04:27:43
units: Pa, 'nitrogendioxide_tropospheric_column': <xarray.DataArray 'nitrogendioxide_tropospheric_column' (time: 1, y: 389, x: 450)>
dask.array<broadcast_to, shape=(1, 389, 450), dtype=float32, chunksize=(1, 389, 450), chunktype=numpy.ndarray>
Coordinates:
* time (time) datetime64[ns] 2019-07-25
* x (x) int32 0 1 2 3 4 5 6 7 8 ... 442 443 444 445 446 447 448 449
* y (y) int64 1540 1541 1542 1543 1544 ... 1924 1925 1926 1927 1928
latitude (y, x) float32 dask.array<chunksize=(389, 450), meta=np.ndarray>
longitude (y, x) float32 dask.array<chunksize=(389, 450), meta=np.ndarray>
Attributes:
coordinates: ['longitude', 'lat...
end_time: 2019-07-25 06:09:12
file_key: PRODUCT/nitrogendi...
file_type: tropomi_l2
long_name: Tropospheric verti...
modifiers: []
multiplication_factor_to_convert_to_molecules_percm2: 6.02214e+19
platform_shortname: S5P
sensor: TROPOMI
standard_name: troposphere_mole_c...
start_time: 2019-07-25 04:27:43
units: mol m-2}
Actually, I don't want to resample it, I need the exact original data.
The x/y coordinate variables seem wrong for nitrogendioxide_tropospheric_column
. They seem tobe integers and I'm guessing they are coming from the data file themselves. Satpy assumes that x/y coordinates are the metered coordinates in projection space. For polar-orbiter data like this there shouldn't be x/y coordinate variables. You could try dropping the x/y coordinates after loading and see if that helps the NetCDF generation at all.
Actually, I don't want to resample it, I need the exact original data.
I get that, but the point was to force the data to the same resolution/projection to prove it wasn't something else. But now that you've printed them out, that shouldn't be needed.
@djhoese Thanks! After dropping x
and y
coords of nitrogendioxide_tropospheric_column
, it works now.
Here're information of cloud_pressure_crb
and nitrogendioxide_tropospheric_column
in the original tropomi_l2 file:
float cloud_pressure_crb(time=1, scanline=373, ground_pixel=450);
:units = "Pa";
:proposed_standard_name = "air_pressure_at_cloud_optical_centroid";
:source = ;
:long_name = "Cloud optical centroid pressure";
:coordinates = "/PRODUCT/longitude /PRODUCT/latitude";
:_FillValue = 9.96921E36f; // float
:_ChunkSizes = 1U, 373U, 450U; // uint
float nitrogendioxide_tropospheric_column(time=1, scanline=373, ground_pixel=450);
:units = "mol m-2";
:standard_name = "troposphere_mole_content_of_nitrogen_dioxide";
:long_name = "Tropospheric vertical column of nitrogen dioxide";
:coordinates = "longitude latitude";
:ancillary_variables = "nitrogendioxide_tropospheric_column_precision air_mass_factor_troposphere air_mass_factor_total averaging_kernel";
:multiplication_factor_to_convert_to_molecules_percm2 = 6.02214E19f; // float
:_FillValue = 9.96921E36f; // float
:_ChunkSizes = 1U, 373U, 450U; // uint
It's strange that nitrogendioxide_tropospheric_column
has x
and y
coords, but cloud_pressure_crb
not.
And, I haven't checked other variables. Do I need to drop x
and y
for all variables for this case?
And, I haven't checked other variables. Do I need to drop x and y for all variables for this case?
It depends where the x/y are coming from. I would say yes, they should be dropped by the file handler if they do not correspond to x/y meters.
See the modification in the PR: https://github.com/pytroll/satpy/pull/1139/commits/89709fca8379330fc6c74596bf4881007cd16421
After x
and y
are dropped, I still got that error for these two variables:
<xarray.DataArray 'nitrogendioxide_tropospheric_column' (y: 389, x: 450)>
dask.array<where, shape=(389, 450), dtype=float32, chunksize=(389, 450), chunktype=numpy.ndarray>
Coordinates:
time datetime64[ns] 2019-07-25
latitude (y, x) float32 dask.array<chunksize=(389, 450), meta=np.ndarray>
longitude (y, x) float32 dask.array<chunksize=(389, 450), meta=np.ndarray>
crs object +proj=latlong +datum=WGS84 +ellps=WGS84 +type=crs
Dimensions without coordinates: y, x
Attributes:
start_time: 2019-07-25 04:27:43
end_time: 2019-07-25 06:09:12
platform_shortname: S5P
name: nitrogendioxide_tr...
modifiers: ()
file_key: PRODUCT/nitrogendi...
sensor: TROPOMI
coordinates: ('longitude', 'lat...
units: mol m-2
resolution: None
long_name: Tropospheric verti...
standard_name: troposphere_mole_c...
ancillary_variables: [<xarray.DataArray...
file_type: tropomi_l2
multiplication_factor_to_convert_to_molecules_percm2: 6.02214e+19
area: Shape: (389, 450)\...
calibration: None
polarization: None
level: None
<xarray.DataArray 'array-8b30ddfb1076e284b90d1bb91c3b2c43' (y: 390, x: 451)>
dask.array<array, shape=(390, 451), dtype=float32, chunksize=(390, 451), chunktype=numpy.ndarray>
Dimensions without coordinates: y, x
Attributes:
start_time: 2019-07-25 04:27:43
end_time: 2019-07-25 06:09:12
platform_shortname: S5P
modifiers: ()
file_key: PRODUCT/SUPPORT_DATA/GEOLOCATIONS/latitude_bounds
sensor: TROPOMI
level: None
wavelength: None
polarization: None
resolution: None
standard_name: assembled_latitude_bounds
calibration: None
file_type: tropomi_l2
name: assembled_lat_bounds
ancillary_variables: []
This is caused by the different shape of bounds. I will fix this in tropomi_l2 reader.
Describe the bug When I used
save_datasets
to save tropomi scene to nc file, it failed.To Reproduce
Expected behavior
save_datasets
could save all loaded variables.Actual results
Environment Info:
Additional context The problem of
ancillary_variables
is mentioned in #1140 .