pytroll / satpy

Python package for earth-observing satellite data processing
http://satpy.readthedocs.org/en/latest/
GNU General Public License v3.0
1.08k stars 298 forks source link

save_datasets doesn't work for tropomi_l2 data #1143

Closed zxdawn closed 4 years ago

zxdawn commented 4 years ago

Describe the bug When I used save_datasets to save tropomi scene to nc file, it failed.

To Reproduce

# Your code here
import os
import glob
from satpy.scene import Scene

s5p_nc_dir = 's5p_data'
f_s5p_pattern = os.path.join(s5p_nc_dir, '*20190725T*')
f_s5p = glob.glob(f_s5p_pattern)

s5p = Scene(f_s5p, reader='tropomi_l2')

vnames = ['nitrogendioxide_tropospheric_column',
          'cloud_pressure_crb']

s5p.load(vnames)
for vname in vnames:
    print(s5p[vname])

comp = dict(zlib=True, complevel=5)

s5p.save_datasets(filename='./output/test.nc',
                  datasets=vnames,
                  groups={'S5P': vnames},
                  exclude_attrs=['ancillary_variables'],
                  compute=True,
                  writer='cf',
                  engine='netcdf4',
                  compression=comp,
                  )

Expected behavior save_datasets could save all loaded variables.

Actual results

  File "/yin_raid/xin/github/satpy/satpy/scene.py", line 1359, in save_datasets
    return writer.save_datasets(datasets, compute=compute, **save_kwargs)
  File "/yin_raid/xin/github/satpy/satpy/writers/cf_writer.py", line 641, in save_datasets
    include_lonlats=include_lonlats, pretty=pretty, compression=compression)
  File "/yin_raid/xin/github/satpy/satpy/writers/cf_writer.py", line 519, in _collect_datasets
    assert_xy_unique(datas)
  File "/yin_raid/xin/github/satpy/satpy/writers/cf_writer.py", line 232, in assert_xy_unique
    raise ValueError('Datasets to be saved in one file (or one group) must have identical projection coordinates. '
ValueError: Datasets to be saved in one file (or one group) must have identical projection coordinates. Please group them by area or save them in separate files.

Environment Info:

Additional context The problem of ancillary_variables is mentioned in #1140 .

zxdawn commented 4 years ago

I've tried to convert scene to dataset and save manually:

ds = s5p.to_xarray_dataset()
comp = dict(zlib=True, complevel=9)
encoding = {var: comp for var in ds.data_vars}
ds.load().to_netcdf(path='./output/test.nc',
                    mode='w',
                    engine='netcdf4',
                    group='S5P',
                    encoding=encoding)

But, I got this error:

  File "/yin_raid/xin/miniconda3/envs/s5p/lib/python3.7/site-packages/xarray/core/dataset.py", line 1554, in to_netcdf
    invalid_netcdf=invalid_netcdf,
  File "/yin_raid/xin/miniconda3/envs/s5p/lib/python3.7/site-packages/xarray/backends/api.py", line 1040, in to_netcdf
    _validate_attrs(dataset)
  File "/yin_raid/xin/miniconda3/envs/s5p/lib/python3.7/site-packages/xarray/backends/api.py", line 219, in _validate_attrs
    check_attr(k, v)
  File "/yin_raid/xin/miniconda3/envs/s5p/lib/python3.7/site-packages/xarray/backends/api.py", line 209, in check_attr
    "files".format(value)
TypeError: Invalid value for attr: 2019-07-25 04:27:43 must be a number, a string, an ndarray or a list/tuple of numbers/strings for serialization to netCDF files

It seems xarray doesn't support saving the datetime attrs created by satpy.

djhoese commented 4 years ago

If you read the error message in your original post it says that your areas are not the same for the DataArrays you are saving. Are these different resolutions? Could you print each of these DataArrays out and paste them here? What happens if you resample then try to use the cf writer?

The second error you were getting is exactly why the cfwriter exists. Satpy puts things in the .attrs that xarray doesn't want to convert (mainly datetime objects and AreaDefinitions).

zxdawn commented 4 years ago

@djhoese This is the result of print(s5p):

<xarray.DataArray 'nitrogendioxide_tropospheric_column' (y: 389, x: 450)>
dask.array<where, shape=(389, 450), dtype=float32, chunksize=(389, 450), chunktype=numpy.ndarray>
Coordinates:
    time       datetime64[ns] 2019-07-25
  * x          (x) int32 0 1 2 3 4 5 6 7 8 ... 442 443 444 445 446 447 448 449
  * y          (y) int64 1540 1541 1542 1543 1544 ... 1924 1925 1926 1927 1928
    latitude   (y, x) float32 dask.array<chunksize=(389, 450), meta=np.ndarray>
    longitude  (y, x) float32 dask.array<chunksize=(389, 450), meta=np.ndarray>
    crs        object +proj=latlong +datum=WGS84 +ellps=WGS84 +type=crs
Attributes:
    start_time:                                            2019-07-25 04:27:43
    end_time:                                              2019-07-25 06:09:12
    ancillary_variables:                                   [<xarray.DataArray...
    long_name:                                             Tropospheric verti...
    file_type:                                             tropomi_l2
    multiplication_factor_to_convert_to_molecules_percm2:  6.02214e+19
    modifiers:                                             ()
    units:                                                 mol m-2
    coordinates:                                           ('longitude', 'lat...
    platform_shortname:                                    S5P
    name:                                                  nitrogendioxide_tr...
    standard_name:                                         troposphere_mole_c...
    file_key:                                              PRODUCT/nitrogendi...
    resolution:                                            None
    sensor:                                                TROPOMI
    area:                                                  Shape: (389, 450)\...
    calibration:                                           None
    polarization:                                          None
    level:                                                 None
<xarray.DataArray 'cloud_pressure_crb' (y: 389, x: 450)>
dask.array<where, shape=(389, 450), dtype=float32, chunksize=(389, 450), chunktype=numpy.ndarray>
Coordinates:
    crs      object +proj=latlong +datum=WGS84 +ellps=WGS84 +type=crs
Dimensions without coordinates: y, x
Attributes:
    start_time:              2019-07-25 04:27:43
    end_time:                2019-07-25 06:09:12
    long_name:               Cloud optical centroid pressure
    file_type:               tropomi_l2
    modifiers:               ()
    source:
    units:                   Pa
    proposed_standard_name:  air_pressure_at_cloud_optical_centroid
    coordinates:             ('longitude', 'latitude')
    platform_shortname:      S5P
    name:                    cloud_pressure_crb
    file_key:                PRODUCT/SUPPORT_DATA/INPUT_DATA/cloud_pressure_crb
    resolution:              None
    sensor:                  TROPOMI
    area:                    Shape: (389, 450)\nLons: <xarray.DataArray 'long...
    calibration:             None
    polarization:            None
    level:                   None
    ancillary_variables:     []

However, if I print the datas in assert_xy_unique, the nitrogendioxide_tropospheric_column has different dims and shape:

{'cloud_pressure_crb': <xarray.DataArray 'cloud_pressure_crb' (y: 389, x: 450)>
dask.array<copy, shape=(389, 450), dtype=float32, chunksize=(389, 450), chunktype=numpy.ndarray>
Coordinates:
    longitude  (y, x) float32 dask.array<chunksize=(389, 450), meta=np.ndarray>
    latitude   (y, x) float32 dask.array<chunksize=(389, 450), meta=np.ndarray>
Dimensions without coordinates: y, x
Attributes:
    coordinates:             ['longitude', 'latitude']
    end_time:                2019-07-25 06:09:12
    file_key:                PRODUCT/SUPPORT_DATA/INPUT_DATA/cloud_pressure_crb
    file_type:               tropomi_l2
    long_name:               Cloud optical centroid pressure
    modifiers:               []
    platform_shortname:      S5P
    proposed_standard_name:  air_pressure_at_cloud_optical_centroid
    sensor:                  TROPOMI
    source:
    start_time:              2019-07-25 04:27:43
    units:                   Pa, 'nitrogendioxide_tropospheric_column': <xarray.DataArray 'nitrogendioxide_tropospheric_column' (time: 1, y: 389, x: 450)>
dask.array<broadcast_to, shape=(1, 389, 450), dtype=float32, chunksize=(1, 389, 450), chunktype=numpy.ndarray>
Coordinates:
  * time       (time) datetime64[ns] 2019-07-25
  * x          (x) int32 0 1 2 3 4 5 6 7 8 ... 442 443 444 445 446 447 448 449
  * y          (y) int64 1540 1541 1542 1543 1544 ... 1924 1925 1926 1927 1928
    latitude   (y, x) float32 dask.array<chunksize=(389, 450), meta=np.ndarray>
    longitude  (y, x) float32 dask.array<chunksize=(389, 450), meta=np.ndarray>
Attributes:
    coordinates:                                           ['longitude', 'lat...
    end_time:                                              2019-07-25 06:09:12
    file_key:                                              PRODUCT/nitrogendi...
    file_type:                                             tropomi_l2
    long_name:                                             Tropospheric verti...
    modifiers:                                             []
    multiplication_factor_to_convert_to_molecules_percm2:  6.02214e+19
    platform_shortname:                                    S5P
    sensor:                                                TROPOMI
    standard_name:                                         troposphere_mole_c...
    start_time:                                            2019-07-25 04:27:43
    units:                                                 mol m-2}

Actually, I don't want to resample it, I need the exact original data.

djhoese commented 4 years ago

The x/y coordinate variables seem wrong for nitrogendioxide_tropospheric_column. They seem tobe integers and I'm guessing they are coming from the data file themselves. Satpy assumes that x/y coordinates are the metered coordinates in projection space. For polar-orbiter data like this there shouldn't be x/y coordinate variables. You could try dropping the x/y coordinates after loading and see if that helps the NetCDF generation at all.

Actually, I don't want to resample it, I need the exact original data.

I get that, but the point was to force the data to the same resolution/projection to prove it wasn't something else. But now that you've printed them out, that shouldn't be needed.

zxdawn commented 4 years ago

@djhoese Thanks! After dropping x and y coords of nitrogendioxide_tropospheric_column, it works now.

Here're information of cloud_pressure_crb and nitrogendioxide_tropospheric_column in the original tropomi_l2 file:

float cloud_pressure_crb(time=1, scanline=373, ground_pixel=450);
  :units = "Pa";
  :proposed_standard_name = "air_pressure_at_cloud_optical_centroid";
  :source = ;
  :long_name = "Cloud optical centroid pressure";
  :coordinates = "/PRODUCT/longitude /PRODUCT/latitude";
  :_FillValue = 9.96921E36f; // float
  :_ChunkSizes = 1U, 373U, 450U; // uint

float nitrogendioxide_tropospheric_column(time=1, scanline=373, ground_pixel=450);
  :units = "mol m-2";
  :standard_name = "troposphere_mole_content_of_nitrogen_dioxide";
  :long_name = "Tropospheric vertical column of nitrogen dioxide";
  :coordinates = "longitude latitude";
  :ancillary_variables = "nitrogendioxide_tropospheric_column_precision air_mass_factor_troposphere air_mass_factor_total averaging_kernel";
  :multiplication_factor_to_convert_to_molecules_percm2 = 6.02214E19f; // float
  :_FillValue = 9.96921E36f; // float
  :_ChunkSizes = 1U, 373U, 450U; // uint

It's strange that nitrogendioxide_tropospheric_column has x and y coords, but cloud_pressure_crb not.

And, I haven't checked other variables. Do I need to drop x and y for all variables for this case?

djhoese commented 4 years ago

And, I haven't checked other variables. Do I need to drop x and y for all variables for this case?

It depends where the x/y are coming from. I would say yes, they should be dropped by the file handler if they do not correspond to x/y meters.

zxdawn commented 4 years ago

See the modification in the PR: https://github.com/pytroll/satpy/pull/1139/commits/89709fca8379330fc6c74596bf4881007cd16421

zxdawn commented 4 years ago

After x and y are dropped, I still got that error for these two variables:

<xarray.DataArray 'nitrogendioxide_tropospheric_column' (y: 389, x: 450)>
dask.array<where, shape=(389, 450), dtype=float32, chunksize=(389, 450), chunktype=numpy.ndarray>
Coordinates:
    time       datetime64[ns] 2019-07-25
    latitude   (y, x) float32 dask.array<chunksize=(389, 450), meta=np.ndarray>
    longitude  (y, x) float32 dask.array<chunksize=(389, 450), meta=np.ndarray>
    crs        object +proj=latlong +datum=WGS84 +ellps=WGS84 +type=crs
Dimensions without coordinates: y, x
Attributes:
    start_time:                                            2019-07-25 04:27:43
    end_time:                                              2019-07-25 06:09:12
    platform_shortname:                                    S5P
    name:                                                  nitrogendioxide_tr...
    modifiers:                                             ()
    file_key:                                              PRODUCT/nitrogendi...
    sensor:                                                TROPOMI
    coordinates:                                           ('longitude', 'lat...
    units:                                                 mol m-2
    resolution:                                            None
    long_name:                                             Tropospheric verti...
    standard_name:                                         troposphere_mole_c...
    ancillary_variables:                                   [<xarray.DataArray...
    file_type:                                             tropomi_l2
    multiplication_factor_to_convert_to_molecules_percm2:  6.02214e+19
    area:                                                  Shape: (389, 450)\...
    calibration:                                           None
    polarization:                                          None
    level:                                                 None
<xarray.DataArray 'array-8b30ddfb1076e284b90d1bb91c3b2c43' (y: 390, x: 451)>
dask.array<array, shape=(390, 451), dtype=float32, chunksize=(390, 451), chunktype=numpy.ndarray>
Dimensions without coordinates: y, x
Attributes:
    start_time:           2019-07-25 04:27:43
    end_time:             2019-07-25 06:09:12
    platform_shortname:   S5P
    modifiers:            ()
    file_key:             PRODUCT/SUPPORT_DATA/GEOLOCATIONS/latitude_bounds
    sensor:               TROPOMI
    level:                None
    wavelength:           None
    polarization:         None
    resolution:           None
    standard_name:        assembled_latitude_bounds
    calibration:          None
    file_type:            tropomi_l2
    name:                 assembled_lat_bounds
    ancillary_variables:  []

Update

This is caused by the different shape of bounds. I will fix this in tropomi_l2 reader.