xcube-dev / xcube

xcube is a Python package for generating and exploiting data cubes powered by xarray, dask, and zarr.
https://xcube.readthedocs.io/
MIT License

`xcube gen` fails to read netcdf with flag with only one meaning #906

Open tiagoams opened 1 year ago

tiagoams commented 1 year ago

Describe the bug: `xcube gen` fails with an error when generating a data cube from a NetCDF file whose flag variable has only a single flag meaning. The default input processor is being used.

This is the structure of the offending file:

dsorig = xr.open_dataset(datadir / "19980101_cmems_obs-oc_atl_bgc-optics_my_l3-multi-1km_P1D.nc",engine="netcdf4")
print(dsorig['flags'])

<xarray.DataArray 'flags' (time: 1, lat: 4416, lon: 5664)>
[25012224 values with dtype=int8]
Coordinates:
  * time     (time) datetime64[ns] 1998-01-01
  * lat      (lat) float32 65.99 65.98 65.97 65.96 ... 20.04 20.03 20.02 20.01
  * lon      (lon) float32 -45.99 -45.98 -45.97 -45.96 ... 12.97 12.98 12.99
Attributes:
    standard_name:          status_flag
    coverage_content_type:  auxiliaryInformation
    long_name:              Flags
    flag_meanings:          LAND
    flag_masks:             1
    valid_min:              0
    valid_max:              1

To Reproduce: xcube version 1.2.0, installed from conda-forge.

  1. Download file from CMEMS: ftp://my.cmems-du.eu/Core/OCEANCOLOUR_ATL_BGC_L3_MY_009_113/cmems_obs-oc_atl_bgc-optics_my_l3-multi-1km_P1D/1998/01/19980101_cmems_obs-oc_atl_bgc-optics_my_l3-multi-1km_P1D.nc
  2. Create configuration file gen_cmems_009_113_optics.yml:
    
    input_processor: default
    output_size: [3200,2304]
    output_region: [-15.5,42.5000000000,13,63.0000000000000]
    output_path: test_optics_v2.zarr
    output_writer: zarr
    output_resampling: Nearest
    output_variables:
    processed_variables:

>xcube gen -c gen_cmems_009_113_optics.yml 19980101_cmems_obs-oc_atl_bgc-optics_my_l3-multi-1km_P1D.nc
C:\ProgramData\mambaforge\envs\xcube\Lib\site-packages\xcube\core\gen\gen.py:96: UserWarning: append_mode in gen_cube() is deprecated, time slices will now always be inserted, replaced, or appended.
  warnings.warn('append_mode in gen_cube() is deprecated, '
Internal error: object of type 'numpy.int8' has no len()

Expected behavior: a Zarr dataset should be created from the input file.

Additional context: the cause appears to be the flag variable, which carries only a single flag in its attributes:

    flag_meanings:          LAND
    flag_masks:             1
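A minimal sketch of what is presumably going on, using NumPy only. It assumes that the single-valued `flag_masks` attribute comes back from the NetCDF reader as a NumPy scalar (`numpy.int8`) rather than as an array, and that some code then calls `len()` on it. The helper `as_1d` is hypothetical and is not part of xcube; it only illustrates one way such an attribute could be normalized.

```python
import numpy as np

# Assumption: a single-valued attribute is read back as a NumPy scalar.
flag_masks = np.int8(1)
flag_meanings = "LAND"

# Calling len() on a NumPy scalar raises a TypeError.
try:
    len(flag_masks)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)


def as_1d(value):
    """Hypothetical helper: turn a scalar or sequence into a 1-D array."""
    return np.atleast_1d(value)


# After normalization, len() works for one flag as well as for many.
masks = as_1d(flag_masks)
meanings = flag_meanings.split()
print(len(masks), len(meanings))
```

With this kind of normalization, a variable with one flag and a variable with several flags could be handled by the same code path.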

Adding a dummy flag, so that flag_meanings and flag_masks each hold more than one entry (len(flag_masks) > 1), works around the problem:

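import xarray as xr

# datadir is a pathlib.Path pointing to the folder with the downloaded file
dsori = xr.open_dataset(datadir / "19980101_cmems_obs-oc_atl_bgc-optics_my_l3-multi-1km_P1D.nc", engine="netcdf4")
print(dsori['flags'])

# Replace the attributes, adding a second (dummy) flag so that
# flag_masks is stored as a sequence instead of a scalar
dsori['flags'].attrs = {
    'standard_name': 'status_flag',
    'coverage_content_type': 'auxiliaryInformation',
    'long_name': 'Flags',
    'flag_meanings': 'LAND DUMMY',
    'flag_masks': [1, 2],
    'valid_min': 0,
    'valid_max': 1
}

# Write the patched dataset to a new file to use as input for xcube gen
dsori.to_netcdf('optics_flags_rmattrs.nc')

TonioF commented 1 year ago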

Thanks for reporting. For the time being, I'd like to point you to xcube's plugin for CMEMS: xcube-cmems. Maybe you will find it simpler to use. It will definitely save you the hassle of downloading files to your local computer first.

tiagoams commented 1 year ago

Thanks for the pointer. I had looked at this plugin, but neither the example nor the documentation shows how to generate a regridded cube from a CMEMS data store. Could you please advise?

From https://github.com/dcs4cop/xcube-cmems/blob/main/examples/notebooks/cmems_ds_example.ipynb:

cmems_store = CmemsDataStore()
ds = cmems_store.open_data(
    'DMI-BALTIC-SST-L3S-NRT-OBS_FULL_TIME_SERIE',
    'dataset:zarr:cmems',
    variable_names=['sea_surface_temperature'],
    time_range=('2022-01-01', '2022-01-02'),
    bbox=[9, 53, 20, 62],
)