xcube-dev / xcube

xcube is a Python package for generating and exploiting data cubes powered by xarray, dask, and zarr.
https://xcube.readthedocs.io/
MIT License

`xcube gen` fails to read netcdf with flag with only one meaning #906

Open tiagoams opened 11 months ago

tiagoams commented 11 months ago

Describe the bug: `xcube gen` fails with an error when generating a data cube from a netCDF file whose flag variable defines only a single flag. The default input processor is used.

This is the structure of the offending file:

import xarray as xr

dsorig = xr.open_dataset(datadir / "19980101_cmems_obs-oc_atl_bgc-optics_my_l3-multi-1km_P1D.nc", engine="netcdf4")
print(dsorig['flags'])

<xarray.DataArray 'flags' (time: 1, lat: 4416, lon: 5664)>
[25012224 values with dtype=int8]
Coordinates:
  * time     (time) datetime64[ns] 1998-01-01
  * lat      (lat) float32 65.99 65.98 65.97 65.96 ... 20.04 20.03 20.02 20.01
  * lon      (lon) float32 -45.99 -45.98 -45.97 -45.96 ... 12.97 12.98 12.99
Attributes:
    standard_name:          status_flag
    coverage_content_type:  auxiliaryInformation
    long_name:              Flags
    flag_meanings:          LAND
    flag_masks:             1
    valid_min:              0
    valid_max:              1

To Reproduce: xcube version 1.2.0, installed from conda-forge.

  1. Download file from CMEMS: ftp://my.cmems-du.eu/Core/OCEANCOLOUR_ATL_BGC_L3_MY_009_113/cmems_obs-oc_atl_bgc-optics_my_l3-multi-1km_P1D/1998/01/19980101_cmems_obs-oc_atl_bgc-optics_my_l3-multi-1km_P1D.nc
  2. Create configuration file gen_cmems_009_113_optics.yml:
    
    input_processor: default

    output_size: [3200,2304]
    output_region: [-15.5,42.5000000000,13,63.0000000000000]

    output_path: test_optics_v2.zarr
    output_writer: zarr
    output_resampling: Nearest

    output_variables:

    processed_variables:

>xcube gen -c gen_cmems_009_113_optics.yml 19980101_cmems_obs-oc_atl_bgc-optics_my_l3-multi-1km_P1D.nc
C:\ProgramData\mambaforge\envs\xcube\Lib\site-packages\xcube\core\gen\gen.py:96: UserWarning: append_mode in gen_cube() is deprecated, time slices will now always be inserted, replaced, or appended.
  warnings.warn('append_mode in gen_cube() is deprecated, '
Internal error: object of type 'numpy.int8' has no len()

Expected behavior: In this case, a Zarr file should be created.

Additional context: The cause is the flag variable, whose flag attributes each hold only a single value:

    flag_meanings:          LAND
    flag_masks:             1

Adding a dummy flag, so that flag_meanings and flag_masks each hold more than one entry, works around the problem:

dsori = xr.open_dataset(datadir / "19980101_cmems_obs-oc_atl_bgc-optics_my_l3-multi-1km_P1D.nc", engine="netcdf4")
print(dsori['flags'])
dsori['flags'].attrs = {
    'standard_name': 'status_flag',
    'coverage_content_type': 'auxiliaryInformation',
    'long_name': 'Flags',
    'flag_meanings': 'LAND DUMMY',
    'flag_masks': [1, 2],
    'valid_min': 0,
    'valid_max': 1
}
dsori.to_netcdf('optics_flags_rmattrs.nc')
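
The failure mode can be sketched without xcube. In the file above, flag_masks holds a single value and arrives as a scalar (numpy.int8, per the error message), and a scalar has no len(). The helpers below are a minimal sketch of a defensive normalization on the reading side. The names as_list and flag_table are hypothetical and this is not xcube's actual code. Plain Python integers stand in for NumPy scalars, which likewise have no len() and are not iterable.

```python
def as_list(value):
    """Return an attribute value as a list, wrapping scalars in a one-element list."""
    if isinstance(value, str):
        # CF stores flag_meanings as a single blank-separated string.
        return value.split()
    try:
        return list(value)  # lists, tuples, arrays
    except TypeError:
        return [value]  # scalars, e.g. a single flag mask


def flag_table(attrs):
    """Map each flag meaning to its mask (hypothetical helper for illustration)."""
    meanings = as_list(attrs["flag_meanings"])
    masks = as_list(attrs["flag_masks"])
    if len(meanings) != len(masks):
        raise ValueError("flag_meanings and flag_masks differ in length")
    return dict(zip(meanings, masks))


# Attributes as reported for the 'flags' variable: one meaning, one scalar mask.
single = {"flag_meanings": "LAND", "flag_masks": 1}
print(flag_table(single))  # {'LAND': 1}

# The multi-flag case from the workaround is handled the same way.
multi = {"flag_meanings": "LAND DUMMY", "flag_masks": [1, 2]}
print(flag_table(multi))  # {'LAND': 1, 'DUMMY': 2}
```

With a normalization like this, the single-flag and multi-flag cases follow the same code path, so no dummy flag would be needed.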

TonioF commented 11 months ago

Thanks for reporting. For the time being, I'd like to point you to xcube's plugin for CMEMS: xcube-cmems. Maybe you will find it simpler to use. It will definitely spare you the hassle of downloading files to your local computer first.

tiagoams commented 11 months ago

Thanks for the pointer. I had looked at this plugin but didn't see in the example or documentation how to generate a regridded cube from a CMEMS data store. Could you please advise?

From https://github.com/dcs4cop/xcube-cmems/blob/main/examples/notebooks/cmems_ds_example.ipynb:

cmems_store = CmemsDataStore()
ds = cmems_store.open_data(
    'DMI-BALTIC-SST-L3S-NRT-OBS_FULL_TIME_SERIE',
    'dataset:zarr:cmems',
    variable_names=['sea_surface_temperature'],
    time_range=('2022-01-01', '2022-01-02'),
    bbox=[9, 53, 20, 62],
)