lesommer / oocgcm

oocgcm is a Python library for the analysis of large gridded geophysical datasets.
http://oocgcm.rtfd.io
Apache License 2.0

Problem while reading a self-made netcdf file with io.return_xarray_mfdataset #32

Open simon3122 opened 8 years ago

simon3122 commented 8 years ago

Hello,

My problem is as follows: I want to read a self-made NetCDF file with io.return_xarray_mfdataset. The NetCDF header is:

group: netcdf4 {
  dimensions:
    y = 3454 ;
    x = 5422 ;
    t = 1 ;
  variables:
    float nav_lat(y, x) ;
      nav_lat:axis = "Y" ;
      nav_lat:standard_name = "latitude" ;
      nav_lat:long_name = "Latitude" ;
      nav_lat:units = "degrees_north" ;
      nav_lat:nav_model = "grid_T" ;
    float nav_lon(y, x) ;
      nav_lon:axis = "X" ;
      nav_lon:standard_name = "longitude" ;
      nav_lon:long_name = "Longitude" ;
      nav_lon:units = "degrees_east" ;
      nav_lon:nav_model = "grid_T" ;
    double time_centered(t) ;
      time_centered:standard_name = "time" ;
      time_centered:long_name = "Time axis" ;
      time_centered:title = "Time" ;
      time_centered:time_origin = "1958-01-01 00:00:00" ;
      time_centered:bounds = "time_centered_bounds" ;
      time_centered:units = "seconds since 1958-01-01" ;
      time_centered:calendar = "gregorian" ;
    double t(t) ;
      t:axis = "T" ;
      t:standard_name = "time" ;
      t:long_name = "Time axis" ;
      t:title = "Time" ;
      t:time_origin = "1958-01-01 00:00:00" ;
      t:bounds = "time_counter_bounds" ;
      t:units = "seconds since 1958-01-01" ;
      t:calendar = "gregorian" ;
    float sossheig(t, y, x) ;
      sossheig:_FillValue = 0.f ;
      sossheig:long_name = "sea surface height" ;
      sossheig:units = "m" ;
      sossheig:online_operation = "average" ;
      sossheig:interval_operation = "40s" ;
      sossheig:interval_write = "5d" ;
      sossheig:coordinates = "nav_lon nav_lat time_centered" ;
} // group netcdf4

My code is:

from oocgcm.core import io
chunks = (1727, 2711)
xr_chunks_tmean = {'y': chunks[-2], 'x': chunks[-1], 't': 1}
vmean_xrt = io.return_xarray_mfdataset(filemean, chunks=xr_chunks_tmean)[vdict[vkey]['vname']][:]

and I get the following error:

ValueError: some chunks keys are not dimensions on this object: ['y', 'x', 't']
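xarray raises this ValueError when a key of the `chunks` dict is not a dimension of the object it has opened, so if the opened object has no dimensions at all, every key is reported. The following is a minimal, pure-Python sketch of that check, useful for finding the offending keys before calling `open_mfdataset`. The helper name `split_chunk_keys` is hypothetical, and this is not xarray's own code. With an xarray Dataset, the `dims` argument would presumably be `ds.sizes` (a mapping of dimension name to length).

```python
from typing import Collection, Dict, List, Mapping, Tuple


def split_chunk_keys(
    chunks: Mapping[str, int], dims: Collection[str]
) -> Tuple[Dict[str, int], List[str]]:
    """Split a chunks dict into keys that are real dimensions and keys that are not.

    Hypothetical helper, not part of oocgcm or xarray. `dims` is any
    collection of dimension names (a dict of name -> size also works).
    """
    valid = {name: size for name, size in chunks.items() if name in dims}
    # Sorted so the list of offending keys is deterministic.
    invalid = sorted(name for name in chunks if name not in dims)
    return valid, invalid


chunks = (1727, 2711)
xr_chunks_tmean = {'y': chunks[-2], 'x': chunks[-1], 't': 1}

# Dimensions listed in the ncdump header above (inside the netcdf4 group).
header_dims = {'y': 3454, 'x': 5422, 't': 1}
print(split_chunk_keys(xr_chunks_tmean, header_dims))
# → ({'y': 1727, 'x': 2711, 't': 1}, [])

# If the object xarray opens has no dimensions, every chunk key is rejected.
print(split_chunk_keys(xr_chunks_tmean, {}))
# → ({}, ['t', 'x', 'y'])
```

If every key ends up in the second list, the chunk sizes are probably not the problem. The likely cause is that the opened object does not contain the expected dimensions.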

simon3122 commented 8 years ago

This problem occurs when reading a NetCDF4 file, but no longer when reading a NetCDF3 file written with these options:

ds.to_netcdf(filenam, 'w', format='NETCDF3_64BIT', engine='scipy', encoding={vkey2: {'dtype': 'float32'}})

lesommer commented 8 years ago

Hi Simon, could you please look into this in more detail and let me know which vkey/vname raises the exception? Thanks.

simon3122 commented 8 years ago

I am working on NEMO data with the following variables:

vkey = 'sea level'
vkey2 = vdict[vkey]['vname']

The dictionary is defined as follows:

vdict['sea level'] = {'vname': 'sossheig', ...

In case it helps, here are the options I use to write the NetCDF4 file (which then fails at the reading stage):

ds.to_netcdf(filenam, 'w', format='NETCDF4', engine='netcdf4', encoding={vkey2: {'_FillValue': 0, 'dtype': 'float32'}})

lesommer commented 8 years ago

I suspect this is related to the options used for reading NetCDF files in core.io. NB: these options have changed in the trunk (see)

  1. Could you try opening your newly created dataset directly with xarray?
  2. Try without specifying a chunk size for the 't' dimension: {'y': chunks[-2], 'x': chunks[-1]}

simon3122 commented 8 years ago

First, I noticed that I cannot create the xarray.Dataset at all: the error is raised directly by the open_mfdataset function.

Here are my results:

  1. Replacing io.return_xarray_mfdataset(filemean, chunks=xr_chunks_tmean) with xr.open_mfdataset(filemean, chunks=xr_chunks_tmean, engine='netcdf4', lock=False) does not change the result.

  2. Leaving out the 't' dimension gives this error: ValueError: some chunks keys are not dimensions on this object: ['y', 'x']

One more result: when passing chunks=None to the xarray function, the dataset still cannot be read properly. This call:

print(xr.open_mfdataset(filemean, chunks=None, engine='netcdf4', lock=False))

gives

<xarray.Dataset>
Dimensions:  ()
Coordinates:
    *empty*
Data variables:
    *empty*
simon3122 commented 8 years ago

The failing version was writing the file with

ds.to_netcdf(filenam,'w','NETCDF4', 'netcdf4'...}})

which gives the following header:

group: netcdf4 {
  dimensions:
    y = 3454 ;
    x = 5422 ;

In fact, it works when writing the file with

ds.to_netcdf(filenam,'w',format='NETCDF4', engine='netcdf4'...}})

which gives the following header:

dimensions:
    y = 3454 ;
    x = 5422 ;

The first version seems to create an inner group named netcdf4, so the variables end up in that sub-group rather than in the root group of the file. As I understand it, xarray's reading functions will not read the first version (with the inner group) properly unless the 'group' option is set.
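The inner group may have a simple cause: positional argument order. In current xarray, `Dataset.to_netcdf` takes its leading parameters in the order `path, mode, format, group, engine, encoding`. If that order applied here (older releases may differ), then in `ds.to_netcdf(filenam,'w','NETCDF4', 'netcdf4'...)` the fourth positional value `'netcdf4'` binds to `group`, not `engine`. That would produce exactly the `group: netcdf4 {` header. The sketch below checks this binding with `inspect.signature` against a stand-in function. `to_netcdf_standin` is hypothetical. It only copies that parameter order and does not write any file.

```python
import inspect


def to_netcdf_standin(path=None, mode="w", format=None, group=None,
                      engine=None, encoding=None):
    """Hypothetical stand-in that mirrors the leading parameter order of
    xarray.Dataset.to_netcdf (self omitted). It does not write anything."""


def bound_args(*args, **kwargs):
    """Return the parameters that the given arguments explicitly bind to."""
    bound = inspect.signature(to_netcdf_standin).bind(*args, **kwargs)
    return dict(bound.arguments)


# Failing case: 'netcdf4' is the 4th positional value, so it lands in `group`.
failing = bound_args("out.nc", "w", "NETCDF4", "netcdf4")
print(failing["group"], failing.get("engine"))  # → netcdf4 None

# Working case: format and engine passed by keyword, no group is set.
working = bound_args("out.nc", "w", format="NETCDF4", engine="netcdf4")
print(working.get("group"), working["engine"])  # → None netcdf4
```

If this is the cause, passing `format=` and `engine=` by keyword when writing avoids the sub-group. For a file that already has the inner group, `xr.open_dataset` accepts a `group` argument (e.g. `group='netcdf4'`), which `open_mfdataset` should forward to `open_dataset`.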