pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

Expected S1 dtype in DataArray but got float64 #1452

Closed ocefpaf closed 7 years ago

ocefpaf commented 7 years ago

Not sure if the dataset is pathological or if the problem is in xarray. netCDF4 1.2.4 correctly returns dtype S1 but xarray 0.9.6 returns 'float64' and then fails to open the dataset. (I am also having issues loading this variable with netCDF4 >1.2.4.)

In [1]: import xarray as xr
        from netCDF4 import Dataset
        url = 'http://geoport.whoi.edu/thredds/dodsC/usgs/vault0/models/tides/vdatum_gulf_of_maine/adcirc54_38_orig.nc'
        nc = Dataset(url)
        ds = xr.open_dataset(url)

In [2]: nc.variables['tidenames'].dtype
Out[2]: dtype('S1')

In [3]: ds['tidenames'].dtype
Out[3]: dtype('float64')

In [4]: ds['tidenames']
Out[4]: ---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
lib/python3.6/site-packages/IPython/core/formatters.py in __call__(self, obj)
    691                 type_pprinters=self.type_printers,
    692                 deferred_pprinters=self.deferred_printers)
--> 693             printer.pretty(obj)
    694             printer.flush()
    695             return stream.getvalue()

lib/python3.6/site-packages/IPython/lib/pretty.py in pretty(self, obj)
    378                             if callable(meth):
    379                                 return meth(obj, self, cycle)
--> 380             return _default_pprint(obj, self, cycle)
    381         finally:
    382             self.end_group()

lib/python3.6/site-packages/IPython/lib/pretty.py in _default_pprint(obj, p, cycle)
    493     if _safe_getattr(klass, '__repr__', None) is not object.__repr__:
    494         # A user-provided repr. Find newlines and replace them with p.break_()
--> 495         _repr_pprint(obj, p, cycle)
    496         return
    497     p.begin_group(1, '<')

lib/python3.6/site-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle)
    691     """A pprint that just redirects to the normal repr function."""
    692     # Find newlines and replace them with p.break_()
--> 693     output = repr(obj)
    694     for idx,output_line in enumerate(output.splitlines()):
    695         if idx:

lib/python3.6/site-packages/xarray/core/common.py in __repr__(self)
     95 
     96     def __repr__(self):
---> 97         return formatting.array_repr(self)
     98 
     99     def _iter(self):

lib/python3.6/site-packages/xarray/core/formatting.py in array_repr(arr)
    384         summary.append(repr(arr.data))
    385     elif arr._in_memory or arr.size < 1e5:
--> 386         summary.append(short_array_repr(arr.values))
    387     else:
    388         summary.append(u'[%s values with dtype=%s]' % (arr.size, arr.dtype))

lib/python3.6/site-packages/xarray/core/dataarray.py in values(self)
    401     def values(self):
    402         """The array's data as a numpy.ndarray"""
--> 403         return self.variable.values
    404 
    405     @values.setter

lib/python3.6/site-packages/xarray/core/variable.py in values(self)
    327     def values(self):
    328         """The variable's data as a numpy.ndarray"""
--> 329         return _as_array_or_item(self._data)
    330 
    331     @values.setter

lib/python3.6/site-packages/xarray/core/variable.py in _as_array_or_item(data)
    203     TODO: remove this (replace with np.asarray) once these issues are fixed
    204     """
--> 205     data = np.asarray(data)
    206     if data.ndim == 0:
    207         if data.dtype.kind == 'M':

lib/python3.6/site-packages/numpy/core/numeric.py in asarray(a, dtype, order)
    480 
    481     """
--> 482     return array(a, dtype, copy=False, order=order)
    483 
    484 def asanyarray(a, dtype=None, order=None):

lib/python3.6/site-packages/xarray/core/indexing.py in __array__(self, dtype)
    425 
    426     def __array__(self, dtype=None):
--> 427         self._ensure_cached()
    428         return np.asarray(self.array, dtype=dtype)
    429 

lib/python3.6/site-packages/xarray/core/indexing.py in _ensure_cached(self)
    422     def _ensure_cached(self):
    423         if not isinstance(self.array, np.ndarray):
--> 424             self.array = np.asarray(self.array)
    425 
    426     def __array__(self, dtype=None):

lib/python3.6/site-packages/numpy/core/numeric.py in asarray(a, dtype, order)
    480 
    481     """
--> 482     return array(a, dtype, copy=False, order=order)
    483 
    484 def asanyarray(a, dtype=None, order=None):

lib/python3.6/site-packages/xarray/core/indexing.py in __array__(self, dtype)
    406 
    407     def __array__(self, dtype=None):
--> 408         return np.asarray(self.array, dtype=dtype)
    409 
    410     def __getitem__(self, key):

lib/python3.6/site-packages/numpy/core/numeric.py in asarray(a, dtype, order)
    480 
    481     """
--> 482     return array(a, dtype, copy=False, order=order)
    483 
    484 def asanyarray(a, dtype=None, order=None):

lib/python3.6/site-packages/xarray/core/indexing.py in __array__(self, dtype)
    373     def __array__(self, dtype=None):
    374         array = orthogonally_indexable(self.array)
--> 375         return np.asarray(array[self.key], dtype=None)
    376 
    377     def __getitem__(self, key):

lib/python3.6/site-packages/xarray/conventions.py in __getitem__(self, key)
    365     def __getitem__(self, key):
    366         return mask_and_scale(self.array[key], self.fill_value,
--> 367                               self.scale_factor, self.add_offset, self._dtype)
    368 
    369     def __repr__(self):

lib/python3.6/site-packages/xarray/conventions.py in mask_and_scale(array, fill_value, scale_factor, add_offset, dtype)
     61     """
     62     # by default, cast to float to ensure NaN is meaningful
---> 63     values = np.array(array, dtype=dtype, copy=True)
     64     if fill_value is not None and not np.all(pd.isnull(fill_value)):
     65         if getattr(fill_value, 'size', 1) > 1:

ValueError: could not convert string to float: 'STEADY '

I will try to investigate this later this week.

shoyer commented 7 years ago

It looks like this variable has an attribute _FillValue = -1, which isn't a valid string value:

In [18]: ds['tidenames'].encoding
Out[18]:
{'_FillValue': -1,
 'dtype': dtype('S1'),
 'original_shape': (38, 64),
 'source': 'http://geoport.whoi.edu/thredds/dodsC/usgs/vault0/models/tides/vdatum_gulf_of_maine/adcirc54_38_orig.nc'}

Potentially we could update xarray's CF conventions code to handle inconsistent dtypes and _FillValue attributes, or possibly this file should be fixed instead.
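The mismatch can be illustrated without the remote dataset. The sketch below is plain Python, not xarray's code: `cast_to_float` is a hypothetical stand-in for the float cast at the bottom of the traceback, and `fill_value_matches` is a hypothetical check of the kind the CF decoding step could apply before masking. The sample values are copied from the variable above.

```python
def cast_to_float(values):
    """Hypothetical stand-in for the float cast in the traceback (not xarray code)."""
    return [float(v) for v in values]


def fill_value_matches(values, fill_value):
    """Hypothetical check: text data needs a text fill value, numeric data a numeric one."""
    holds_text = all(isinstance(v, (str, bytes)) for v in values)
    fill_is_text = isinstance(fill_value, (str, bytes))
    return holds_text == fill_is_text


tidenames = ["STEADY ", "MN     ", "SM     "]  # sample labels from the dataset
fill_value = -1                                # the _FillValue attribute on the variable

try:
    cast_to_float(tidenames)
    error_message = None
except ValueError as err:
    error_message = str(err)

print(error_message)                              # could not convert string to float: 'STEADY '
print(fill_value_matches(tidenames, fill_value))  # False: numeric fill value on text data
```

A check like this would let the decoder skip masking (or warn) instead of casting string data to float, but whether xarray should do that is exactly the open question above.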

A workaround is to set mask_and_scale=False:

In [19]: ds = xr.open_dataset(url, mask_and_scale=False)

In [21]: ds['tidenames']
Out[21]:
<xarray.DataArray 'tidenames' (ntides: 38)>
array([b'STEADY ', b'MN     ', b'SM     ', b'K1     ', b'O1     ', b'P1     ',
       b'Q1     ', b'SO1    ', b'MNS2   ', b'2MS2   ', b'N2     ', b'M2     ',
       b'2MN2   ', b'S2     ', b'K2     ', b'MSN2   ', b'2SM2   ', b'MO3    ',
       b'SO3/MK3', b'SK3    ', b'N4     ', b'3MS4   ', b'MN4    ', b'M4     ',
       b'3MN4   ', b'MS4    ', b'2MSN4  ', b'2NM6   ', b'2MN6   ', b'M6     ',
       b'MSN6   ', b'2MS6   ', b'2SM6   ', b'2(MN)8 ', b'3MN8   ', b'M8     ',
       b'2MSN8  ', b'3MS8   '],
      dtype='|S64')
Dimensions without coordinates: ntides
Attributes:
    long_name:      Tide Constituent
    missing_value:  -1
    standard_name:  tide_constituent
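
With mask_and_scale=False the names come back as space-padded byte strings. A minimal sketch of cleaning them up in plain Python, assuming the values are ASCII; `clean_names` is a hypothetical helper, and the sample list is a subset of the output above. For the real variable, something like `ds['tidenames'].values.tolist()` would presumably supply the input list.

```python
# A few of the padded byte strings shown in the output above.
raw_names = [b"STEADY ", b"MN     ", b"SO3/MK3", b"2(MN)8 "]


def clean_names(names, encoding="ascii"):
    """Decode each byte string and strip the padding spaces (hypothetical helper)."""
    return [name.decode(encoding).strip() for name in names]


tide_names = clean_names(raw_names)
print(tide_names)  # ['STEADY', 'MN', 'SO3/MK3', '2(MN)8']
```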

ocefpaf commented 7 years ago

Totally missed your answer here @shoyer. Thanks!

The workaround is fine, and the _FillValue = -1 seems wrong to me. Pinging @rsignell-usgs, who knows more about the conventions and was interested in this in the first place.

Closing this as I don't think anything is broken with xarray.