pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0
3.6k stars 1.08k forks

Using min() with skipna=True #3290

Closed zxdawn closed 5 years ago

zxdawn commented 5 years ago

MCVE Code Sample

from datetime import datetime
import xarray as xr
import os

def read_data(f, composition, west, east, north, south):
    # read data
    ds = xr.open_dataset(f, group='PRODUCT')
    # subset to region
    index = ((ds.longitude > west) & (ds.longitude < east))
    ds = ds.where(index)
    # read composition
    data = ds[composition][0,:,:]
    data_units = data.units
    # read time
    t = ds['time_utc']
    st = datetime.strptime(str(t.min(skipna=True).values), '%Y-%m-%dT%H:%M:%S.%fZ')
    et = datetime.strptime(str(t.max(skipna=True).values), '%Y-%m-%dT%H:%M:%S.%fZ')

    # read lon and lat
    lon = data.coords['longitude']
    lat = data.coords['latitude']

    return lon, lat, data, data_units, st, et

datadir = '/xin/data/TROPOMI/GZ/bug'
os.chdir(datadir)
west, east, north, south = 112.5, 114.5, 24, 22.5

f = 'S5P_NRTI_L2__O3_____20190825T053303_20190825T053803_09659_01_010107_20190825T061441.nc'
lon, lat, data, data_units, st, et = read_data(f, 'ozone_total_vertical_column',
                                                  west, east, north, south)

Problem Description

You can download the data from Google Drive. I get the errors shown below, even when using skipna=True.

Traceback (most recent call last):
  File "/public/software/anaconda/anaconda3/envs/behr/lib/python3.6/site-packages/xarray-0.11.3-py3.6.egg/xarray/core/duck_array_ops.py", line 236, in f
    return func(values, axis=axis, **kwargs)
  File "/public/software/anaconda/anaconda3/envs/behr/lib/python3.6/site-packages/xarray-0.11.3-py3.6.egg/xarray/core/nanops.py", line 77, in nanmin
    'min', dtypes.get_pos_infinity(a.dtype), a, axis)
  File "/public/software/anaconda/anaconda3/envs/behr/lib/python3.6/site-packages/xarray-0.11.3-py3.6.egg/xarray/core/nanops.py", line 69, in _nan_minmax_object
    data = dtypes.fill_value(value.dtype) if valid_count == 0 else data
AttributeError: module 'xarray.core.dtypes' has no attribute 'fill_value'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "bug.py", line 31, in
    west, east, north, south)
  File "bug.py", line 16, in read_data
    st = datetime.strptime(str(t.min(skipna=True).values), '%Y-%m-%dT%H:%M:%S.%fZ')
  File "/public/software/anaconda/anaconda3/envs/behr/lib/python3.6/site-packages/xarray-0.11.3-py3.6.egg/xarray/core/common.py", line 25, in wrapped_func
    skipna=skipna, allow_lazy=True, **kwargs)
  File "/public/software/anaconda/anaconda3/envs/behr/lib/python3.6/site-packages/xarray-0.11.3-py3.6.egg/xarray/core/dataarray.py", line 1597, in reduce
    var = self.variable.reduce(func, dim, axis, keep_attrs, **kwargs)
  File "/public/software/anaconda/anaconda3/envs/behr/lib/python3.6/site-packages/xarray-0.11.3-py3.6.egg/xarray/core/variable.py", line 1354, in reduce
    axis=axis, **kwargs)
  File "/public/software/anaconda/anaconda3/envs/behr/lib/python3.6/site-packages/xarray-0.11.3-py3.6.egg/xarray/core/duck_array_ops.py", line 249, in f
    raise NotImplementedError(msg)
NotImplementedError: min is not available with skipna=False with the installed version of numpy; upgrade to numpy 1.12 or newer to use skipna=True or skipna=None

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.7 | packaged by conda-forge | (default, Feb 20 2019, 02:51:38) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.0.76-0.11-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2

xarray: 0.11.3
pandas: 0.20.3
numpy: 1.13.1
scipy: 0.19.1
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: None
cftime: 1.0.3.4
PseudonetCDF: None
rasterio: 1.0.21
cfgrib: None
iris: None
bottleneck: None
cyordereddict: None
dask: 1.1.2
distributed: None
matplotlib: 3.0.3
cartopy: 0.17.0
seaborn: 0.9.0
setuptools: 36.4.0
pip: 9.0.1
conda: None
pytest: None
IPython: None
sphinx: None

max-sixty commented 5 years ago

Thanks for the issue, @zxdawn. Did you try doing this?

NotImplementedError: min is not available with skipna=False with the installed version of numpy; upgrade to numpy 1.12

zxdawn commented 5 years ago

@max-sixty Actually, I'm using numpy 1.13.1 and I need skipna=True. I don't understand the error it shows.

shoyer commented 5 years ago

I think this may have been fixed by https://github.com/pydata/xarray/pull/2924 (which removed the line with dtypes.fill_value(value.dtype) if valid_count == 0 else data)

Can you try upgrading to xarray 0.12.3?

zxdawn commented 5 years ago

@shoyer Thanks. It works now, but I have another question. This is the result of t = ds['time_utc']:

<xarray.DataArray 'time_utc' (time: 1, scanline: 357, ground_pixel: 450)>
array([[[nan, nan, ..., nan, nan],
        [nan, nan, ..., nan, nan],
        ...,
        [nan, nan, ..., nan, nan],
        [nan, nan, ..., nan, nan]]], dtype=object)
Coordinates:
  * scanline      (scanline) float64 1.0 2.0 3.0 4.0 ... 354.0 355.0 356.0 357.0
  * ground_pixel  (ground_pixel) float64 1.0 2.0 3.0 4.0 ... 448.0 449.0 450.0
  * time          (time) datetime64[ns] 2019-08-25
Attributes:
    long_name:  Time of observation as ISO 8601 date-time string

If I try to get the minimum value with t.min(skipna=True), I get a strange type:

<xarray.DataArray 'time_utc' ()>
array(<xarray.core.dtypes.AlwaysGreaterThan object at 0x7f96ac188550>,
      dtype=object)

I can't convert it to a string with str(t.min(skipna=True)).

keewis commented 5 years ago

Do you actually have any non-nan values in your array? From what I understand of how nanops works, AlwaysGreaterThan should only be returned by min() if there are no non-nan values.
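A minimal sketch of the guard this implies, assuming time_utc holds ISO 8601 strings mixed with NaN in an object-dtype array: confirm that at least one value is present before taking the minimum. The helper name earliest_time and the sample values are hypothetical and not part of xarray.

```python
from datetime import datetime

import numpy as np
import pandas as pd


def earliest_time(values, fmt="%Y-%m-%dT%H:%M:%S.%fZ"):
    """Return the earliest timestamp, or None if every entry is missing."""
    flat = np.asarray(values, dtype=object).ravel()
    # pd.isnull handles object arrays, unlike np.isnan.
    valid = flat[~pd.isnull(flat)]
    if valid.size == 0:
        return None
    # ISO 8601 strings in one fixed format sort chronologically,
    # so the smallest string is also the earliest time.
    return datetime.strptime(min(valid), fmt)


# Hypothetical sample data: one array with some values, one with none.
sample = np.array(
    [[np.nan, "2019-08-25T05:33:05.000000Z"],
     ["2019-08-25T05:33:03.000000Z", np.nan]],
    dtype=object,
)
all_missing = np.full((2, 2), np.nan, dtype=object)

first = earliest_time(sample)
nothing = earliest_time(all_missing)
```

Returning None for the all-missing case makes the empty result explicit instead of passing a sentinel object on to strptime.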

zxdawn commented 5 years ago

@keewis I tried using np.isnan(t.values).all() to check whether it's all nan, but I got this error:

    print (np.isnan(t.values).all())
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

This is the type of t.values: <class 'numpy.ndarray'>

shoyer commented 5 years ago

For datetime64 arrays, use np.isnat() instead of isnan.
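For reference, a small sketch of np.isnat on an actual datetime64 array. The sample values are made up for illustration. It also shows that np.isnat raises TypeError for any dtype other than datetime64 or timedelta64.

```python
import numpy as np

# Hypothetical datetime64 values; the second entry is NaT (not-a-time).
times = np.array(["2019-08-25T05:33:03", "NaT"], dtype="datetime64[ns]")
nat_mask = np.isnat(times)

# np.isnat is only defined for datetime64 and timedelta64 input,
# so a float array is rejected with TypeError.
try:
    np.isnat(np.array([1.0, np.nan]))
    float_accepted = True
except TypeError:
    float_accepted = False
```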

zxdawn commented 5 years ago

@shoyer Thanks. It's not a datetime64 array; this is the result of np.isnat(t):

  File "/public/software/anaconda/anaconda3/envs/python36/lib/python3.6/site-packages/xarray-0.12.3-py3.6.egg/xarray/core/arithmetic.py", line 69, in __array_ufunc__
    dask='allowed')
  File "/public/software/anaconda/anaconda3/envs/python36/lib/python3.6/site-packages/xarray-0.12.3-py3.6.egg/xarray/core/computation.py", line 969, in apply_ufunc
    keep_attrs=keep_attrs)
  File "/public/software/anaconda/anaconda3/envs/python36/lib/python3.6/site-packages/xarray-0.12.3-py3.6.egg/xarray/core/computation.py", line 217, in apply_dataarray_vfunc
    result_var = func(*data_vars)
  File "/public/software/anaconda/anaconda3/envs/python36/lib/python3.6/site-packages/xarray-0.12.3-py3.6.egg/xarray/core/computation.py", line 564, in apply_variable_ufunc
    result_data = func(*input_data)
TypeError: ufunc 'isnat' is only defined for datetime and timedelta.

I used pd.isnull(t).all() to check it, and that works. The array really is all nan. There's something wrong with the nc file, so I will contact the data center. Thank you for all your help :)
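A small sketch of why pd.isnull succeeds where np.isnan fails, using a hypothetical all-NaN object-dtype array as a stand-in for t.values (the real file is not reproduced here):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for t.values: object dtype, every entry NaN.
values = np.full((3, 4), np.nan, dtype=object)

# np.isnan has no loop for object dtype, so it raises TypeError.
try:
    np.isnan(values)
    isnan_supported = True
except TypeError:
    isnan_supported = False

# pd.isnull checks each element and works for object arrays.
all_missing = bool(pd.isnull(values).all())
```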