pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0
3.65k stars 1.09k forks source link

decode_cf not scaling and off-setting correctly #2583

Closed jjspergel closed 5 years ago

jjspergel commented 6 years ago

Problem description

I am importing atmospheric cloud fraction data from the PATMOSx OpenDap server (https://www.ncei.noaa.gov/thredds/satellite/avhrr-cloudprops-patmos-x.html). Cloud fraction is the percent of the pixel that the instruments flag as cloud, thus the output should be between 0 and 1. The data is stored unscaled as an int8. In order to retrieve the cloud fraction data, the data should be decoded as unscaled_data*scaling_factor + add_offset in order to be in the proper format. decode_cf does not seem to be using the algorithm in this way.

import xarray as xr
url = 'http://www.ncei.noaa.gov/thredds/dodsC/avhrr-patmos-x-cloudprops-noaa-asc-fc/files/2003/patmosx_v05r03_NOAA-17_asc_d20030101_c20140314.nc'
# Decoding using decode_cf
ds=xr.open_dataset(url, decode_cf=True)
ds.cloud_fraction
<xarray.DataArray 'cloud_fraction' (time: 1, latitude: 1800, longitude: 3600)>
[6480000 values with dtype=float32]
Coordinates:
  * latitude   (latitude) float32 -89.947815 -89.84893 -89.74731 -89.64843 ...
  * longitude  (longitude) float32 -179.94507 -179.84619 -179.7473 ...
  * time       (time) datetime64[ns] 2003-01-01
Attributes:
    standard_name:  cloud_area_fraction
    long_name:      cloud fraction computed over a 3x3 pixel array at the nat...
    actual_range:   [0. 1.]
    valid_range:    [-127  127]
    scaled:         1
    _ChunkSizes:    [  1 361 361]

ds.cloud_fraction.plot()
#The cloud fraction data should be scaled to be a percentage between 0 and 1, but it is not. 

#Manually decoding cloud_fraction
ds=xr.open_dataset(url, decode_cf=False)

ds.cloud_fraction
<xarray.DataArray 'cloud_fraction' (time: 1, latitude: 1800, longitude: 3600)>
[6480000 values with dtype=int8]
Coordinates:
  * latitude   (latitude) int16 -32748 -32712 -32675 -32639 -32603 -32566 ...
  * longitude  (longitude) int16 -32757 -32739 -32721 -32703 -32685 -32666 ...
  * time       (time) float32 1041379200.0
Attributes:
    _Unsigned:      false
    standard_name:  cloud_area_fraction
    long_name:      cloud fraction computed over a 3x3 pixel array at the nat...
    coordinates:    latitude longitude
    actual_range:   [0. 1.]
    valid_range:    [-127  127]
    scale_factor:   0.003937008
    add_offset:     0.5
    _FillValue:     -128
    scaled:         1
    _ChunkSizes:    [  1 361 361]

cloud_fract_manual = ds.cloud_fraction * ds.cloud_fraction.scale_factor + ds.cloud_fraction.add_offset
cloud_fract_manual.plot()

decoded using decode_cf image

Expected Output

image

Decoded manually ds.cloud_fraction * ds.cloud_fraction.scale_factor + ds.cloud_fraction.add_offset

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.6.6.final.0 python-bits: 64 OS: Linux OS-release: 4.4.111+ machine: x86_64 processor: x86_64 byteorder: little LC_ALL: en_US.UTF-8 LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 xarray: 0.10.8 pandas: 0.23.2 numpy: 1.15.1 scipy: 1.1.0 netCDF4: 1.4.2 h5netcdf: 0.6.2 h5py: 2.8.0 Nio: None zarr: 2.2.0 bottleneck: 1.2.1 cyordereddict: None dask: 0.18.2 distributed: 1.22.1 matplotlib: 2.2.3 cartopy: 0.17.0 seaborn: None setuptools: 39.2.0 pip: 18.1 conda: 4.5.4 pytest: 3.9.1 IPython: 6.4.0 sphinx: None
dcherian commented 6 years ago

I can reproduce that on master. Thanks for the good example.

However, if I download the file, it seems to work OK. @jjspergel can you confirm that please?

EDIT: image

Also, note that the scale_factor and add_offset attributes aren't set in the first ds.cloud_fraction repr but is in ds.cloud_fraction.encoding

{'source': 'http://www.ncei.noaa.gov/thredds/dodsC/avhrr-patmos-x-cloudprops-noaa-asc-fc/files/2003/patmosx_v05r03_NOAA-17_asc_d20030101_c20140314.nc',
 'original_shape': (1, 1800, 3600),
 'dtype': dtype('int8'),
 '_Unsigned': 'false',
 '_FillValue': 128,
 'scale_factor': 0.003937008,
 'add_offset': 0.5,
 'coordinates': 'latitude longitude'}
jjspergel commented 6 years ago

I can reproduce that on master. Thanks for the good example.

However, if I download the file, it seems to work OK. @jjspergel can you confirm that please?

EDIT: image

Also, note that the scale_factor and add_offset attributes aren't set in the first ds.cloud_fraction repr but is in ds.cloud_fraction.encoding

{'source': 'http://www.ncei.noaa.gov/thredds/dodsC/avhrr-patmos-x-cloudprops-noaa-asc-fc/files/2003/patmosx_v05r03_NOAA-17_asc_d20030101_c20140314.nc',
 'original_shape': (1, 1800, 3600),
 'dtype': dtype('int8'),
 '_Unsigned': 'false',
 '_FillValue': 128,
 'scale_factor': 0.003937008,

 'add_offset': 0.5,
 'coordinates': 'latitude longitude'}

Yes @dcherian, when I downloaded it, and then imported into a notebook it seems to have decoded fine.

image

rabernat commented 6 years ago

The CF decoding fails using opendap but not from the downloaded file. Pretty weird.

dcherian commented 6 years ago

The problem is that attrs['_Unsigned'] = 'false' which always triggers UnsignedIntegerCoder https://github.com/pydata/xarray/blob/0d6056e8816e3d367a64f36c7f1a5c4e1ce4ed4e/xarray/coding/variables.py#L269

That line expects unsigned to be bool and ends up casting the signed integer to an unsigned integer and things go haywire.

opendap is setting the _Unsigned attribute but it isn't present in the downloaded file.

ncdump -h http://www.ncei.noaa.gov/thredds/dodsC/avhrr-patmos-x-cloudprops-noaa-asc-fc/files/2003/patmosx_v05r03_NOAA-17_asc_d20030101_c20140314.nc
    byte cloud_fraction(time, latitude, longitude) ;
        cloud_fraction:_Unsigned = "false" ;
        cloud_fraction:standard_name = "cloud_area_fraction" ;
        cloud_fraction:long_name = "cloud fraction computed over a 3x3 pixel array at the native resolution centered on this pixel" ;
        cloud_fraction:coordinates = "latitude longitude" ;
        cloud_fraction:actual_range = 0.f, 1.f ;
        cloud_fraction:valid_range = -127s, 127s ;
        cloud_fraction:scale_factor = 0.003937008f ;
        cloud_fraction:add_offset = 0.5f ;
        cloud_fraction:_FillValue = -128b ;
        cloud_fraction:scaled = 1b ;
        cloud_fraction:_ChunkSizes = 1, 361, 361 ;
ncdump -h patmosx_v05r03_NOAA-17_asc_d20030101_c20140314.nc 
    byte cloud_fraction_uncertainty(time, latitude, longitude) ;
        cloud_fraction_uncertainty:long_name = "cloud fraction uncertainty computed over a 3x3 array" ;
        cloud_fraction_uncertainty:coordinates = "latitude longitude" ;
        cloud_fraction_uncertainty:actual_range = 0.f, 1.f ;
        cloud_fraction_uncertainty:valid_range = -127b, 127b ;
        cloud_fraction_uncertainty:scale_factor = 0.003937008f ;
        cloud_fraction_uncertainty:add_offset = 0.5f ;
        cloud_fraction_uncertainty:_FillValue = -128b ;
        cloud_fraction_uncertainty:scaled = 1b ;
dcherian commented 6 years ago

@jjspergel The single change in #2584 should fix your problem.

rabernat commented 6 years ago

A temporary workaround without updating xarray might be to just edit the attributes to remove the offending unsigned attribute before decoding cf?

Sent from my iPhone

On Nov 30, 2018, at 1:35 PM, Deepak Cherian notifications@github.com wrote:

@jjspergel The single change in #2584 should fix your problem.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

dopplershift commented 6 years ago

@lesserwhirls @dennisHeimbigner Is there any reason to expect a difference between the downloaded file and the opendap view on TDS?

DennisHeimbigner commented 6 years ago

Very possibly. The first thing to look at is what opendap is sending:

http://www.ncei.noaa.gov/thredds/dodsC/avhrr-patmos-x-cloudprops-noaa-asc-fc/files/2003/patmosx_v05r03_NOAA-17_asc_d20030101_c20140314.nc.das