pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0
3.62k stars 1.08k forks source link

open_rasterio does not read coordinates from netCDF file properly with netCDF4>=1.4.2 #3185

Closed weiji14 closed 3 years ago

weiji14 commented 5 years ago

MCVE Code Sample

Adapted from the test_serialization unit test at here.

import xarray as xr
from xarray.tests.test_backends import assert_identical, create_tmp_geotiff, create_tmp_file

with create_tmp_geotiff(additional_attrs={}) as (tmp_file, expected):
    # write it to a netcdf and read again (roundtrip) using open_rasterio
    with xr.open_rasterio(tmp_file) as rioda:
        with create_tmp_file(suffix='.nc') as tmp_nc_file:
            # Write geotiff to netcdf file
            rioda.to_netcdf(tmp_nc_file)

            # Read using open_dataarray works
            with xr.open_dataarray(tmp_nc_file) as ncds:
                assert_identical(rioda, ncds)

            # Read using open_rasterio doesn't work!!
            with xr.open_rasterio(tmp_nc_file) as ncds:
                assert_identical(rioda, ncds)

Actual Output (using netCDF4>=1.4.2)

AssertionError: Left and right DataArray objects are not identical

Differing coordinates:
L * x        (x) float64 5.5e+03 6.5e+03 7.5e+03 8.5e+03
R * x        (x) float64 0.5 1.5 2.5 3.5
L * y        (y) float64 7.9e+04 7.7e+04 7.5e+04
R * y        (y) float64 0.5 1.5 2.5
Differing attributes:
L   transform: (1000.0, 0.0, 5000.0, 0.0, -2000.0, 80000.0)
R   transform: (1.0, 0.0, 0.0, 0.0, 1.0, 0.0)
L   res: (1000.0, 2000.0)
R   res: (1.0, -1.0)
Attributes only on the left object:
    crs: +init=epsg:32618

Expected Output (using netCDF4==1.4.1)

AssertionError: Left and right DataArray objects are not identical

Differing attributes:
L   nodatavals: (nan, nan, nan)
R   nodatavals: (nan, nan, nan)
Attributes only on the left object:
    crs: +init=epsg:32618

Problem Description

I have a script which takes in either NetCDF or GeoTIFF files as an input and I've been using xr.open_rasterio to read them quite successfully before, as I'm basically just interested in parsing the correct XY coordinates out. However, upgrading from NetCDF 1.4.1 to 1.4.2 breaks this behaviour (xr.open_rasterio no longer parses the coordinates properly from a .nc file).

My hunch is that it has something to do with different NetCDF4 formats (e.g. ‘NETCDF4’, 'NETCDF4_CLASSIC’, ‘NETCDF3_64BIT’) but looking at the xr.open_rasterio code, I'm not too sure what's going on... Also not very sure if this is an upstream netcdf4-python or rasterio issue but I thought I'd report it here first.

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.6.7 | packaged by conda-forge | (default, Jul 2 2019, 02:18:42) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 4.14.127+ machine: x86_64 processor: x86_64 byteorder: little LC_ALL: C.UTF-8 LANG: C.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.8.18 libnetcdf: 4.4.1.1 xarray: 0.12.3 pandas: 0.25.0 numpy: 1.17.0rc2 scipy: 1.3.0 netCDF4: 1.4.1 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.0.3.4 nc_time_axis: None PseudoNetCDF: None rasterio: 1.0.24 cfgrib: None iris: None bottleneck: None dask: 2.1.0 distributed: None matplotlib: 3.1.1 cartopy: None seaborn: None numbagg: None setuptools: 41.0.1 pip: 19.2.1 conda: None pytest: 5.0.1 IPython: 7.6.1 sphinx: None
jhamman commented 5 years ago

Thanks @weiji14 for opening this issue. When we implemented open_rasterio, I don't think we intended this to be used for netCDF data so I'm surprised it works (worked) at all. I think we'll need someone to dig into the xarray implementation a bit to see if we can pull out the correct coordinates (metadata) using rasterio.

weiji14 commented 5 years ago

Well open_rasterio did have an "experimental" warning on it in the docs :laughing:, but it was really nice having it work on GeoTIFFs and NetCDF files. I've forked the repo and will try to debug the situation a bit more. If anyone who's worked on that part of the codebase before has any pointers on what might be the cause of this issue / where to start that would be great.

weiji14 commented 5 years ago

Hold on, the coordinates seems to be parsed out correctly from the netCDF file (even with netCDF==1.5.1.2) when I have a clean conda installation created following the instructions at https://xarray.pydata.org/en/latest/contributing.html#creating-a-python-environment.

I've isolated the issue and think the problem arises when I also import salem (an xarray accessor)... Will try to narrow this down before I close this issue.

shoyer commented 5 years ago

I guess GDAL can also read netCDF files :).

Indeed, we did not anticipate people using open_rasterio this way.

jhamman commented 5 years ago

GDAL can open so many formats (https://gdal.org/drivers/raster/index.html). We've really only tested against TIFF like formats though. I think the assumption will need to be is that rasterio will provide standard metadata for each of these drivers.

weiji14 commented 5 years ago

Yes, there's https://gdal.org/drivers/raster/netcdf.html :smile: I've done a bit more debugging (having temporarily isolated salem from my script) and am still having issues with my setup.

The clean xarray-tests conda environment that works with netcdf==1.5.1.2 has libnetcdf: 4.6.2, but for some strange reason, running xr.show_versions() on my setup shows libnetcdf: 4.6.3 even though conda list | grep libnetcdf shows that I've installed libnetcdf 4.6.2 h056eaf5_1002 conda-forge.

Not sure if this libnetcdf 4.6.3 version is the problem, but it stands out the most (to me at least) when looking at the diff between my setup and the clean one. Is there a way to check the order in which xarray looks for the netcdf binaries as I feel it might be a PATH related issue. Also not sure if this issue fits here in xarray or somewhere else now...

shoyer commented 5 years ago

The clean xarray-tests conda environment that works with netcdf==1.5.1.2 has libnetcdf: 4.6.2, but for some strange reason, running xr.show_versions() on my setup shows libnetcdf: 4.6.3 even though conda list | grep libnetcdf shows that I've installed libnetcdf 4.6.2 h056eaf5_1002 conda-forge.

xr.show_versions() gets its libnetcdf version from netCDF4 (specially netCDF4.__netcdf4libversion__). So I'm guessing that somehow netCDF4 is picking up libnetcdf from somewhere else -- maybe you pip installed it? It might be worth trying another fresh conda environment...

weiji14 commented 5 years ago

xr.show_versions() gets its libnetcdf version from netCDF4 (specially netCDF4.__netcdf4libversion__). So I'm guessing that somehow netCDF4 is picking up libnetcdf from somewhere else -- maybe you pip installed it? It might be worth trying another fresh conda environment...

I'm not sure where it's picking up the libnetcdf 4.6.3 version from, but I found your comment at https://github.com/pydata/xarray/issues/2535#issuecomment-445944261 and think it might indeed be an incompatibility issue with rasterio and netCDF4 binary wheels (do rasterio wheels include netcdf binaries?). Probably somewhat related to https://github.com/mapbox/rasterio/issues/1574 too.

Managed to get things to work by combining the workaround in this Pull Request and StackOverflow post, basically having pip compile the netcdf python package from source instead of using the wheel:

HDF5_DIR=$CONDA_PREFIX pip install --no-binary netCDF4 netCDF4==1.4.2

where $CONDA_PREFIX is the path to the conda environment e.g. /home/jovyan/.conda/envs/name-of-env. I've tested my MCVE code sample above and it works up to the latest netCDF4==1.5.1.2 version!

naught101 commented 4 years ago

I'm seeing a problem with geotiffs that I think might be related:

$ gdalinfo /home/nedcr/cr/software/ana/lib/geo/tests/data/Urandangi_MGA55.tif
Driver: GTiff/GeoTIFF
Files: /home/nedcr/cr/software/ana/lib/geo/tests/data/Urandangi_MGA55.tif
       /home/nedcr/cr/software/ana/lib/geo/tests/data/Urandangi_MGA55.tif.aux.xml
Size is 10, 10
Coordinate System is:
PROJCS["GDA94 / MGA zone 55",
    GEOGCS["GDA94",
        DATUM["Geocentric_Datum_of_Australia_1994",
            SPHEROID["GRS 1980",6378137,298.257222101,
                AUTHORITY["EPSG","7019"]],
            TOWGS84[0,0,0,0,0,0,0],
            AUTHORITY["EPSG","6283"]],
        PRIMEM["Greenwich",0,
            AUTHORITY["EPSG","8901"]],
        UNIT["degree",0.0174532925199433,
            AUTHORITY["EPSG","9122"]],
        AUTHORITY["EPSG","4283"]],
    PROJECTION["Transverse_Mercator"],
    PARAMETER["latitude_of_origin",0],
    PARAMETER["central_meridian",147],
    PARAMETER["scale_factor",0.9996],
    PARAMETER["false_easting",500000],
    PARAMETER["false_northing",10000000],
    UNIT["metre",1,
        AUTHORITY["EPSG","9001"]],
    AXIS["Easting",EAST],
    AXIS["Northing",NORTH],
    AUTHORITY["EPSG","28355"]]
Origin = (-406507.954209543997422,7588834.152862589806318)
Pixel Size = (263.500000000000000,-265.500000000000000)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  INTERLEAVE=BAND
Corner Coordinates:
Upper Left  ( -406507.954, 7588834.153) (138d16' 6.99"E, 21d34'24.95"S)
Lower Left  ( -406507.954, 7586179.153) (138d16' 1.84"E, 21d35'50.30"S)
Upper Right ( -403872.954, 7588834.153) (138d17'37.56"E, 21d34'29.73"S)
Lower Right ( -403872.954, 7586179.153) (138d17'32.42"E, 21d35'55.09"S)
Center      ( -405190.454, 7587506.653) (138d16'49.70"E, 21d35'10.02"S)
Band 1 Block=10x10 Type=Float32, ColorInterp=Gray
  Min=0.069 Max=8.066 
  Minimum=0.069, Maximum=8.066, Mean=2.556, StdDev=1.749
  NoData Value=-3.40282306073709653e+38
  Metadata:
    STATISTICS_MAXIMUM=8.06591796875
    STATISTICS_MEAN=2.5563781395387
    STATISTICS_MINIMUM=0.068740844726562
    STATISTICS_STDDEV=1.7493082797107
    STATISTICS_VALID_PERCENT=89
import xarray as xr
from osgeo.gdal import Open

ras = Open('/home/nedcr/cr/software/ana/lib/geo/tests/data/Urandangi_MGA55.tif')
ras.GetGeoTransform()
# (-406507.954209544, 263.5, 0.0, 7588834.15286259, 0.0, -265.5)

ds = xr.open_rasterio('/home/nedcr/cr/software/ana/lib/geo/tests/data/Urandangi_MGA55.tif')
ds.transform
# (263.5, 0.0, -406507.954209544, 0.0, -265.5, 7588834.15286259)

The transform in the xarray dataset is transposed, and not really useful anymore.

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 5.0.0-37-lowlatency machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_AU.UTF-8 LOCALE: en_AU.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.2 xarray: 0.14.1 pandas: 0.25.1 numpy: 1.16.4 scipy: 1.3.0 netCDF4: 1.5.1.2 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.0.3.4 nc_time_axis: None PseudoNetCDF: None rasterio: 1.0.22 cfgrib: None iris: None bottleneck: None dask: 1.2.0 distributed: 1.27.1 matplotlib: 3.0.3 cartopy: 0.17.0 seaborn: None numbagg: None setuptools: 40.8.0 pip: 19.0.3 conda: None pytest: 4.3.1 IPython: 7.3.0 sphinx: 2.1.2
kmuehlbauer commented 4 years ago

@naught101 Looks like those coordinates are now in the order as declared in the affine package (since rasterio >= 1.0). See also: https://rasterio.readthedocs.io/en/latest/topics/migrating-to-v1.html.

All coordinates are created correctly, so from my point of view, only the docs here have to be changed.

from affine import Affine
da = xr.open_rasterio('path_to_file.tif')
transform = Affine(*da.attrs['transform'])
nx, ny = da.sizes['x'], da.sizes['y']
x, y = np.meshgrid(np.arange(nx)+0.5, np.arange(ny)+0.5) * transform
gabriel-abrahao commented 3 years ago

Maybe this can be closed now, since #5021 changed the docs?

@naught101 Looks like those coordinates are now in the order as declared in the affine package (since rasterio >= 1.0). See also: https://rasterio.readthedocs.io/en/latest/topics/migrating-to-v1.html.

All coordinates are created correctly, so from my point of view, only the docs here have to be changed.

from affine import Affine
da = xr.open_rasterio('path_to_file.tif')
transform = Affine(*da.attrs['transform'])
nx, ny = da.sizes['x'], da.sizes['y']
x, y = np.meshgrid(np.arange(nx)+0.5, np.arange(ny)+0.5) * transform
keewis commented 3 years ago

yes, I would think so. Thanks for the reminder, @gabriel-abrahao.

The original issue was that libnetcdf picked up by netCDF4 didn't match the version installed by the netCDF4 wheels, which is not really xarray's fault.