pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

open_mfdataset with remote files is broken because of #9687 #9784

Closed phofl closed 6 days ago

phofl commented 6 days ago

What happened?

https://github.com/pydata/xarray/pull/9687

This PR broke open_mfdataset with remote files. _normalize_path_list does not recognize already-opened remote file objects and recurses into them as if they were nested lists of paths.

What did you expect to happen?

This should continue to work: _normalize_path_list should stop and leave p as it is when p is not a list, instead of recursing into it.
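The expected behaviour can be sketched as follows. This is only an illustration, not xarray's actual code: `normalize_path_list_sketch` is a hypothetical stand-in for `_normalize_path_list`, `os.fspath` replaces whatever path normalization xarray really applies, and an `io.BytesIO` object stands in for the handle returned by `s3.open(...)`.

```python
import io
import os


def normalize_path_list_sketch(paths):
    """Hypothetical stand-in for _normalize_path_list (not xarray's code)."""
    normalized = []
    for p in paths:
        if isinstance(p, (str, os.PathLike)):
            # Path-like entries are converted to plain strings.
            normalized.append(os.fspath(p))
        elif isinstance(p, list):
            # Only genuine nested lists are recursed into.
            normalized.append(normalize_path_list_sketch(p))
        else:
            # Anything else (e.g. an open file object) passes through untouched.
            normalized.append(p)
    return normalized


# Assumption: BytesIO plays the role of an opened remote file.
remote_file = io.BytesIO(b"not a real NetCDF file\n")
result = normalize_path_list_sketch([["a.nc", "b.nc"], [remote_file]])
```

With this guard the nested structure needed by `combine="nested"` is preserved and the file object is handed on unchanged.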

Minimal Complete Verifiable Example

import s3fs
import xarray as xr
from distributed import Client

s3 = s3fs.S3FileSystem()

file_list = ['s3://nex-gddp-cmip6/NEX-GDDP-CMIP6/ACCESS-CM2/historical/r1i1p1f1/hurs/hurs_day_ACCESS-CM2_historical_r1i1p1f1_gn_1950.nc']
# Open each remote object; open_mfdataset receives file objects, not path strings.
files = [s3.open(f) for f in file_list]

if __name__ == "__main__":
    client = Client()
    # Load input NetCDF data files
    # TODO: Reduce explicit settings once https://github.com/pydata/xarray/issues/8778 is completed.
    ds = xr.open_mfdataset(
        files,
        engine="h5netcdf",
        combine="nested",
        concat_dim="time",
        data_vars="minimal",
        coords="minimal",
        compat="override",
        parallel=True,
    )

cc @headtr1ck @dcherian

Relevant log output

Traceback (most recent call last):
  File "/Users/patrick/Library/Application Support/JetBrains/PyCharm2024.3/scratches/scratch.py", line 19, in <module>
    ds = xr.open_mfdataset(
         ^^^^^^^^^^^^^^^^^^
  File "/Users/patrick/mambaforge/envs/dask-dev/lib/python3.11/site-packages/xarray/backends/api.py", line 1539, in open_mfdataset
    paths = _find_absolute_paths(paths, engine=engine, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/patrick/mambaforge/envs/dask-dev/lib/python3.11/site-packages/xarray/backends/common.py", line 149, in _find_absolute_paths
    return _normalize_path_list(paths)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/patrick/mambaforge/envs/dask-dev/lib/python3.11/site-packages/xarray/backends/common.py", line 140, in _normalize_path_list
    return [
           ^
  File "/Users/patrick/mambaforge/envs/dask-dev/lib/python3.11/site-packages/xarray/backends/common.py", line 144, in <listcomp>
    else _normalize_path_list(p)
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/patrick/mambaforge/envs/dask-dev/lib/python3.11/site-packages/xarray/backends/common.py", line 140, in _normalize_path_list
    return [
           ^
  File "/Users/patrick/mambaforge/envs/dask-dev/lib/python3.11/site-packages/xarray/backends/common.py", line 144, in <listcomp>
    else _normalize_path_list(p)
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/patrick/mambaforge/envs/dask-dev/lib/python3.11/site-packages/xarray/backends/common.py", line 140, in _normalize_path_list
    return [
           ^
  File "/Users/patrick/mambaforge/envs/dask-dev/lib/python3.11/site-packages/xarray/backends/common.py", line 144, in <listcomp>
    else _normalize_path_list(p)
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/patrick/mambaforge/envs/dask-dev/lib/python3.11/site-packages/xarray/backends/common.py", line 140, in _normalize_path_list
    return [
           ^
TypeError: 'int' object is not iterable
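
The final TypeError is consistent with the recursion descending through the file object itself: iterating a binary file-like object yields bytes lines, iterating bytes yields ints, and an int cannot be iterated. A minimal local sketch of that chain, with no S3 access needed. The assumptions are that `naive_normalize` is a hypothetical function that only mimics the unconditional recursion (it is not xarray's code) and that `io.BytesIO` stands in for the s3fs file handle.

```python
import io
import os


def naive_normalize(paths):
    """Hypothetical: recurse into every entry that is not path-like."""
    return [
        os.fspath(p) if isinstance(p, (str, os.PathLike)) else naive_normalize(p)
        for p in paths
    ]


# Assumption: BytesIO plays the role of an opened remote file.
fake_remote_file = io.BytesIO(b"abc\n")

try:
    # list -> file object -> bytes line -> int, where iteration fails
    naive_normalize([fake_remote_file])
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```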

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.10 | packaged by conda-forge | (main, Oct 16 2024, 01:26:25) [Clang 17.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 23.4.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: None
xarray: 2024.10.1.dev51+g864b35a1
pandas: 2.2.3
numpy: 2.0.2
scipy: 1.14.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: 3.12.1
zarr: 2.18.3
cftime: None
nc_time_axis: None
iris: None
bottleneck: 1.4.2
dask: 2024.11.2+23.g709bad03e
distributed: 2024.11.2
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2024.10.0
cupy: None
pint: None
sparse: 0.15.4
flox: None
numpy_groupies: None
setuptools: 75.3.0
pip: 24.3.1
conda: None
pytest: 8.3.3
mypy: None
IPython: 8.29.0
sphinx: None
None