PR https://github.com/pydata/xarray/pull/9687 broke open_mfdataset with remote files. _normalize_path_list doesn't identify them properly and recurses into the remote file object.
What did you expect to happen?
This should continue to work, i.e. _normalize_path_list should exit if p is not a list instead of recursing into it.
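As a rough illustration of the expected behaviour, the sketch below recurses only into lists and passes everything else through unchanged. It is not xarray's actual code: `normalize_path_list_sketch` is a hypothetical stand-in for `_normalize_path_list`, and `io.BytesIO` stands in for an opened remote file.

```python
import io
import os


def normalize_path_list_sketch(paths):
    """Hypothetical stand-in for xarray's _normalize_path_list (assumption)."""
    result = []
    for p in paths:
        if isinstance(p, (str, os.PathLike)):
            # Plain paths are converted to strings.
            result.append(os.fspath(p))
        elif isinstance(p, list):
            # Only nested lists are recursed into.
            result.append(normalize_path_list_sketch(p))
        else:
            # Anything else (e.g. an open file object) is returned as-is.
            result.append(p)
    return result


remote_file = io.BytesIO(b"not a real NetCDF file")  # stand-in for s3.open(...)
normalized = normalize_path_list_sketch([["a.nc", remote_file], "b.nc"])
```

With this guard the file object comes back untouched inside the nested list, so the backend can still read from it.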
cc @headtr1ck @dcherian
Minimal Complete Verifiable Example
from distributed import Client
import s3fs
import xarray as xr

s3 = s3fs.S3FileSystem()
file_list = ['s3://nex-gddp-cmip6/NEX-GDDP-CMIP6/ACCESS-CM2/historical/r1i1p1f1/hurs/hurs_day_ACCESS-CM2_historical_r1i1p1f1_gn_1950.nc']
files = [s3.open(f) for f in file_list]

if __name__ == "__main__":
    client = Client()
    # Load input NetCDF data files
    # TODO: Reduce explicit settings once https://github.com/pydata/xarray/issues/8778 is completed.
    ds = xr.open_mfdataset(
        files,
        engine="h5netcdf",
        combine="nested",
        concat_dim="time",
        data_vars="minimal",
        coords="minimal",
        compat="override",
        parallel=True,
    )
MVCE confirmation
[x] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
[x] Complete example — the example is self-contained, including all data and the text of any traceback.
[x] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
[x] New issue — a search of GitHub Issues suggests this is not a duplicate.
[x] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
Traceback (most recent call last):
  File "/Users/patrick/Library/Application Support/JetBrains/PyCharm2024.3/scratches/scratch.py", line 19, in <module>
    ds = xr.open_mfdataset(
         ^^^^^^^^^^^^^^^^^^
  File "/Users/patrick/mambaforge/envs/dask-dev/lib/python3.11/site-packages/xarray/backends/api.py", line 1539, in open_mfdataset
    paths = _find_absolute_paths(paths, engine=engine, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/patrick/mambaforge/envs/dask-dev/lib/python3.11/site-packages/xarray/backends/common.py", line 149, in _find_absolute_paths
    return _normalize_path_list(paths)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/patrick/mambaforge/envs/dask-dev/lib/python3.11/site-packages/xarray/backends/common.py", line 140, in _normalize_path_list
    return [
           ^
  File "/Users/patrick/mambaforge/envs/dask-dev/lib/python3.11/site-packages/xarray/backends/common.py", line 144, in <listcomp>
    else _normalize_path_list(p)
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/patrick/mambaforge/envs/dask-dev/lib/python3.11/site-packages/xarray/backends/common.py", line 140, in _normalize_path_list
    return [
           ^
  File "/Users/patrick/mambaforge/envs/dask-dev/lib/python3.11/site-packages/xarray/backends/common.py", line 144, in <listcomp>
    else _normalize_path_list(p)
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/patrick/mambaforge/envs/dask-dev/lib/python3.11/site-packages/xarray/backends/common.py", line 140, in _normalize_path_list
    return [
           ^
  File "/Users/patrick/mambaforge/envs/dask-dev/lib/python3.11/site-packages/xarray/backends/common.py", line 144, in <listcomp>
    else _normalize_path_list(p)
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/patrick/mambaforge/envs/dask-dev/lib/python3.11/site-packages/xarray/backends/common.py", line 140, in _normalize_path_list
    return [
           ^
TypeError: 'int' object is not iterable
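The four nested `_normalize_path_list` frames are consistent with the function recursing into the file object itself: iterating a binary file yields `bytes` lines, iterating `bytes` yields `int`s, and an `int` cannot be iterated. The following sketch reproduces that chain without S3. `recurse_everything` is a hypothetical simplification, not xarray's code, and `io.BytesIO` stands in for the `s3fs` file (assuming it iterates line by line like other binary streams).

```python
import io


def recurse_everything(obj):
    # Hypothetical simplification: recurse into anything that is not a str.
    return [o if isinstance(o, str) else recurse_everything(o) for o in obj]


remote_file = io.BytesIO(b"\x89HDF\r\n")  # stand-in for an opened remote file

message = None
try:
    # list -> file object -> bytes line -> int, which is not iterable
    recurse_everything([remote_file])
except TypeError as err:
    message = str(err)

print(message)  # 'int' object is not iterable
```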
What happened?
https://github.com/pydata/xarray/pull/9687