pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

[Feature Request] iteration equivalent to numpy's nditer or ndenumerate #2805

Open AdrianSosic opened 5 years ago

AdrianSosic commented 5 years ago

Hi folks, is there a simple way to iterate over all elements of an xarray object together with their labels? What I am looking for is an equivalent of numpy's nditer or, rather, ndenumerate:

https://docs.scipy.org/doc/numpy/reference/generated/numpy.nditer.html https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndenumerate.html

Ideally, the iterator should return both the current data element and its coordinates, potentially in the form of an (ordered) dictionary. Is there a direct way to achieve this with the package's current functionality?

shoyer commented 5 years ago

You could convert your data into pandas and use .itertuples(), e.g.,

import xarray
import itertools

ds = xarray.tutorial.open_dataset('air_temperature')
records = ds.to_dataframe().reset_index().itertuples(index=False, name='Record')
print(list(itertools.islice(records, 5)))

Outputs:

[Record(lat=75.0, lon=200.0, time=Timestamp('2013-01-01 00:00:00'), air=241.1999969482422),
 Record(lat=75.0, lon=200.0, time=Timestamp('2013-01-01 06:00:00'), air=242.09999084472656),
 Record(lat=75.0, lon=200.0, time=Timestamp('2013-01-01 12:00:00'), air=242.29998779296875),
 Record(lat=75.0, lon=200.0, time=Timestamp('2013-01-01 18:00:00'), air=241.88999938964844),
 Record(lat=75.0, lon=200.0, time=Timestamp('2013-01-02 00:00:00'), air=243.1999969482422)]

AdrianSosic commented 5 years ago

Hi shoyer, many thanks for your quick reply. Converting the xarray object to a DataFrame does indeed do the job, and I will use this solution for the time being. Nevertheless, the approach feels ad hoc to me, since it requires a series of conversions and function calls, and I feel there should be a built-in solution in xarray. In particular, the above solution loses track of which fields are coordinates and which are the actual data (everything is stored in a single NamedTuple), so an additional step is needed to separate the two. Anyway, thanks for your help! If you find another solution, please let me know!
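
One way to keep coordinates and data apart with the pandas route is to split each record by the dataset's data variable names. The following is a minimal sketch, not a built-in xarray feature: the helper name `iter_split_records` and the small synthetic dataset are made up for illustration, and it assumes all variable names are valid Python identifiers (otherwise `itertuples` renames the fields).

```python
import numpy as np
import xarray as xr


def iter_split_records(ds):
    """Yield one (coords, data) pair of dicts per row of ds.to_dataframe()."""
    frame = ds.to_dataframe().reset_index()
    data_names = list(ds.data_vars)
    for record in frame.itertuples(index=False):
        fields = record._asdict()
        # Move the data variables out; what remains are the coordinate labels.
        data = {name: fields.pop(name) for name in data_names}
        yield fields, data


# Hypothetical stand-in for the tutorial dataset (no download needed).
ds = xr.Dataset(
    {"air": (("lat", "lon"), np.array([[1.0, 2.0], [3.0, 4.0]]))},
    coords={"lat": [10.0, 20.0], "lon": [100.0, 110.0]},
)

for coords, data in iter_split_records(ds):
    print(coords, data)
```

This still goes through a full DataFrame conversion, so it only removes the manual separation step, not the conversion itself.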

lanougue commented 2 years ago

Hi guys,

For now, when I want to iterate over my whole dataset, I use this simple (but, I believe, dangerous) workaround:

import numpy as np
# ds must be a DataArray here: a Dataset has no .shape attribute
for i in np.ndindex(ds.shape):
    res = ds[dict(zip(ds.dims, i))]

However, I am not sure that np.ndindex iterates in the same order as the dimensions returned by ds.dims.
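
On the ordering question: for a DataArray, `dims` and `shape` are tuples in the same axis order, so zipping `ds.dims` with an index from `np.ndindex(ds.shape)` pairs each position with the right dimension. A Dataset has no `shape` attribute, so this only applies to a DataArray. A small check, using a made-up array and a hypothetical helper name `positional_indexers`:

```python
import numpy as np
import xarray as xr


def positional_indexers(da):
    """Yield (index_tuple, {dim: position}) pairs for a DataArray."""
    # da.dims and da.shape are parallel tuples: da.dims[k] names axis k.
    for idx in np.ndindex(da.shape):
        yield idx, dict(zip(da.dims, idx))


da = xr.DataArray(np.arange(6).reshape(2, 3), dims=["x", "y"])

# Each dict indexer selects the same element as the plain NumPy index.
for idx, indexer in positional_indexers(da):
    assert da[indexer].item() == da.values[idx]
```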

Is there any news on this topic?

Many thanks!

lanougue commented 2 years ago

Hello guys,

While waiting for an integrated solution, here is a function that should do the job safely. It returns an iterator over dictionaries that map each dimension name to an integer position:

import numpy as np

def xndindex(ds, dims=None):
    """Yield {dim: position} dicts covering every index along the given dims."""
    if dims is None:
        dims = ds.dims
    elif isinstance(dims, str):
        dims = [dims]

    for d in dims:
        if d not in ds.dims:
            raise ValueError("Invalid dimension '{}'. Available dimensions {}".format(d, ds.dims))

    # Take sizes in the order of ds.sizes so names and positions stay aligned.
    iter_dict = {k: v for k, v in ds.sizes.items() if k in dims}
    for index in np.ndindex(*iter_dict.values()):
        yield dict(zip(iter_dict, index))

Example of use

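import numpy as np
import xarray as xr
a = xr.DataArray(np.random.rand(4, 3), dims=['x', 'y'], coords={'x': np.arange(4), 'y': np.arange(3)})
for i in xndindex(a):
    print(a[i])  # each a[i] is a 0-d DataArray that still carries its x and y labels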
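
Since each element selected this way is a 0-d DataArray with scalar coordinates, an ndenumerate-style loop can read the labels straight from it. Below is a rough sketch of that idea for a DataArray; the name `enumerate_labeled` is hypothetical, and `.item()` is used on the assumption that the labels are numeric or string values (other dtypes may need different handling).

```python
import numpy as np
import xarray as xr


def enumerate_labeled(da):
    """Yield (labels, value) pairs, similar in spirit to np.ndenumerate."""
    for idx in np.ndindex(da.shape):
        elem = da[dict(zip(da.dims, idx))]  # 0-d DataArray
        # The scalar coordinates of the element hold its labels.
        labels = {name: coord.item() for name, coord in elem.coords.items()}
        yield labels, elem.item()


a = xr.DataArray(
    np.arange(12.0).reshape(4, 3),
    dims=["x", "y"],
    coords={"x": np.arange(4), "y": np.arange(3)},
)
for labels, value in enumerate_labeled(a):
    print(labels, value)
```

Indexing one element at a time is slow for large arrays, so this is best suited to small data or debugging.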