pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

groupby triggering StopIteration: error when run in loop #2240

Closed lgpreston closed 4 years ago

lgpreston commented 6 years ago

First GitHub issue I've raised, so apologies if it doesn't follow protocol.

I'm receiving a StopIteration: error when attempting to use the groupby function in xarray. The error only occurs when attempting to loop through a list of files - if a single file path is input, no error is generated. I've also tried using xr.open_mfdataset to open the full directory of files, but this produced the same error.

for path in in_files:
    ds = xr.open_dataset(path)
    ds['index'] = county_mask
    ds = ds.set_coords('index')
    ds = ds.where(ds['index'].isin(cotton_county_keys))
    ds.groupby('index').mean('stacked_lat_lon').to_dataframe().reset_index()

Produces:

StopIteration                             Traceback (most recent call last)
<ipython-input-91-f26bf31efda5> in <module>()
      6     ds = ds.set_coords('index')
      7     ds = ds.where(ds['index'].isin(cotton_county_keys))
----> 8     ds.groupby('index').mean('stacked_lat_lon').to_dataframe().reset_index()

~\AppData\Local\Continuum\anaconda3\lib\site-packages\xarray\core\common.py in wrapped_func(self, dim, keep_attrs, skipna, **kwargs)
     52                 return self.reduce(func, dim, keep_attrs, skipna=skipna,
     53                                    numeric_only=numeric_only, allow_lazy=True,
---> 54                                    **kwargs)
     55         else:
     56             def wrapped_func(self, dim=None, keep_attrs=False, **kwargs):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\xarray\core\groupby.py in reduce(self, func, dim, keep_attrs, **kwargs)
    652         def reduce_dataset(ds):
    653             return ds.reduce(func, dim, keep_attrs, **kwargs)
--> 654         return self.apply(reduce_dataset)
    655 
    656     def assign(self, **kwargs):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\xarray\core\groupby.py in apply(self, func, **kwargs)
    607         kwargs.pop('shortcut', None)  # ignore shortcut if set (for now)
    608         applied = (func(ds, **kwargs) for ds in self._iter_grouped())
--> 609         return self._combine(applied)
    610 
    611     def _combine(self, applied):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\xarray\core\groupby.py in _combine(self, applied)
    611     def _combine(self, applied):
    612         """Recombine the applied objects like the original."""
--> 613         applied_example, applied = peek_at(applied)
    614         coord, dim, positions = self._infer_concat_args(applied_example)
    615         combined = concat(applied, dim)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\xarray\core\utils.py in peek_at(iterable)
    113     """
    114     gen = iter(iterable)
--> 115     peek = next(gen)
    116     return peek, itertools.chain([peek], gen)
    117 

StopIteration: 

As does:

ds = xr.open_dataset(in_files[0])
ds['index'] = county_mask
ds = ds.set_coords('index')
ds = ds.where(ds['index'].isin(cotton_county_keys))
ds.groupby('index').mean('stacked_lat_lon').to_dataframe().reset_index()

However, a single hard-coded file path works perfectly:

path = r'V:\ARL\Weather\Product_Development\US_PRISM_DATA\daily_temp\PRISM_daily_temp_1993-01-08'

ds = xr.open_dataset(path)
ds['index'] = county_mask
ds = ds.set_coords('index')
ds = ds.where(ds['index'].isin(cotton_county_keys))
ds.groupby('index').mean('stacked_lat_lon').to_dataframe().reset_index()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

xarray: 0.10.3
pandas: 0.22.0
numpy: 1.13.3
scipy: 1.0.1
netCDF4: 1.3.1
h5netcdf: None
h5py: 2.7.0
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.15.3
distributed: 1.19.1
matplotlib: 2.1.0
cartopy: 0.15.1
seaborn: 0.8.0
setuptools: 36.5.0.post20170921
pip: 9.0.1
conda: 4.4.6
pytest: 3.2.1
IPython: 6.1.0
sphinx: 1.6.3

shoyer commented 6 years ago

Thanks for the report.

I believe this is the same issue as https://github.com/pydata/xarray/issues/1764

lgpreston commented 6 years ago

@shoyer is there any update on this? I don't quite understand the error so have so far been unable to develop a workaround.

shoyer commented 6 years ago

Sorry, I haven't had time to look into this yet.

dcherian commented 4 years ago

I think this has been fixed, since groupby now discards NaNs in the grouped variable.

Please reopen with a reproducible example if it has not been fixed.
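
For anyone landing here with the same traceback, the sketch below may help explain the error. The last frame in the traceback is `peek_at` calling `next(gen)`, which raises `StopIteration` whenever the iterator of groups is empty. The `peek_at` body is copied from the traceback. Everything else is hypothetical and uses only the standard library: `group_labels` is a stand-in for grouping, not xarray's implementation, and the idea that the groups were empty because every label was NaN is an assumption, not something confirmed in this thread.

```python
import itertools
import math


def peek_at(iterable):
    """Return the first item and an iterator that still yields every item."""
    gen = iter(iterable)
    peek = next(gen)  # raises StopIteration when the iterable is empty
    return peek, itertools.chain([peek], gen)


def group_labels(labels):
    """Hypothetical stand-in for grouping: unique labels with NaN dropped."""
    return sorted({x for x in labels if not math.isnan(x)})


def first_group_or_none(labels):
    """Peek at the first group, returning None instead of raising when empty."""
    groups = (label for label in group_labels(labels))
    try:
        first, _ = peek_at(groups)
    except StopIteration:
        # No groups at all, e.g. every label was NaN (assumed cause).
        return None
    return first
```

If the assumption holds, a practical check before calling `groupby` is to confirm that the grouping variable still has at least one non-NaN value after the `where` step.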