Closed machow closed 1 year ago
For a grouped summarize, when a grouping column...
AFAICT, setting `groupby(..., dropna=False)` resolves this (cf. https://github.com/machow/siuba/issues/251).
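For reference, a minimal pandas-only sketch of what `dropna=False` changes. It assumes pandas 1.1+ (where `DataFrame.groupby` accepts `dropna`), uses a small made-up frame as a stand-in for `cars`, and does not go through siuba at all, so it only illustrates the underlying pandas behavior rather than the fix itself.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `cars`: one non-NA cyl value, the rest NaN.
df = pd.DataFrame(
    {
        "cyl": [4.0, np.nan, np.nan],
        "hp": [90, 110, 110],
        "mpg": [30.0, 21.0, 19.0],
    }
)

# Default behavior: rows whose grouping key contains NaN are dropped.
dropped = df.groupby(["cyl", "hp"])["mpg"].mean().reset_index()

# dropna=False keeps NaN as its own group level.
kept = df.groupby(["cyl", "hp"], dropna=False)["mpg"].mean().reset_index()

print(dropped)
print(kept)
```

With the default, only the `cyl == 4` group survives; with `dropna=False`, the NaN rows form a group of their own and the grouping columns are still present after `reset_index()`.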
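    # Assumed setup (not shown in the original report): siuba's bundled cars data
    import numpy as np
    from siuba import _, group_by, summarize
    from siuba.data import cars

    # Make every value of the grouping column NA
    cars6 = cars.copy()
    cars6["cyl"] = np.nan

    cars6 >> group_by(_.cyl, _.hp) >> summarize(res = _.mpg.mean())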
Raises
ValueError: cannot insert cyl, already exists
    cars5 = cars.copy()
    cars5["cyl"] = [1] + [np.nan] * (len(cars) - 1)

    cars5 >> group_by(_.cyl, _.hp) >> summarize(res = _.mpg.mean())
Output
Note that there is no `cyl` or `hp` column on the result.
Addressed in v0.4.2
Example: all NA levels raises an error, since the grouping columns are on both the result and the index
Raises
Full traceback
```python
ValueError                                Traceback (most recent call last)
Cell In [23], line 4
      1 cars6 = cars.copy()
      2 cars6["cyl"] = np.nan
----> 4 cars6 >> group_by(_.cyl, _.hp) >> summarize(res = _.mpg.mean())

File ~/.virtualenvs/siuba/lib/python3.8/site-packages/siuba/siu/calls.py:214, in Call.__rrshift__(self, x)
    210 if isinstance(strip_symbolic(x), (Call)):
    211     # only allow non-calls (i.e. data) on the left.
    212     raise TypeError()
--> 214 return self(x)

File ~/.virtualenvs/siuba/lib/python3.8/site-packages/siuba/siu/calls.py:189, in Call.__call__(self, x)
    187     return operator.getitem(inst, *rest)
    188 elif self.func == "__call__":
--> 189     return getattr(inst, self.func)(*rest, **kwargs)
    191 # in normal case, get method to call, and then call it
    192 f_op = getattr(operator, self.func)

File ~/.pyenv/versions/3.8.12/lib/python3.8/functools.py:875, in singledispatch.
```

Example: 1 non NA level outputs a table w/o grouping columns
Output
Note that there is no `cyl` or `hp` column on the result.