Open vfilimonov opened 4 years ago
and how exactly does numpy accept min as an arg?
Hello, @jreback
numpy's clip
operates with min
and max
instead of lower
and upper
:
x = np.random.randn(100)
x.clip(min=0)
Perhaps pandas' **kwargs
are not meant for this purpose - but it is not clear from the docs, what exactly does "compatibility with numpy" means.
The way how I stumbled across it - is trying to have some function to behave identically for pandas' and numpy's input. I ended up having the following ugly workaround:
def clip(x, lower=None, upper=None):
""" Unified interface for pandas' and numpy's clip """
try:
return x.clip(lower=lower, upper=upper)
except ValueError:
if lower is None and upper is None:
return x
return x.clip(min=lower, max=upper)
show me numpy docs i couldn't see this parameter
It's here: numpy.ndarray.clip
But eventually numpy itself is not consistent and numpy.clip operates with a_min
and a_max
instead.
right that's what i saw a_max/a_min
so i think of you use that it would work
It looks like a_min
/a_max
are also rejected by pandas:
x = pd.Series(np.random.randn(100))
x.clip(a_min=0) # TypeError: clip() got an unexpected keyword argument 'a_min'
x.clip(a_max=0) # TypeError: clip() got an unexpected keyword argument 'a_max'
If fixing this and allowing pandas to accept numpy's argument in clip
I would vote to keep the min
/max
arguments of numpy.ndarray.clip()
rather than a_min
and a_max
of numpy.clip()
. This way it will keep the polymorphic behavior as both of pandas.Series/DataFrame and numpy.ndarray could then accept the same arguments:
x_np = np.random.randn(100)
x_pd = pd.Series(x_np)
x_np.clip(min=0)
x_pd.clip(min=0)
Another option is indeed to remove the **kwargs from the docs and do not support such compatibility, since the parameters are ambiguous for numpy.
If fixing this and allowing pandas to accept numpy's argument in
clip
I would vote to keep themin
/max
arguments ofnumpy.ndarray.clip()
rather thana_min
anda_max
ofnumpy.clip()
.
it looks like that when the np.clip function dispatches to Series.clip, the arguments are passed as positional arguments and hence a_min/a_max do not get passed on to Series.clip
np.clip(ser, a_min=1, a_max=8)
> c:\users\simon\pandas\pandas\core\generic.py(7316)clip()
-> inplace = validate_bool_kwarg(inplace, "inplace")
(Pdb) a
self = 0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
dtype: int32
lower = 1
upper = 8
axis = None
inplace = False
args = ()
kwargs = {'out': None}
and hence the following works as expected.
>>> pd.__version__
'1.1.0rc0'
>>>
>>> arr = np.arange(10)
>>> arr
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>>
>>> ser = pd.Series(arr)
>>> ser
0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
dtype: int32
>>>
>>> np.clip(ser, a_min=1, a_max=8)
0 1
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 8
dtype: int32
>>>
>>> np.clip(ser, 1, 8)
0 1
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 8
dtype: int32
>>>
>>> np.clip(a_max=8, a_min=1, a=ser)
0 1
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 8
dtype: int32
>>>
w.r.t the min/max keywords of ndarray.clip, unlike np.clip, these arguments are optional, however at least one of max or min must be given.
I think that there could be an argument for deprecating the current keywords in Series.clip, upper/lower in favour of min/max.
we have a deprecate_kwarg
decorator specifically for this purpose.
further investigation and PRs welcome.
The way how I stumbled across it - is trying to have some function to behave identically for pandas' and numpy's input. I ended up having the following ugly workaround:
does passing the arguments as positional arguments work? ndarray.clip and Series.clip have the first two arguments in the same order.
or use np.clip.
[x] I have checked that this issue has not already been reported.
[x] I have confirmed this bug exists on the latest version of pandas.
[ ] (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
Problem description
Documentation of
clip
(dataframe, series) states the method signature:and explains about
**kwargs
:However
min
ormax
arguments of numpy are not accepted and raisedTypeError
.Output of
pd.show_versions()