pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

Internals .apply Methods #17124

Closed jbrockmendel closed 7 years ago

jbrockmendel commented 7 years ago

core.internals.BlockManager.isna has what appears to be a typo:

    def isna(self, **kwargs):
        return self.apply('apply', **kwargs)

The pattern in all of the other methods like this would be self.apply('isna', **kwargs). But in trying to apply what looks like a simple fix, other breakages show up:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame([[0, np.nan], [np.nan, 1]], columns=list('AB'))
    mgr = df._data

    >>> mgr.diff()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python2.7/site-packages/pandas/core/internals.py", line 3209, in diff
        return self.apply('diff', **kwargs)
      File "/usr/local/lib/python2.7/site-packages/pandas/core/internals.py", line 3091, in apply
        applied = getattr(b, f)(**kwargs)
    TypeError: diff() takes at least 2 arguments (2 given)

    >>> mgr.diff(0)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: diff() takes exactly 1 argument (2 given)

    >>> mgr.diff(axis=0)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python2.7/site-packages/pandas/core/internals.py", line 3209, in diff
        return self.apply('diff', **kwargs)
      File "/usr/local/lib/python2.7/site-packages/pandas/core/internals.py", line 3091, in apply
        applied = getattr(b, f)(**kwargs)

    >>> mgr.quantile()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python2.7/site-packages/pandas/core/internals.py", line 3200, in quantile
        return self.reduction('quantile', **kwargs)
      File "/usr/local/lib/python2.7/site-packages/pandas/core/internals.py", line 3129, in reduction
        axe, block = getattr(b, f)(axis=axis, **kwargs)
    TypeError: quantile() takes at least 2 arguments (3 given)

    >>> mgr.quantile(.5)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: quantile() takes exactly 1 argument (2 given)

I don't see tests for these methods in tests.test_internals, and it isn't obvious whether or where they are actually used. I'm trying to track down coverage stats now (the codecov badge on the GitHub page doesn't link to codecov; could've sworn that used to work...)

jbrockmendel commented 7 years ago

Looks like these are all covered.

jorisvandenbossche commented 7 years ago

The reason for this is that the block manager's isna is called like obj._data.isna(func=isna) or obj._data.isna(func=_isna_old). It passes in the function to use, and hence uses 'apply' under the hood in internals.py. See https://github.com/pandas-dev/pandas/blob/f2b0bdc9bc4e57e101e306db7555eb7db28172e9/pandas/core/dtypes/missing.py#L60

I agree it is a bit awkward (and certainly hard to understand), but it is not a typo.
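
To make the dispatch easier to follow, here is a minimal toy sketch of the pattern described above. ToyBlock, ToyManager and toy_isna are hypothetical stand-ins invented for illustration, not the actual pandas classes or signatures; the only thing taken from the issue is the shape of the calls (the manager's apply looks up a block method by name, and isna forwards a func keyword to the block's apply).

```python
import math


class ToyBlock:
    """Hypothetical stand-in for an internal block; holds a flat list of values."""

    def __init__(self, values):
        self.values = list(values)

    def apply(self, func, **kwargs):
        # Run the caller-supplied function over this block's values.
        return ToyBlock(func(self.values, **kwargs))


class ToyManager:
    """Hypothetical stand-in for BlockManager."""

    def __init__(self, blocks):
        self.blocks = list(blocks)

    def apply(self, f, **kwargs):
        # Look up the block method by name and forward only keyword arguments.
        return ToyManager([getattr(b, f)(**kwargs) for b in self.blocks])

    def isna(self, **kwargs):
        # Not a typo in this model: it dispatches to ToyBlock.apply,
        # which expects the function to run as the `func` keyword.
        return self.apply('apply', **kwargs)


def toy_isna(values):
    # Illustrative missing-value check: True where the value is a float NaN.
    return [isinstance(v, float) and math.isnan(v) for v in values]


mgr = ToyManager([ToyBlock([0.0, float('nan')]), ToyBlock([float('nan'), 1.0])])
result = mgr.isna(func=toy_isna)
masks = [b.values for b in result.blocks]
```

In this toy model, changing isna to self.apply('isna', **kwargs) would fail, because ToyBlock has no isna method to look up; the function to run arrives through func instead.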

BTW, the reason you get an error with the diff method is that you have to pass the argument as a keyword arg: mgr.diff(n=1) works. Again, not the most user-friendly, and the error messages are also strange, but these are internals that the user should never touch (which does not mean they can't be made friendlier to contributors!)
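
The keyword-only behaviour can be reproduced with a small toy model. Again, ToyBlock and ToyManager are hypothetical names made up for this sketch and do not mirror the real pandas internals; the assumption is only that the manager method accepts **kwargs and forwards them by name, as the tracebacks above suggest.

```python
class ToyBlock:
    """Hypothetical stand-in for an internal block; holds a flat list of numbers."""

    def __init__(self, values):
        self.values = list(values)

    def diff(self, n):
        # Difference with the element n positions earlier; the first n
        # entries have no predecessor, so they are filled with None.
        head = [None] * n
        tail = [self.values[i] - self.values[i - n]
                for i in range(n, len(self.values))]
        return ToyBlock(head + tail)


class ToyManager:
    """Hypothetical stand-in for BlockManager."""

    def __init__(self, blocks):
        self.blocks = list(blocks)

    def apply(self, f, **kwargs):
        # Look up the block method by name and forward keyword arguments.
        return ToyManager([getattr(b, f)(**kwargs) for b in self.blocks])

    def diff(self, **kwargs):
        # Accepts keywords only, so a positional `n` never reaches the block.
        return self.apply('diff', **kwargs)


mgr = ToyManager([ToyBlock([1, 4, 9])])

# Passing n as a keyword works.
by_keyword = [b.values for b in mgr.diff(n=1).blocks]

# Passing n positionally fails at the manager level with a TypeError.
try:
    mgr.diff(1)
except TypeError as exc:
    positional_error = exc
else:
    positional_error = None
```

Because the manager method has no positional parameters besides self, the TypeError is raised before any block is touched, which is why the resulting message says nothing about the missing n.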