CLN: Integrate .corrwith and .corr

max-sixty commented 9 years ago

Currently:

corr on a DataFrame requires another DataFrame, and fails on a Series
corrwith on a DataFrame takes a Series

Is there a good reason these are separate? Should corr do whatever corrwith does when passed a Series, and corrwith could be deprecated?
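
For reference, a minimal sketch of what `corrwith` does with a Series, assuming a recent pandas; the column names, seed, and variable names are made up for this example:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
df = pd.DataFrame(rng.standard_normal((10, 3)), columns=["a", "b", "c"])
s = pd.Series(rng.standard_normal(10))

# corrwith correlates each column of df with the Series, aligned on the index
per_column = df.corrwith(s)

# The same numbers, computed one column at a time with Series.corr
manual = pd.Series({col: df[col].corr(s) for col in df.columns})
```

The result is a Series indexed by the DataFrame's column labels, one correlation per column.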

jorisvandenbossche commented 9 years ago

corr on a DataFrame works without another DataFrame, doesn't it? It computes the pairwise correlation between its own columns:

In [4]: df = pd.DataFrame(np.random.randn(10,3))

In [6]: df.corr()
Out[6]:
          0         1         2
0  1.000000  0.116443  0.127691
1  0.116443  1.000000  0.472557
2  0.127691  0.472557  1.000000

jreback commented 9 years ago

you would have to change the signature of .corr to something like:

def corr(self, other=None, method='pearson', min_periods=1, axis=0, drop=False):

if other is None then it becomes self.

with a Series it is trickier, because then you need to know how to broadcast it, i.e. row-wise or column-wise (usually you mean the latter), though I think we could simply use the axis arg for this
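
A rough sketch of how that dispatch could look, written as a standalone helper rather than a change to pandas itself. `corr_unified` is a hypothetical name, not a pandas API; it only delegates to the existing `corr` and `corrwith`, and it assumes a pandas version where `corrwith` accepts `method`. Note that `corrwith` has no `min_periods`, so that argument only applies to the `other is None` branch here:

```python
import numpy as np
import pandas as pd


def corr_unified(df, other=None, method="pearson", min_periods=1, axis=0, drop=False):
    """Hypothetical helper (not part of pandas): dispatch on `other`."""
    if other is None:
        # No argument: pairwise correlation between df's own columns
        return df.corr(method=method, min_periods=min_periods)
    # Series or DataFrame: delegate to corrwith; axis=0 correlates each
    # column with `other`, axis=1 each row
    return df.corrwith(other, axis=axis, drop=drop, method=method)


# Made-up data for illustration
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [4.0, 3.0, 2.0, 1.0],
    "c": [1.0, 3.0, 2.0, 4.0],
})
s = pd.Series([1.0, 2.0, 3.0, 4.0])

matrix = corr_unified(df)          # 3x3 correlation matrix
with_series = corr_unified(df, s)  # one value per column: 1.0, -1.0, 0.8
```

Keeping `other=None` as the default means existing `df.corr()` calls would keep returning the full matrix under this design.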

max-sixty commented 8 years ago

With the changes to rolling(), .corr() now behaves inconsistently between the rolling and non-rolling implementations:


# df.corr(series) works with rolling

In [3]: pd.DataFrame(pd.np.random.rand(10,3)).rolling(window=3).corr(pd.Series(pd.np.random.rand(10)))
Out[3]: 
          0         1         2
0       NaN       NaN       NaN
1       NaN       NaN       NaN
2 -0.673346  0.020557 -0.907277
3 -0.751201  0.589850 -0.956764
4 -0.744613  0.858481 -0.935376
5 -0.880597  0.611522 -0.990112
6 -0.968260 -0.530005 -0.095204
7 -0.241248  0.684507 -0.112472
8 -0.007827  0.769953 -0.845051
9 -0.341660  0.995147 -0.994606

# .corr(series) doesn't work without `rolling`:

In [4]: pd.DataFrame(pd.np.random.rand(10,3)).corr(pd.Series(pd.np.random.rand(10)))
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/ops.py:716: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  result = getattr(x, name)(y)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-05b6520eb259> in <module>()
----> 1 pd.DataFrame(pd.np.random.rand(10,3)).corr(pd.Series(pd.np.random.rand(10)))

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/frame.pyc in corr(self, method, min_periods)
   4553         mat = numeric_df.values
   4554 
-> 4555         if method == 'pearson':
   4556             correl = _algos.nancorr(com._ensure_float64(mat), minp=min_periods)
   4557         elif method == 'spearman':

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/ops.pyc in wrapper(self, other, axis)
    761                 other = np.asarray(other)
    762 
--> 763             res = na_op(values, other)
    764             if isscalar(res):
    765                 raise TypeError('Could not compare %s type with Series' %

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/ops.pyc in na_op(x, y)
    716                 result = getattr(x, name)(y)
    717                 if result is NotImplemented:
--> 718                     raise TypeError("invalid type comparison")
    719             except AttributeError:
    720                 result = op(x, y)

TypeError: invalid type comparison
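
For comparison, the non-rolling equivalent can be written today with `corrwith`. A minimal sketch, assuming a recent pandas (where `pd.np` is no longer available, hence the direct NumPy import); the seed and variable names are made up for this example:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility only
df = pd.DataFrame(rng.random((10, 3)))
s = pd.Series(rng.random(10))

# Non-rolling: one correlation per column, over all 10 rows
full = df.corrwith(s)

# Rolling: one correlation per column for every 3-row window
rolled = df.rolling(window=3).corr(s)

# The last rolling row should match corrwith restricted to the last 3 rows
last_window = df.iloc[-3:].corrwith(s.iloc[-3:])
```

So the rolling `.corr(series)` already behaves like a windowed `corrwith`, which is the behaviour the plain `DataFrame.corr` lacks.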

pandas-dev / pandas

CLN: Integrate .corrwith and .corr #11260