Open max-sixty opened 9 years ago
`corr` on a DataFrame works without another DataFrame? (as it computes the correlation between each pair of its columns):
In [4]: df = pd.DataFrame(np.random.randn(10,3))
In [6]: df.corr()
Out[6]:
          0         1         2
0  1.000000  0.116443  0.127691
1  0.116443  1.000000  0.472557
2  0.127691  0.472557  1.000000
You would have to change the signature of `.corr` to something like:
def corr(self, other=None, method='pearson', min_periods=1, axis=0, drop=False):
If `other` is `None`, then it becomes `self`.
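A minimal sketch of that dispatch, written as a free function rather than a change to pandas itself. The name `corr_sketch` is hypothetical, and delegating to the existing `corrwith` is an assumption about how the unified method could behave, not the actual implementation:

```python
import numpy as np
import pandas as pd


def corr_sketch(df, other=None, method="pearson", min_periods=1, axis=0, drop=False):
    """Hypothetical unified corr; NOT the pandas implementation."""
    if other is None:
        # No second operand: pairwise correlation of df's own columns.
        return df.corr(method=method, min_periods=min_periods)
    # Series or DataFrame operand: delegate to the existing corrwith.
    # min_periods is not forwarded because corrwith does not accept it.
    return df.corrwith(other, axis=axis, drop=drop, method=method)


rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 3)))
s = pd.Series(rng.standard_normal(10))

pairwise = corr_sketch(df)       # 3x3 matrix, same as df.corr()
against_s = corr_sketch(df, s)   # one value per column of df
```

With this shape, existing `df.corr()` calls keep their meaning, and passing a `Series` gives what `corrwith` gives today.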
With a `Series` it is trickier, because then you need to know how to broadcast it, e.g. row-wise or column-wise (usually what you mean), though I think we could simply use the `axis` arg for this.
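To illustrate how an `axis` argument can settle the broadcasting question, here is a small sketch using the existing `DataFrame.corrwith`, which already takes `axis`. The variable names and random data are made up for the example:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame(rng.standard_normal((10, 3)))

# A Series aligned with the row labels, and one aligned with the column labels.
by_row_labels = pd.Series(rng.standard_normal(10), index=df.index)
by_col_labels = pd.Series(rng.standard_normal(3), index=df.columns)

# axis=0: correlate each column of df with the Series (one value per column).
per_column = df.corrwith(by_row_labels, axis=0)

# axis=1: correlate each row of df with the Series (one value per row).
per_row = df.corrwith(by_col_labels, axis=1)
```

In both cases the `Series` is aligned by label before the correlation is computed, so its index has to match the axis it is broadcast along.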
With the changes to `rolling()`, `.corr()` is now incongruent between the rolling & normal implementation:
# df.corr(series) works with rolling
In [3]: pd.DataFrame(pd.np.random.rand(10,3)).rolling(window=3).corr(pd.Series(pd.np.random.rand(10)))
Out[3]:
          0         1         2
0       NaN       NaN       NaN
1       NaN       NaN       NaN
2 -0.673346  0.020557 -0.907277
3 -0.751201  0.589850 -0.956764
4 -0.744613  0.858481 -0.935376
5 -0.880597  0.611522 -0.990112
6 -0.968260 -0.530005 -0.095204
7 -0.241248  0.684507 -0.112472
8 -0.007827  0.769953 -0.845051
9 -0.341660  0.995147 -0.994606
# .corr(series) doesn't work without `rolling`:
In [4]: pd.DataFrame(pd.np.random.rand(10,3)).corr(pd.Series(pd.np.random.rand(10)))
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/ops.py:716: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
result = getattr(x, name)(y)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-4-05b6520eb259> in <module>()
----> 1 pd.DataFrame(pd.np.random.rand(10,3)).corr(pd.Series(pd.np.random.rand(10)))
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/frame.pyc in corr(self, method, min_periods)
4553 mat = numeric_df.values
4554
-> 4555 if method == 'pearson':
4556 correl = _algos.nancorr(com._ensure_float64(mat), minp=min_periods)
4557 elif method == 'spearman':
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/ops.pyc in wrapper(self, other, axis)
761 other = np.asarray(other)
762
--> 763 res = na_op(values, other)
764 if isscalar(res):
765 raise TypeError('Could not compare %s type with Series' %
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/ops.pyc in na_op(x, y)
716 result = getattr(x, name)(y)
717 if result is NotImplemented:
--> 718 raise TypeError("invalid type comparison")
719 except AttributeError:
720 result = op(x, y)
TypeError: invalid type comparison
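For comparison, a sketch of how the non-rolling case can be spelled today with `corrwith`, checked against the rolling result over a window that spans the whole frame. It uses NumPy's `default_rng` directly rather than `pd.np`, since `pd.np` is not available in current pandas releases; the data and variable names are illustrative:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
df = pd.DataFrame(rng.random((10, 3)))
s = pd.Series(rng.random(10))

# Rolling correlation with a Series, using a window as long as the frame:
# only the last row sees all ten observations.
rolled = df.rolling(window=len(df)).corr(s)

# Non-rolling correlation of each column with the same Series.
plain = df.corrwith(s)

# The last rolling row and the corrwith result should agree up to float error.
same = np.allclose(rolled.iloc[-1].to_numpy(), plain.to_numpy())
```

So the two code paths compute the same quantity; the incongruence is only in which method name accepts the `Series`.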
Currently:
- `corr` on a `DataFrame` requires another `DataFrame`, and fails on a `Series`
- `corrwith` on a `DataFrame` takes a `Series`

Is there a good reason these are separate? Should `corr` do whatever `corrwith` does when passed a `Series`, so that `corrwith` could be deprecated?