Closed avi-analytics closed 7 years ago
Hi @avi-analytics - thanks so much for the PR!
It sounds like your workflow includes calling calmar_ratio
with 2D values? Is that via pyfolio/zipline, or standalone usage? And would you mind just explaining the expected semantics for the calmar ratio or max drawdown of multiple returns streams?
In general, these stats functions do attempt to implement the type interfaces specified in their docstrings. As you noted in #43: "max_drawdown promises to return a float scalar in it's documentation; however in reality it returns an ndarray/Series object with same dimensionality as the input." This is true; however, it's also assuming callers fulfill their promise of providing max_drawdown
with a Series
or ndarray
of returns for a strategy. In the case of ndarray
, I realize that the shape isn't mentioned, but it's meant a shape like that of a Series
, i.e. a 1D ndarray
(at least so far!).
cum_returns
is an example of an empyrical stat that accepts 1D or 2D input. When the input is 1D, the return value is a scalar, and when it's 2D, the output is 1D - essentially the result of running the aggregation on each return stream independently. Given multiple returns streams, your change here looks to return a single max drawdown or calmar ratio. What do you think about making this more consistent with cum_returns
by returning one value per input stream, and letting the caller decide how to aggregate across returns streams? Since max_drawdown
is a basic building block for many other functions, I'd love to decrease the aggregation assumptions it makes.
The test cases for max_drawdown
and calmar_ratio
all currently assume the 1D input interface, while cum_returns
is tested for 2D as well. If 2D support is what you need, I don't want anyone to break this functionality on you going forward. Would you add tests and update the docstrings to expect DataFrame
input and appropriate output?
I see that annual_return
is used by calmar_ratio
as well, but also advertises only 1D support - do you know if it happens to work well with 2D data already?
Thanks again!
Hi @richafrank , many thanks for a nice review.
No I'm not using 2D values. On my system even test_stats.py is failing. Have a look at the attached file.
What is worse, it seems hard to recreate, subtle bug. I failed to recreate it on travis, even with similar set up. Problem is that max_drawdown
is passing a Series
object to np.nanmin
, which in turn is calling np.fmin.reduce
to do actual aggregation. On my system (numpy 1.12.1; pandas 0.19.2), np.fmin.reduce
is returning a Series
object which is being cascaded upward. I couldn't test beyond that but the problem seems to be related to following pandas warning:
Warning In 0.13.0 since Series has internaly been refactored to no longer sub-class ndarray but instead subclass NDFrame, you can not pass a Series directly as a ndarray typed parameter to a cython function. Instead pass the actual ndarray using the .values attribute of the Series.
So the issue is that pandas API no longer guarantees symmetric treatment of Series and 1-d numpy arrays (unless explicitly done so by passing Series.values); it depends on the implementation. And on my system at least it is returning a Series object, not float scalar, even when input is 1d.
To make the code more robust, two approaches are possible:
np.nanmin
and explicitly pass Series.values
.np.nanmin
and handle the cases.Both approaches work fine (I've tested both), though my PR is based on second.
Oh, wow, my assumption was totally off! I'll see if I can reproduce it...
I'm having trouble reproducing it with 1D data. Any ideas?
Python 3.5.2 (v3.5.2:4def2a2901a5, Jun 26 2016, 10:47:25)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.0.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: import empyrical as emp
In [4]: pd.__version__
Out[4]: '0.19.2'
In [5]: np.__version__
Out[5]: '1.12.1'
In [6]: emp.__version__
Out[6]: '0.2.2'
In [7]: emp.calmar_ratio(np.array([0.1, -0.2, 0.3, -0.1]))
Out[7]: 26.411425065382151
In [8]: emp.calmar_ratio(pd.Series([0.1, -0.2, 0.3, -0.1]))
Out[8]: 26.411425065382151
In [9]: emp.calmar_ratio(pd.DataFrame(data=[[0.1, -0.2], [0.3, -0.1]]))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-9-0825458a228a> in <module>()
----> 1 emp.calmar_ratio(pd.DataFrame(data=[[0.1, -0.2], [0.3, -0.1]]))
/Users/rich/.virtualenvs/zp35/lib/python3.5/site-packages/empyrical/stats.py in calmar_ratio(returns, period, annualization)
372 return np.nan
373
--> 374 if np.isinf(temp):
375 return np.nan
376
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Would it be worth making a new environment with zipline's requirements.txt installed to see if that works?
Hi @richafrank
It is hard to replicate. It depends on the way numpy
functions interact with pandas.Series
; other modules are irrelevant. Contrast the behavior of np.nanmin
with different versions of pandas.Series
. Point is that you can not make any assumption about the return value of np.nanmin
if your input is Series. Best practice would be to take pandas warning seriously and pass Series.values
to np.nanmin
. If you want, I can do a PR using pandas approach.
Best
Python 3.5.2 (default, Nov 17 2016, 17:05:23)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.0.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: import empyrical as emp
In [4]: pd.__version__
Out[4]: '0.19.2'
In [5]: np.__version__
Out[5]: '1.12.1'
In [6]: emp.__version__
Out[6]: '0.2.2'
In [7]: data = [0.1, -0.2, 0.3, -0.1]
In [8]: emp.calmar_ratio(np.array(data))
Out[8]: 26.411425065382151
In [9]: emp.calmar_ratio(pd.Series(data))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-9-df3492acd200> in <module>()
----> 1 emp.calmar_ratio(pd.Series(data))
/media/avinash/infotainment/projects/github/mypyfolio/lib/python3.5/site-packages/empyrical/stats.py in calmar_ratio(returns, period, annualization)
363
364 max_dd = max_drawdown(returns=returns)
--> 365 if max_dd < 0:
366 temp = annual_return(
367 returns=returns,
/media/avinash/infotainment/projects/github/mypyfolio/lib/python3.5/site-packages/pandas/core/generic.py in __nonzero__(self)
915 raise ValueError("The truth value of a {0} is ambiguous. "
916 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 917 .format(self.__class__.__name__))
918
919 __bool__ = __nonzero__
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
In [10]: emp.calmar_ratio(pd.Series(data).values)
Out[10]: 26.411425065382151
In [11]: np.nanmin(pd.Series(data))
Out[11]:
0 -0.2
1 -0.2
2 -0.2
3 -0.2
dtype: float6
Python 3.5.2 (default, Nov 17 2016, 17:05:23)
Type "copyright", "credits" or "license" for more information.
IPython 5.1.0 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: pd.__version__
Out[3]: '0.12.0'
In [4]: np.__version__
Out[4]: '1.12.1'
In [5]: data = [0.1, -0.2, 0.3, -0.1]
In [6]: np.nanmin(pd.Series(data))
Out[6]: -0.20000000000000001
Ok, I believe I've figured out why I wasn't seeing your behavior. I've got bottleneck installed, whose nanmin
implementation empyrical will use if available. It doesn't suffer from the nanmin-Series issue that I think you're seeing and returns a scalar here.
You cracked it! Turns out I'd cloned empyrical and linked it to a virtual environment manually. It is working as expected after installing bottleneck. Closing the PR.
Python 3.5.2 (default, Nov 17 2016, 17:05:23)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.0.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: import empyrical as emp
In [4]: pd.__version__
Out[4]: '0.19.2'
In [5]: np.__version__
Out[5]: '1.12.1'
In [6]: data = [0.1, -0.2, 0.3, -0.1]
In [7]: emp.calmar_ratio(pd.Series(data))
Out[7]: 26.411425065382151
Awesome!
To handle the case #43 where object to be returned from
max_drawdown
may be Series/ndarray, invokedmin
on it. If it is already a float scalar thenAttributeError
will be raised and appropriately handled.