quantopian / zipline

Zipline, a Pythonic Algorithmic Trading Library
https://www.zipline.io
Apache License 2.0

Tutorial buyapple.py algo fails #473

Closed scubamut closed 9 years ago

scubamut commented 9 years ago

Running:

python run_algo.py -f ../zipline/examples/buyapple.py --start 2000-1-1 --end 2012-1-1 --symbols AAPL -o buyapple_out.pickle

results in:

[2015-01-26 13:36] INFO: Performance: Simulated 3018 trading days out of 2823.
[2015-01-26 13:36] INFO: Performance: first open: 2000-01-04 14:31:00+00:00
[2015-01-26 13:36] INFO: Performance: last close: 2011-12-30 21:00:00+00:00
Traceback (most recent call last):
  File "G:\Anaconda3\Scripts\run_algo.py", line 24, in <module>
    run_pipeline(print_algo=True, **parsed)
  File "G:\Anaconda3\lib\site-packages\zipline\utils\cli.py", line 192, in run_pipeline
    perf = algo.run(source)
  File "G:\Anaconda3\lib\site-packages\zipline\algorithm.py", line 423, in run
    for perf in self.gen:
  File "G:\Anaconda3\lib\site-packages\zipline\gens\tradesimulation.py", line 163, in transform
    risk_message = self.algo.perf_tracker.handle_simulation_end()
  File "G:\Anaconda3\lib\site-packages\zipline\finance\performance\tracker.py", line 462, in handle_simulation_end
    benchmark_returns=bms)
  File "G:\Anaconda3\lib\site-packages\zipline\finance\risk\report.py", line 85, in __init__
    self.month_periods = self.periods_in_range(1, start_date, end_date)
  File "G:\Anaconda3\lib\site-packages\zipline\finance\risk\report.py", line 134, in periods_in_range
    benchmark_returns=self.benchmark_returns
  File "G:\Anaconda3\lib\site-packages\zipline\finance\risk\period.py", line 70, in __init__
    self.calculate_metrics()
  File "G:\Anaconda3\lib\site-packages\zipline\finance\risk\period.py", line 127, in calculate_metrics
    self.condition_number, self.eigen_values = self.calculate_beta()
  File "G:\Anaconda3\lib\site-packages\zipline\finance\risk\period.py", line 256, in calculate_beta
    eigen_values = la.eigvals(C)
  File "G:\Anaconda3\lib\site-packages\numpy\linalg\linalg.py", line 888, in eigvals
    _assertFinite(a)
  File "G:\Anaconda3\lib\site-packages\numpy\linalg\linalg.py", line 217, in _assertFinite
    raise LinAlgError("Array must not contain infs or NaNs")
numpy.linalg.linalg.LinAlgError: Array must not contain infs or NaNs

The problem appears to be with cumulative_benchmark_returns.

Environment: Windows 7 64 bit, Python 3.4, zipline 0.7.0

Kind regards, Dave Gilbert

twiecki commented 9 years ago

Can confirm on Linux with Python 2 and the zipline master branch.

mdengler commented 9 years ago

Works for me on the master branch with 64-bit Linux and Python 2.7.5:

$ uname -a
Linux hostname 3.17.6-200.fc20.x86_64 #1 SMP Mon Dec 8 15:21:05 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/fedora-release 
Fedora release 20 (Heisenbug)
$ python --version
Python 2.7.5
$ python -c "import pandas as pd; print(pd.version.version)"
0.15.2
$ python -c "import numpy as np; print(np.version.version)"
1.8.1
$ git show --stat
commit a5eefc7f8cbeef9fcc1f0c68a487c503f4ee919a
Author: Thomas Wiecki <thomas.wiecki@gmail.com>
Date:   Wed Jan 21 18:25:14 2015 +0100

    TST: Add nose-timer to travis.

 .travis.yml | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Output:

$ PYTHONPATH=. python scripts/run_algo.py -f zipline/examples/buyapple.py --start 2000-1-1 --end 2012-1-1 --symbols AAPL -o buyapple_out.pickle
AAPL
#!/usr/bin/env python
#
# Copyright 2014 Quantopian, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from zipline.api import order, record, symbol

def initialize(context):
    pass

def handle_data(context, data):
    order(symbol('AAPL'), 10)
    record(AAPL=data[symbol('AAPL')].price)
import matplotlib.pyplot as plt

def analyze(context, perf):
    ax1 = plt.subplot(211)
    perf.portfolio_value.plot(ax=ax1)
    ax2 = plt.subplot(212, sharex=ax1)
    perf.AAPL.plot(ax=ax2)
    plt.gcf().set_size_inches(18, 8)
    plt.show()
data files aren't distributed with source.
Fetching data from Yahoo Finance.
[2015-01-27 16:01] WARNING: Loader: No benchmark data found for date range.
start_date=2015-01-27 00:00:00+00:00, end_date=2015-01-27 16:01:35.754632, url=http://ichart.finance.yahoo.com/table.csv?s=%5EGSPC&a=0&b=27&c=2015&d=0&e=27&f=2015&g=d
data files aren't distributed with source.
Fetching data from data.treasury.gov
[2015-01-27 16:05] INFO: Performance: Simulated 3019 trading days out of 3019.
[2015-01-27 16:05] INFO: Performance: first open: 2000-01-03 14:31:00+00:00
[2015-01-27 16:05] INFO: Performance: last close: 2011-12-30 21:00:00+00:00
ssanderson commented 9 years ago

@twiecki @mdengler @scubamut what are your pandas and numpy versions? Good odds that's the culprit if it's passing only for some people.
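For anyone reporting their environment, a small sketch like the one below can collect the relevant versions in one go. It assumes Python 3.8 or newer (for `importlib.metadata`), which is newer than the interpreters used in this thread. `installed_versions` is a hypothetical helper name, and the package list is only a guess at what is relevant here.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version string."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Keep the entry so a missing package is visible in the report.
            versions[name] = "not installed"
    return versions


# Assumed list of packages worth reporting for this issue.
report = installed_versions(["numpy", "pandas", "python-dateutil", "zipline"])
for name, version in report.items():
    print(f"{name}: {version}")
```

Pasting the printed lines into a comment makes it easier to compare working and failing setups side by side.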

mdengler commented 9 years ago
$ python -c "import pandas as pd; print(pd.version.version)"
0.15.2
$ python -c "import numpy as np; print(np.version.version)"
1.8.1
twiecki commented 9 years ago
pdb> u
> /home/wiecki/working/projects/quant/zipline/zipline/finance/risk/period.py(256)

 239         def calculate_beta(self):
 240             """
 241
 242             .. math::
 243
 244                 \\beta_a = \\frac{\mathrm{Cov}(r_a,r_p)}{\mathrm{Var}(r_p)}
 245
 246             http://en.wikipedia.org/wiki/Beta_(finance)
 247             """
 248             # it doesn't make much sense to calculate beta for less than two days,
 249             # so return none.
 250             if len(self.algorithm_returns) < 2:
 251                 return 0.0, 0.0, 0.0, 0.0, []
 252
 253             returns_matrix = np.vstack([self.algorithm_returns,
 254                                         self.benchmark_returns])
 255             C = np.cov(returns_matrix, ddof=1)
 256  ->         eigen_values = la.eigvals(C)
 257             condition_number = max(eigen_values) / min(eigen_values)
 258             algorithm_covariance = C[0][1]
 259             benchmark_variance = C[1][1]
 260             beta = algorithm_covariance / benchmark_variance
 261
 262             return (
 263                 beta,
 264                 algorithm_covariance,
 265                 benchmark_variance,
 266                 condition_number,
 267                 eigen_values
 268             )
ipdb> returns_matrix
array([[  0.00000000e+00,  -3.00000001e-08,  -3.40000010e-07,
          2.90000107e-07,  -2.10000017e-07,  -7.10000206e-07,
         -9.80000980e-07,   1.89000374e-06,   8.80000079e-07,
          9.29999265e-07,   6.89998813e-07,   2.36999429e-06,
         -9.09995650e-07,  -2.06999199e-06,   2.69999514e-06,
         -1.00999545e-06],
       [ -3.83447176e-02,   1.92222492e-03,   9.55702477e-04,
          2.70903844e-02,              nan,              nan,
                     nan,              nan,              nan,
                     nan,              nan,              nan,
                     nan,              nan,              nan,
                     nan]])
ipdb> self.benchmark_returns
2000-01-04 00:00:00+00:00   -0.038345
2000-01-05 00:00:00+00:00    0.001922
2000-01-06 00:00:00+00:00    0.000956
2000-01-07 00:00:00+00:00    0.027090
2000-01-11 00:00:00+00:00         NaN
2000-01-12 00:00:00+00:00         NaN
2000-01-13 00:00:00+00:00         NaN
2000-01-14 00:00:00+00:00         NaN
2000-01-18 00:00:00+00:00         NaN
2000-01-19 00:00:00+00:00         NaN
2000-01-20 00:00:00+00:00         NaN
2000-01-21 00:00:00+00:00         NaN
2000-01-25 00:00:00+00:00         NaN
2000-01-26 00:00:00+00:00         NaN
2000-01-27 00:00:00+00:00         NaN
2000-01-28 00:00:00+00:00         NaN
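
The traceback is what one would expect once a single NaN reaches the covariance step, since NaN propagates through every sum and mean. The sketch below is plain Python rather than zipline's code. `sample_cov` is a hypothetical helper, and the numbers are rounded from the debugger output above.

```python
import math


def sample_cov(xs, ys):
    """Sample covariance (ddof=1) of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# First six entries of each row of returns_matrix, rounded.
algo = [0.0, -3.0e-08, -3.4e-07, 2.9e-07, -2.1e-07, -7.1e-07]
bench = [-0.038345, 0.001922, 0.000956, 0.027090, float("nan"), float("nan")]

# One NaN in the benchmark row is enough to make the covariance NaN.
cov_with_nan = sample_cov(algo, bench)
print(math.isnan(cov_with_nan))  # True

# Keeping only the pairs with a finite benchmark return gives a finite value.
clean = [(a, b) for a, b in zip(algo, bench) if not math.isnan(b)]
cov_clean = sample_cov([a for a, _ in clean], [b for _, b in clean])
print(math.isfinite(cov_clean))  # True
```

Dropping the NaN pairs only hides the symptom. The useful question is why the benchmark has no value for those dates in the first place.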
scubamut commented 9 years ago

pandas 0.14.1, numpy 1.9.0

scubamut commented 9 years ago

This works:

[py2.7] G:\Anaconda3-32bit\Scripts>conda list
# packages in environment at G:\Anaconda3-32bit\envs\py2.7:
#
dateutil                  2.1                      py27_2
logbook                   0.6.0                    py27_0
matplotlib                1.4.0                np18py27_0
numpy                     1.8.2                    py27_0
pandas                    0.14.1               np18py27_0
pip                       6.0.6                    py27_0
pyparsing                 2.0.1                    py27_0
pyqt                      4.10.4                   py27_0
pyside                    1.2.1                    py27_0
python                    2.7.9                         1
python-dateutil           1.5                       <pip>
pytz                      2014.9                   py27_0
requests                  2.5.1                    py27_0
scipy                     0.14.0               np18py27_0
setuptools                12.0.5                   py27_0
six                       1.9.0                    py27_0
ta-lib                    0.4.8                np18py27_0
zipline                   0.7.0                np18py27_0

[py2.7] G:\Eclipse Pydev Projects\zipline\scripts>python run_algo.py -f ../zipline/examples/buyapple.py --start 2000-1-1 --end 2012-1-1 --symbols AAPL -o buyapple_out.pickle
AAPL
#!/usr/bin/env python
#
# Copyright 2014 Quantopian, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from zipline.api import order, record, symbol

def initialize(context):
    pass

def handle_data(context, data):
    order(symbol('AAPL'), 10)
    record(AAPL=data[symbol('AAPL')].price)
import matplotlib.pyplot as plt

def analyze(context, perf):
    ax1 = plt.subplot(211)
    perf.portfolio_value.plot(ax=ax1)
    ax2 = plt.subplot(212, sharex=ax1)
    perf.AAPL.plot(ax=ax2)
    plt.gcf().set_size_inches(18, 8)
    plt.show()

[2015-01-28 06:19] INFO: Performance: Simulated 3019 trading days out of 3019.
[2015-01-28 06:19] INFO: Performance: first open: 2000-01-03 14:31:00+00:00
[2015-01-28 06:19] INFO: Performance: last close: 2011-12-30 21:00:00+00:00

[py2.7] G:\Eclipse Pydev Projects\zipline\scripts>
ssanderson commented 9 years ago

@twiecki I can't repro this locally; I cleared out the cache of data files in ~/.zipline, but everything appears to have re-downloaded successfully.

ssanderson commented 9 years ago

@twiecki for the dates where you're seeing NaNs in the benchmark, what do you have in the corresponding rows of ~/.zipline/data/^GSPC_benchmark.csv?

mosesmc52 commented 9 years ago

Has anyone figured out a fix for this problem? I tried clearing the .zipline cache, but that doesn't solve it.

karolestrada commented 9 years ago

I get the same error message when performance stats are being calculated. It occurs on the zipline master branch on Mac OS X 10.9 (Python 2.7.7, pandas 0.14.1, NumPy 1.9.0) and on Linux x86_64 (Python 2.7.1, pandas 0.15.2, NumPy 1.9.1).

However, the same code works on a different Linux x86_64 system with Python 2.7.8, pandas 0.14.1, and NumPy 1.9.0. Strange.

leonth commented 9 years ago

I have the same problem with: numpy 1.8.2, pandas 0.14.1, Python 2.7.9, Linux x86_64, zipline master branch (ffe5a7a171c1673fc24a54728e1640c23e7b5c04).

twiecki commented 9 years ago

I have confirmed that the date is present in the data file, so it's not a caching issue.

twiecki commented 9 years ago

One odd thing I noticed: 2000-1-3 is a Monday and is present in the benchmark data, but our calendar rules in tradingcalendar say it's a non-trading day.

twiecki commented 9 years ago

OK, this does seem to be a calendar problem. 2000-1-10 is also marked as a non-trading day but is in fact a trading day. So somewhere here https://github.com/quantopian/zipline/blob/master/zipline/utils/tradingcalendar.py#L41 there is a bug that drops trading days.

It also seems to preferentially drop Mondays.
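
The Monday pattern can be checked against the dates in the benchmark_returns index shown earlier in the thread. This is a minimal sketch using only the standard library. `missing_weekdays` is a hypothetical helper, and the date list is copied by hand from the debugger output.

```python
from datetime import date, timedelta

# Days of January 2000 present in the benchmark_returns index above.
observed = [
    date(2000, 1, d)
    for d in (4, 5, 6, 7, 11, 12, 13, 14, 18, 19, 20, 21, 25, 26, 27, 28)
]


def missing_weekdays(days):
    """List Monday-to-Friday dates between min(days) and max(days) not in days."""
    present = set(days)
    day, last = min(days), max(days)
    gaps = []
    while day <= last:
        if day.weekday() < 5 and day not in present:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps


gaps = missing_weekdays(observed)
for day in gaps:
    # weekday() == 0 means Monday
    print(day.isoformat(), "Monday" if day.weekday() == 0 else "other weekday")
```

All three gaps (2000-01-10, 2000-01-17, 2000-01-24) fall on Mondays. Only 2000-01-17 was a market holiday (Martin Luther King Jr. Day), so the other two look like trading days that were dropped by mistake.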

twiecki commented 9 years ago

I think it's a dateutil bug.

Can someone else confirm by running pip install -U python-dateutil==2.3.0 and then rerunning the example?

twiecki commented 9 years ago

Opened up a github issue: https://github.com/dateutil/dateutil/issues/34

twiecki commented 9 years ago

To catch these cases earlier I added a check: https://github.com/quantopian/zipline/commit/a7188187e65dbb684a93fb3851c852e72ab48d6c
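
For illustration only, a fail-fast check of this kind might look like the sketch below. It is not the code from the linked commit. `first_nan_date` and `check_benchmark_returns` are hypothetical names, and the benchmark is modeled as a plain dict from date to return.

```python
import math
from datetime import date


def first_nan_date(returns_by_date):
    """Return the earliest date whose return is NaN, or None if all are finite."""
    for day in sorted(returns_by_date):
        if math.isnan(returns_by_date[day]):
            return day
    return None


def check_benchmark_returns(returns_by_date):
    """Raise a descriptive error before any risk metrics are computed."""
    bad_day = first_nan_date(returns_by_date)
    if bad_day is not None:
        raise ValueError(
            "benchmark return is NaN on %s; check the trading calendar "
            "and the installed python-dateutil version" % bad_day.isoformat()
        )


# Sample values taken from the debugger output earlier in the thread.
benchmark = {
    date(2000, 1, 6): 0.000956,
    date(2000, 1, 7): 0.027090,
    date(2000, 1, 11): float("nan"),
    date(2000, 1, 12): float("nan"),
}

try:
    check_benchmark_returns(benchmark)
except ValueError as exc:
    message = str(exc)
    print(message)
```

An error that names the first offending date is easier to act on than a LinAlgError raised deep inside the risk calculations.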

karolestrada commented 9 years ago

Confirmed, I upgraded to python-dateutil==2.3.0 and the issue is solved.

twiecki commented 9 years ago

Thanks. I'm closing this then.

mosesmc52 commented 9 years ago

Thanks. The fix works for me as well.

sashakid commented 9 years ago

Guys, can anyone help me with this issue? I'm a newbie in Python and algotrading, but I'm very eager to learn. I'm using PyCharm on Mac OS, and my output is:

Traceback (most recent call last):
  File "/usr/local/bin/run_algo.py", line 24, in <module>
    run_pipeline(print_algo=True, **parsed)
  File "/usr/local/lib/python3.4/site-packages/zipline/utils/cli.py", line 192, in run_pipeline
    perf = algo.run(source)
  File "/usr/local/lib/python3.4/site-packages/zipline/algorithm.py", line 423, in run
    for perf in self.gen:
  File "/usr/local/lib/python3.4/site-packages/zipline/gens/tradesimulation.py", line 163, in transform
    risk_message = self.algo.perf_tracker.handle_simulation_end()
  File "/usr/local/lib/python3.4/site-packages/zipline/finance/performance/tracker.py", line 462, in handle_simulation_end
    benchmark_returns=bms)
  File "/usr/local/lib/python3.4/site-packages/zipline/finance/risk/report.py", line 85, in __init__
    self.month_periods = self.periods_in_range(1, start_date, end_date)
  File "/usr/local/lib/python3.4/site-packages/zipline/finance/risk/report.py", line 134, in periods_in_range
    benchmark_returns=self.benchmark_returns
  File "/usr/local/lib/python3.4/site-packages/zipline/finance/risk/period.py", line 70, in __init__
    self.calculate_metrics()
  File "/usr/local/lib/python3.4/site-packages/zipline/finance/risk/period.py", line 127, in calculate_metrics
    self.condition_number, self.eigen_values = self.calculate_beta()
  File "/usr/local/lib/python3.4/site-packages/zipline/finance/risk/period.py", line 256, in calculate_beta
    eigen_values = la.eigvals(C)
  File "/usr/local/lib/python3.4/site-packages/numpy/linalg/linalg.py", line 888, in eigvals
    _assertFinite(a)
  File "/usr/local/lib/python3.4/site-packages/numpy/linalg/linalg.py", line 217, in _assertFinite
    raise LinAlgError("Array must not contain infs or NaNs")
numpy.linalg.linalg.LinAlgError: Array must not contain infs or NaNs

Should running pip install -U python-dateutil==2.3.0 resolve the issue? It doesn't work for me. What should I know about PyCharm, and what is the proper way to build Python projects and install libraries?

twiecki commented 9 years ago

@sashakid Can you also install zipline master where I added a check that should catch this behavior earlier?

sashakid commented 9 years ago

@twiecki I installed zipline as described in your manual, with pip install zipline, but I see that PyCharm is using the old version of zipline (0.7). Am I right? How can I install the new version in PyCharm? Thanks.

sashakid commented 9 years ago

It worked once I ran pip install zipline in the terminal inside the PyCharm project (previously I had run this command in Terminal.app). I then ran pip freeze to list all packages, and zipline==0.7.0 appeared in the list.

sashakid commented 9 years ago

Can I see the output of the following in PyCharm, or do I need a Notebook for that? (For now I just run the code and don't see output like the tutorial shows.)

import pandas as pd
perf = pd.read_pickle('buyapple_out.pickle') # read in perf DataFrame
perf.head()