quantopian / zipline

Zipline, a Pythonic Algorithmic Trading Library
https://www.zipline.io
Apache License 2.0

Inconsistent data from Yahoo causes crash on simulation #388

Closed diegopdomingos closed 9 years ago

diegopdomingos commented 10 years ago

When running Zipline with the ^BVSP benchmark (a simulation starting on 2008-04-01 and ending on 2008-05-01), Zipline crashes with the following error:

raise LinAlgError("Array must not contain infs or NaNs")
numpy.linalg.linalg.LinAlgError: Array must not contain infs or NaNs

This is due to an inconsistency in the Yahoo benchmark data. In Brazil, the Tiradentes holiday falls on April 21, but Yahoo's data includes this date with a zero return. When the algorithm runs, the function handle_market_close() misaligns its "todays_date" with all_benchmark_returns, and RiskMetricsCumulative ends up receiving "NaN" as a valid benchmark return.
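To make the misalignment concrete, here is a minimal sketch in pandas. It is not zipline's code: the helper names align_returns and mismatched_days and all the return values are made up for illustration, and I am assuming the algorithm's calendar skips the holiday while the benchmark data carries it. It only shows how a single date that exists in one series but not the other turns into a NaN once the two are aligned.

```python
import pandas as pd


def align_returns(algorithm_returns, benchmark_returns):
    """Put both series side by side on the union of their dates."""
    return pd.concat(
        [algorithm_returns, benchmark_returns],
        axis=1,
        keys=["algorithm", "benchmark"],
    )


def mismatched_days(frame):
    """Dates on which at least one of the two series has no value."""
    return frame[frame.isnull().any(axis=1)].index


# Made-up numbers. The benchmark carries 2008-04-21 with a zero return,
# as described for the Yahoo ^BVSP data; the (assumed) algorithm calendar
# skips that date.
algo_days = pd.to_datetime(["2008-04-17", "2008-04-18", "2008-04-22"])
bench_days = pd.to_datetime(
    ["2008-04-17", "2008-04-18", "2008-04-21", "2008-04-22"]
)
algorithm_returns = pd.Series([0.010, -0.004, 0.006], index=algo_days)
benchmark_returns = pd.Series([0.008, -0.003, 0.000, 0.005], index=bench_days)

frame = align_returns(algorithm_returns, benchmark_returns)
print(frame)
print(list(mismatched_days(frame)))
```

Running a check like mismatched_days on your own returns and benchmark series should point at the date where the two calendars disagree, if that is indeed what is happening.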

simonvpe commented 9 years ago

I have the same problem. The tutorial example with AAPL data from earlier than 2010 seems to trigger it, and so do other stocks. I also tried downloading the data manually with pandas DataReader and got the same error. There are no NaN or None values in the data; I verified that with DataFrame.count. Please help, I've invested lots and lots of time in my algo and now I need to start testing it on more data over longer periods.

BR Simon


---------------------------------------------------------------------------
LinAlgError                               Traceback (most recent call last)
<ipython-input-59-d9c5d6224029> in <module>()
      2 #data2 = load_bars_from_yahoo(stocks=[sym], start=date_range[0], end=date_range[1])
      3 algo = Trade()
----> 4 perf = algo.run(data, benchmark_return_source=False)

/usr/local/lib/python2.7/dist-packages/zipline-0.7.0-py2.7.egg/zipline/algorithm.pyc in run(self, source, overwrite_sim_params, benchmark_return_source)
    474             # perf dictionary
    475             perfs = []
--> 476             for perf in self.gen:
    477                 perfs.append(perf)
    478 

/usr/local/lib/python2.7/dist-packages/zipline-0.7.0-py2.7.egg/zipline/gens/tradesimulation.pyc in transform(self, stream_in)
    181                     self.algo.performance_needs_update = True
    182 
--> 183             risk_message = self.algo.perf_tracker.handle_simulation_end()
    184             yield risk_message
    185 

/usr/local/lib/python2.7/dist-packages/zipline-0.7.0-py2.7.egg/zipline/finance/performance/tracker.pyc in handle_simulation_end(self)
    466             ars,
    467             self.sim_params,
--> 468             benchmark_returns=bms)
    469 
    470         risk_dict = self.risk_report.to_dict()

/usr/local/lib/python2.7/dist-packages/zipline-0.7.0-py2.7.egg/zipline/finance/risk/report.pyc in __init__(self, algorithm_returns, sim_params, benchmark_returns)
     83             end_date = self.algorithm_returns.index[-1]
     84 
---> 85         self.month_periods = self.periods_in_range(1, start_date, end_date)
     86         self.three_month_periods = self.periods_in_range(3, start_date,
     87                                                          end_date)

/usr/local/lib/python2.7/dist-packages/zipline-0.7.0-py2.7.egg/zipline/finance/risk/report.pyc in periods_in_range(self, months_per, start, end)
    132                 end_date=cur_end,
    133                 returns=self.algorithm_returns,
--> 134                 benchmark_returns=self.benchmark_returns
    135             )
    136 

/usr/local/lib/python2.7/dist-packages/zipline-0.7.0-py2.7.egg/zipline/finance/risk/period.pyc in __init__(self, start_date, end_date, returns, benchmark_returns)
     68         self.algorithm_returns = self.mask_returns_to_period(returns)
     69         self.benchmark_returns = self.mask_returns_to_period(benchmark_returns)
---> 70         self.calculate_metrics()
     71 
     72     def calculate_metrics(self):

/usr/local/lib/python2.7/dist-packages/zipline-0.7.0-py2.7.egg/zipline/finance/risk/period.pyc in calculate_metrics(self)
    125         self.information = self.calculate_information()
    126         self.beta, self.algorithm_covariance, self.benchmark_variance, \
--> 127             self.condition_number, self.eigen_values = self.calculate_beta()
    128         self.alpha = self.calculate_alpha()
    129         self.excess_return = self.algorithm_period_returns - \

/usr/local/lib/python2.7/dist-packages/zipline-0.7.0-py2.7.egg/zipline/finance/risk/period.pyc in calculate_beta(self)
    254                                     self.benchmark_returns])
    255         C = np.cov(returns_matrix, ddof=1)
--> 256         eigen_values = la.eigvals(C)
    257         condition_number = max(eigen_values) / min(eigen_values)
    258         algorithm_covariance = C[0][1]

/usr/lib/python2.7/dist-packages/numpy/linalg/linalg.pyc in eigvals(a)
    886     _assertRankAtLeast2(a)
    887     _assertNdSquareness(a)
--> 888     _assertFinite(a)
    889     t, result_t = _commonType(a)
    890 

/usr/lib/python2.7/dist-packages/numpy/linalg/linalg.pyc in _assertFinite(*arrays)
    215     for a in arrays:
    216         if not (isfinite(a).all()):
--> 217             raise LinAlgError("Array must not contain infs or NaNs")
    218 
    219 def _assertNoEmpty2d(*arrays):

LinAlgError: Array must not contain infs or NaNs
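
The last frames of the traceback can be reproduced outside zipline. The sketch below uses made-up return values and a hypothetical helper, finite_returns, that is not part of zipline. It runs the same two NumPy calls that appear in calculate_beta() (np.cov followed by eigvals) on a benchmark series containing one NaN, then repeats them after dropping the non-finite observations. Dropping observations is only a way to confirm where the NaN comes from, not a proper fix.

```python
import numpy as np


def finite_returns(algorithm_returns, benchmark_returns):
    """Keep only the observations where both return series are finite."""
    algo = np.asarray(algorithm_returns, dtype=float)
    bench = np.asarray(benchmark_returns, dtype=float)
    keep = np.isfinite(algo) & np.isfinite(bench)
    return algo[keep], bench[keep]


# Made-up returns; the benchmark has a single NaN.
algo = [0.010, -0.004, 0.006, 0.002]
bench = [0.008, np.nan, 0.005, 0.001]

# Same two calls as in calculate_beta() in the traceback.
C = np.cov(np.vstack([algo, bench]), ddof=1)
try:
    np.linalg.eigvals(C)
    raised = False
except np.linalg.LinAlgError as error:
    raised = True
    print("eigvals failed:", error)

# After dropping the non-finite observation the same calls succeed.
clean_algo, clean_bench = finite_returns(algo, bench)
clean_C = np.cov(np.vstack([clean_algo, clean_bench]), ddof=1)
eigen_values = np.linalg.eigvals(clean_C)
print(eigen_values)
```

One NaN in either series is enough to poison the covariance matrix, so checking only the price data may not be sufficient: the benchmark returns need to be checked as well.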

pdiffley commented 9 years ago

I am having the exact same issue. Is anyone able to run zipline without this problem?

twiecki commented 9 years ago

It's also reported here: https://github.com/quantopian/zipline/issues/473 with the US benchmark. Can someone try running the buy apple example with the same parameters as in #473?

mosesmc52 commented 9 years ago

Hi, I downloaded zipline and tried running the apple example on my computer, and I am having the same problem. I'm running OS X 10.9.5, Python 2.7, NumPy 1.9.1, pandas 0.15.2.

ssanderson commented 9 years ago

@pdiffley12 @mosesmc52 I haven't been able to repro this locally. My best guess would be that your benchmark data cache somehow acquired bad data. On Unix-ey operating systems, zipline downloads that data to ~/.zipline/data/. I'm not sure where it goes on Windows. Can you try clearing the contents of that directory and running the tutorial example again?
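If you would rather clear the cache from Python than from the shell, something like this should work. It is a sketch, not anything zipline ships: clear_directory is a hypothetical helper, and the path is simply the Unix location mentioned above, so adjust it if your cache lives elsewhere. The actual call is left commented out so nothing is deleted by accident.

```python
import os
import shutil


def clear_directory(path):
    """Delete everything inside `path` but keep the directory itself.

    Returns the sorted names of the entries that were removed.
    """
    removed = []
    if not os.path.isdir(path):
        return removed
    for name in os.listdir(path):
        target = os.path.join(path, name)
        if os.path.isdir(target) and not os.path.islink(target):
            shutil.rmtree(target)
        else:
            os.remove(target)
        removed.append(name)
    return sorted(removed)


# Cache location quoted above for Unix-like systems (an assumption for
# any other platform).
CACHE_DIR = os.path.expanduser("~/.zipline/data")

# Uncomment to actually delete the cached benchmark files:
# print(clear_directory(CACHE_DIR))
```

After clearing, the next run should download the benchmark data again.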

twiecki commented 9 years ago

I have tried that but unfortunately without success.

mosesmc52 commented 9 years ago

I tried clearing my zipline cache as well, with no success.

jetheurer commented 9 years ago

I'm seeing the same problem.

Python venv on Ubuntu 14.04 LTS includes:

ipython (2.4.1)
numpy (1.9.1)
pandas (0.15.2)
pip (1.5.4)
python-dateutil (2.4.0)
zipline (0.7.0)

Solved with this:

pip install -U python-dateutil==2.3.0

Thanks @twiecki and #473
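
For anyone who wants to check their own environment before downgrading, here is a small sketch. The helper names are hypothetical, and the only version facts it encodes are the two from this comment: 2.4.0 was installed when the error occurred, and pinning 2.3.0 made it go away. Whether other python-dateutil releases are affected is not something this thread establishes.

```python
import re

REPORTED_BAD = "2.4.0"   # version in the failing environment above
REPORTED_GOOD = "2.3.0"  # version the pip command pins to


def release_tuple(version):
    """Turn '2.4.0' or '2.4' into (2, 4, 0), ignoring suffixes like '.post0'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % (version,))
    return tuple(int(part or 0) for part in match.groups())


def is_reported_bad(version):
    """True if `version` is the release reported as failing in this thread."""
    return release_tuple(version) == release_tuple(REPORTED_BAD)


# Usage, assuming python-dateutil is installed in the environment:
# import dateutil
# print(dateutil.__version__, is_reported_bad(dateutil.__version__))
```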

twiecki commented 9 years ago

Glad that it works, thanks for reporting.