Closed twiecki closed 4 years ago
I think this user email is a related case:
There seems to be a glitch when the data source contains NaN's. The simulation runs for the most part until risk.py raises an exception near the very end of the simulation. For instance, running the following strategy:
    class Test(TradingAlgorithm):
        def handle_data(self, data):
            self.order('myStock', 1)

    if __name__ == '__main__':
        data = load_some_data_using_custom_method()
        test = Test()
        result = test.run(data)
I get the following traceback:
      File "C:\quantopian\test.py", line 37, in <module>
        results = test.run(data)
      File "C:\quantopian\zipline\algorithm.py", line 295, in run
        perfs = list(self.gen)
      File "C:\quantopian\zipline\gens\tradesimulation.py", line 157, in transform
        yield self.get_message(date)
      File "C:\quantopian\zipline\gens\tradesimulation.py", line 181, in get_message
        self.algo.perf_tracker.handle_market_close()
      File "C:\quantopian\zipline\finance\performance.py", line 389, in handle_market_close
        self.all_benchmark_returns[todays_return_obj.date])
      File "C:\quantopian\zipline\finance\risk.py", line 643, in update
        raise Exception(message)
Exception: Mismatch between benchmark_returns (1095) and algorithm_returns (1094) in range 1997-01-02 00:00:00+00:00 : 2004-12-31 00:00:00+00:00 on 2001-05-04 00:00:00+00:00
Bug replication:
- I looked back at data and found NaN entries beginning from observation 1095 onwards. data contains about 50 stocks traded at different exchanges (hence different holidays and NaN's to pad the gaps).
- Using the same data set, where 'myStock' contains NaN's, but applying orders only to 'anotherStock' containing no NaN's, the entire backtest runs successfully.
- My zipline repo is updated to one of the May 08 2013 commits.
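To check a data set for the condition described in the email, something like the following could locate where the NaN's start for each stock. This is only a sketch: the `prices` dict and the `first_nan_positions` helper are hypothetical stand-ins for whatever the custom loader returns, not part of zipline.

```python
import math


def first_nan_positions(prices):
    """Map each symbol to the 1-based observation number of its first NaN.

    `prices` is a hypothetical dict of symbol -> list of floats; symbols
    that contain no NaN are left out of the result.
    """
    positions = {}
    for symbol, series in prices.items():
        for i, value in enumerate(series, start=1):
            if math.isnan(value):
                positions[symbol] = i
                break
    return positions


nan = float("nan")
prices = {
    "myStock": [10.0, 10.5, nan, nan],            # padded with NaN's
    "anotherStock": [20.0, 20.1, 20.3, 20.2],     # complete
}
first_nan = first_nan_positions(prices)
print(first_nan)
```

A stock that shows up in the result is a candidate for the failing case; one that does not should backtest cleanly, as with 'anotherStock' above.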
Found a bug in the test.test_risk_compare_batch_iterative which was never calling update for the leading.
Fix is here: https://github.com/quantopian/zipline/commit/16c488e5bcb455c795c3535c170b8ae798558a99
Still, we should investigate what to do with missing returns. For benchmarks, I think we can replace them with 0.0. For algorithms with missing data, I'm trying to reason about whether 0.0 would be sane, since a NaN would imply no volume (so no trades could change the portfolio), and there would be no change in the pricing information with a NaN. (That assumes a NaN really means 'no trades happened here'.)
However, the above suggestions would be masking the problem in risk, and I think we should investigate what is going on at the performance module level with the email snippet you attached, since now that we use benchmarks as a 'clock' we should be filling the algorithm returns with values throughout.
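For illustration, the 0.0 replacement could look roughly like the sketch below, which walks the benchmark dates as the 'clock' and fills any missing or NaN algorithm return. The `align_to_benchmark` helper and the dict-based returns are assumptions made for the example, not zipline code, and, as noted above, filling this way would hide the underlying problem rather than fix it.

```python
import math
from datetime import date


def align_to_benchmark(algorithm_returns, benchmark_dates, fill=0.0):
    """Return one algorithm return per benchmark date.

    Dates that are absent from `algorithm_returns`, or present with a NaN
    value, are filled with `fill` (0.0, i.e. 'nothing changed that day').
    """
    aligned = {}
    for dt in benchmark_dates:
        value = algorithm_returns.get(dt)
        if value is None or math.isnan(value):
            value = fill
        aligned[dt] = value
    return aligned


benchmark_dates = [date(2001, 5, 2), date(2001, 5, 3), date(2001, 5, 4)]
algorithm_returns = {
    date(2001, 5, 2): 0.01,
    date(2001, 5, 3): float("nan"),   # NaN in the data
    # 2001-05-04 is missing entirely
}
aligned = align_to_benchmark(algorithm_returns, benchmark_dates)
print(len(aligned) == len(benchmark_dates))
```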
Also, @twiecki, I forget, besides the unit tests, was it an example algo or another algo you were working on where you first discovered this? (i.e. what stocks and date range were you working with that had the NaNs.)
@twiecki so as not to confuse, https://github.com/quantopian/zipline/commit/16c488e5bcb455c795c3535c170b8ae798558a99 does not contain the valid vs. [:dt] fix yet. But I do believe it gets the tests in shape to be ready for it.
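To make the valid vs. [:dt] distinction concrete, here is a rough sketch using plain Python lists in place of a pandas Series. It assumes the returns container is preallocated with NaN for dates not yet reached, which is my reading of the thread rather than something confirmed from risk.py. Dropping NaN's (what the old pandas `valid()` did, today `dropna()`) also drops a genuine NaN in the middle, while slicing through `dt` keeps it.

```python
import math
from datetime import date

nan = float("nan")
# Hypothetical preallocated returns: NaN marks 'no value yet'.
returns = [
    (date(2001, 5, 2), 0.01),
    (date(2001, 5, 3), nan),     # NaN caused by missing data
    (date(2001, 5, 4), -0.02),
    (date(2001, 5, 7), nan),     # future date, not simulated yet
    (date(2001, 5, 8), nan),     # future date, not simulated yet
]
dt = date(2001, 5, 4)            # current simulation date

# Implicit: keep only non-NaN entries (stand-in for valid()).
implicit = [(d, r) for d, r in returns if not math.isnan(r)]

# Explicit: keep everything up to and including dt (stand-in for [:dt]).
explicit = [(d, r) for d, r in returns if d <= dt]

print(len(implicit), len(explicit))
```

Under these assumptions the implicit version comes up one observation short of the benchmark for the same range, which is the shape of the 1095 vs. 1094 mismatch in the traceback.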
Was this ever fixed? I seem to be getting the same type of error, most probably for the same reason (having at least one NaN in the data).
@GiliR4t1qbit, I'm not sure if this was ever fixed, and can look into it later this week.
Could you share the conditions under which you are seeing the error?
The data I am working with has lots of NaN's in it, due to stocks that had not yet been traded at the beginning of the period. When I realized this, I decided to add a check to handle_data to only include stocks that do not have NaN's for that time period. This got rid of the error. I suspect I was trying to trade a stock whose price was NaN, but I'm not 100% sure. If this is the case, it would be good if the program did not crash. For example, a reasonable behaviour would be for the order to not be fulfilled and a warning message to appear.
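A check along those lines might look like the sketch below. It is not the code from the comment above: `tradable_symbols` is a hypothetical helper that takes the current prices as a plain dict, skips any stock whose price is NaN, and emits a warning instead of letting the order go through.

```python
import math
import warnings


def tradable_symbols(prices):
    """Return the symbols whose current price is usable for ordering.

    `prices` is a hypothetical dict of symbol -> latest price. A NaN (or
    missing) price produces a warning and the symbol is skipped.
    """
    tradable = []
    for symbol, price in prices.items():
        if price is None or math.isnan(price):
            warnings.warn("Skipping order for %s: price is NaN" % symbol)
            continue
        tradable.append(symbol)
    return tradable


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    symbols = tradable_symbols({"myStock": float("nan"), "anotherStock": 20.1})

print(symbols)
```

Inside an algorithm, the same filter would be applied in handle_data before calling order, so that only the returned symbols are traded.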
Ah, your error makes sense then, but may be surprising, since one of the current system assumptions made, which may be both: is that 'drop nans' logic is done in the module providing the data source generator, i.e. no data for the equity at a given dt is emitted if the price is nan.
Handling the nan values upstream, probably in tradesimulation, would be more robust, but would have to be considered against performance trade-offs.
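As a sketch of that assumption, a data source generator that follows the 'drop nans' rule would simply not emit an event when the price is NaN. The event layout (dicts with `dt`, `sid`, `price`) and the `drop_nan_events` name are made up for this example and are not zipline's actual event format.

```python
import math
from datetime import date


def drop_nan_events(events):
    """Yield only the events that carry a real price.

    An event whose price is NaN is treated as 'no data for this equity at
    this dt' and is not emitted at all.
    """
    for event in events:
        if math.isnan(event["price"]):
            continue
        yield event


raw_events = [
    {"dt": date(2001, 5, 3), "sid": "myStock", "price": float("nan")},
    {"dt": date(2001, 5, 3), "sid": "anotherStock", "price": 20.1},
    {"dt": date(2001, 5, 4), "sid": "myStock", "price": 10.7},
]
clean_events = list(drop_nan_events(raw_events))
print([e["sid"] for e in clean_events])
```

A custom data source that pads gaps with NaN's, as in the cases reported here, would need a filter of this kind before handing events to the simulation.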
The issue is still there. It has to do with Yahoo data, as I am following the zipline tutorial.
Closing due to age. If anyone is still experiencing this, feel free to reopen.
If, in risk.py, we change (line 586 and 589):
to:
to do explicit slicing rather than implicit using valid(), I get the following test error:
Dropping into a debugger it seems that there is a NaN in front of the last dt which seems very odd: