quantopian / zipline

Zipline, a Pythonic Algorithmic Trading Library
https://www.zipline.io
Apache License 2.0

benchmark returns shifted daily mode generating wrong alpha-beta statistics #1832

Open ikfdarba opened 7 years ago

ikfdarba commented 7 years ago

Dear Zipline Maintainers,

Before I tell you about my issue, let me describe my environment:

Environment

* Zipline 1.0.2

Description of Issue

I noticed some discrepancies in the computation of the benchmark and the portfolio returns: the benchmark and portfolio returns seem to be shifted...

First example is the zipline tutorial buyapple.py. The results of the run are taken from http://www.zipline.io/beginner-tutorial.html and are reproduced below:

                    algorithm_period_return    benchmark_period_return
03.01.2000 21:00    0.00E+00                   -0.0658
04.01.2000 21:00    3.37E-07                   -0.064897
05.01.2000 21:00    4.00E-07                   -0.066196
06.01.2000 21:00    4.99E-06                   -0.065758
07.01.2000 21:00    5.98E-06                   -0.065206

One can see here that the benchmark returns start on the 3rd of January, while the algorithm's returns start on the 4th (the code shows that the algorithm trades every day). To compare the simulation with the benchmark, however, the two return computations should start on the same date.

In more detail:

I ran the following code in daily mode, with the benchmark set to the traded stock itself. The starting capital is set to a multiple of the stock price, and the stock is traded only once, in the first handle_data call. I would therefore expect the benchmark and portfolio returns to be very close:

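from zipline.api import (get_datetime, order_target_percent, set_benchmark,
                         set_commission, set_slippage, symbol)
from zipline.finance import commission, slippage

def initialize(context):
    # Zero spread and zero per-share commission.
    set_slippage(slippage.FixedSlippage(0.00))
    set_commission(commission.PerShare(cost=0))

    # Benchmark against the same stock that the algorithm trades.
    set_benchmark(symbol('AAPL'))
    context.tradeIn = True

def handle_data(context, data):
    print(get_datetime().tz_convert('US/Eastern'))
    print(str(symbol('AAPL')) + " @ " + str(data.current(symbol('AAPL'), 'price')))
    if context.tradeIn:
        # Place the single order on the first bar only.
        print('trading in')
        order_target_percent(symbol('AAPL'), 1)
        print("opening " + str(symbol('AAPL')) + " @ " + str(data.current(symbol('AAPL'), 'price')))
        context.tradeIn = False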

and run zipline run --bundle csv -f C:/Users/dariobiasini/PythonProjects/buyLB.py --start 2000-1-6 --end 2000-1-19 -o C:/Users/dariobiasini/PythonProjects/buyLB_out.pickle --starting-capital 995

(I set the starting capital so that cash has minimal influence on the trade returns.)

This is the output, and one can see that the trade for Apple is opened on the 6th:

2000-01-06 16:00:00-05:00
Equity(0 [AAPL]) @ 95.0
trading in
opening Equity(0 [AAPL]) @ 95.0
2000-01-07 16:00:00-05:00
Equity(0 [AAPL]) @ 99.5
2000-01-10 16:00:00-05:00
Equity(0 [AAPL]) @ 97.75
2000-01-11 16:00:00-05:00
Equity(0 [AAPL]) @ 92.75
2000-01-12 16:00:00-05:00
Equity(0 [AAPL]) @ 87.19

These are the results:

perf.transactions
Out[473]: 
2000-01-06 21:00:00                                                   []
2000-01-07 21:00:00    [{'price': 99.5, 'commission': None, 'sid': Eq...
2000-01-10 21:00:00                                                   []
2000-01-11 21:00:00                                                   []
2000-01-12 21:00:00                                                   []
Name: transactions, dtype: object

One can see that the Apple trade is opened at the closing price of the 7th (99.5),

and these are the returns:

perf.algorithm_period_return
Out[465]: 
2000-01-06 21:00:00    0.000000
2000-01-07 21:00:00   -0.001005
2000-01-10 21:00:00   -0.018593
2000-01-11 21:00:00   -0.068844
2000-01-12 21:00:00   -0.124724
2000-01-13 21:00:00   -0.028643
2000-01-14 21:00:00    0.008442
2000-01-18 21:00:00    0.043618
2000-01-19 21:00:00    0.069950
Name: algorithm_period_return, dtype: float64

perf.benchmark_period_return
Out[466]: 
2000-01-06 21:00:00   -0.086538
2000-01-07 21:00:00   -0.043269
2000-01-10 21:00:00   -0.060096
2000-01-11 21:00:00   -0.108173
2000-01-12 21:00:00   -0.161635
2000-01-13 21:00:00   -0.069712
2000-01-14 21:00:00   -0.034231
2000-01-18 21:00:00   -0.000577
2000-01-19 21:00:00    0.024615
Name: benchmark_period_return, dtype: float64

Where one can see that the returns of the benchmark and the trades are significantly different.

I can reconcile the returns with Yahoo Finance data: the benchmark returns are always computed relative to the closing price of Jan 5 (which seems correct to me for a simulation starting on Jan 6), while the portfolio returns are computed relative to the closing price of Jan 7. This is not consistent: the two return series do not match each other, so the alpha and beta statistics cannot be correct.
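The reconciliation can be reproduced with a few lines of plain Python. This is only a sketch under stated assumptions, not Zipline's internal calculation: the Jan 5 close of 104.0 is inferred from the -0.086538 benchmark return on the 95.0 close, and the position size of 10 shares and the one-off $1 charge are inferred from the 995 starting capital and the -0.001005 return on the fill date.

```python
# Closing prices printed by handle_data in the output above.
closes = {
    "2000-01-06": 95.0,
    "2000-01-07": 99.5,
    "2000-01-10": 97.75,
    "2000-01-11": 92.75,
    "2000-01-12": 87.19,
}

# Assumptions inferred from the reported numbers, not printed by Zipline:
PREV_CLOSE = 104.0   # Jan 5 close implied by 95.0 / (1 - 0.086538)
CAPITAL = 995.0      # starting capital passed on the command line
SHARES = 10          # 10 * 99.5 = 995, the fill on Jan 7
FILL_PRICE = 99.5
COST = 1.0           # one-off charge implied by the -0.001005 return on Jan 7

# Benchmark: every close is compared with the Jan 5 close.
benchmark = {d: p / PREV_CLOSE - 1 for d, p in closes.items()}

# Algorithm: all cash until the fill on Jan 7, then 10 shares plus leftover cash.
cash_after_fill = CAPITAL - SHARES * FILL_PRICE - COST
algorithm = {}
for d, p in closes.items():
    if d < "2000-01-07":
        value = CAPITAL
    else:
        value = SHARES * p + cash_after_fill
    algorithm[d] = value / CAPITAL - 1

for d in closes:
    print(d, round(algorithm[d], 6), round(benchmark[d], 6))
```

Under these assumptions the printed values agree with the first five rows of perf.algorithm_period_return and perf.benchmark_period_return shown above.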

Now if I run a longer simulation: zipline run --bundle csv -f C:/Users/dariobiasini/PythonProjects/buyLB.py --start 2000-1-6 --end 2000-6-6 -o C:/Users/dariobiasini/PythonProjects/buyLB_out.pickle --starting-capital 995

I am getting

perf.beta
Out[470]: 
2000-01-06 21:00:00         NaN
...
2000-06-06 20:00:00    0.955795

and

perf.alpha
Out[471]: 
2000-01-06 21:00:00         NaN
...
2000-06-06 20:00:00    0.091247

and

perf.benchmark_period_return
Out[475]: 
2000-01-06 21:00:00   -0.086538
...
2000-06-06 20:00:00   -0.107019
Name: benchmark_period_return, dtype: float64

and

perf.algorithm_period_return
Out[476]: 
2000-01-06 21:00:00    0.000000
...
2000-06-06 20:00:00   -0.067638
Name: algorithm_period_return, dtype: float64

So this 9% alpha (!) is due entirely to the initial discrepancy in the returns. Here it is easy to see, but in a real simulation it will not be.
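The size of the gap can be checked against the two different base prices. The sketch below rests on inferred figures that Zipline does not print: a Jan 5 close of 104.0 as the benchmark base, and 10 shares filled at 99.5 with a $1 charge out of 995 starting capital.

```python
# End-of-run figures reported above for 2000-06-06.
benchmark_end = -0.107019
algorithm_end = -0.067638

# Assumptions inferred from the short run, not reported by Zipline:
BENCHMARK_BASE = 104.0   # Jan 5 close used as the benchmark's starting price
CAPITAL = 995.0          # starting capital
SHARES = 10              # position size after the Jan 7 fill
CASH_AFTER_FILL = -1.0   # 995 - 10 * 99.5 - 1

# Closing price on the last day, implied by the benchmark return.
last_close = BENCHMARK_BASE * (1 + benchmark_end)

# Portfolio return implied by that price for a position bought at 99.5.
implied_algorithm_end = (SHARES * last_close + CASH_AFTER_FILL) / CAPITAL - 1

# Relative gap between the two cumulative returns at the end of the run.
gap = (1 + algorithm_end) / (1 + benchmark_end) - 1

print(round(last_close, 2))
print(round(implied_algorithm_end, 6))
print(round(gap, 4))
```

If the assumptions hold, the implied algorithm return reproduces the reported -0.067638, and the relative gap of about 4.4% over the five months is close to 104.0 / 99.5 - 1, which is about 4.5%. In other words, the gap is what one would expect from measuring the same price path against two different starting prices.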

Why is the trade opened at the closing price of the 7th when the first day of the simulation is the 6th? Like the benchmark, it should then use the closing price of the 5th.

Could you shed some light? I am missing the logic...

Thank you

ernestoeperez88 commented 7 years ago

Hello @ikfdarba, thanks for posting this question.

The values reported by algorithm_period_return and benchmark_period_return don't match because both correspond to cumulative returns, and the daily returns data used to derive these values have different start dates.

Your algorithm starts generating returns after completing its first transaction. Orders begin filling one bar after they are placed, to guard your algorithms against look-ahead bias. This is why your transaction was filled on 2000-01-07 even though it was placed on 2000-01-06, and also why your algorithm_period_return values do not start until 2000-01-07.

On the other hand, the benchmark is held since the start of the simulation, so it generates returns from day one.
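To see where the two series diverge, the cumulative values from the original post can be converted back to daily returns. This is a minimal sketch, assuming the reported period returns are compounded as prod(1 + r) - 1; the helper name daily_from_cumulative is made up for this illustration and is not part of Zipline.

```python
dates = ["2000-01-06", "2000-01-07", "2000-01-10", "2000-01-11", "2000-01-12",
         "2000-01-13", "2000-01-14", "2000-01-18", "2000-01-19"]

# Values copied from perf.algorithm_period_return and
# perf.benchmark_period_return in the original post.
algorithm_period_return = [0.000000, -0.001005, -0.018593, -0.068844, -0.124724,
                           -0.028643, 0.008442, 0.043618, 0.069950]
benchmark_period_return = [-0.086538, -0.043269, -0.060096, -0.108173, -0.161635,
                           -0.069712, -0.034231, -0.000577, 0.024615]

def daily_from_cumulative(cumulative):
    """Invert cumulative = prod(1 + r) - 1 to recover the per-day returns r."""
    daily = []
    previous_growth = 1.0
    for c in cumulative:
        growth = 1.0 + c
        daily.append(growth / previous_growth - 1.0)
        previous_growth = growth
    return daily

algo_daily = daily_from_cumulative(algorithm_period_return)
bench_daily = daily_from_cumulative(benchmark_period_return)

for d, a, b in zip(dates, algo_daily, bench_daily):
    print(f"{d}  algo {a:+.6f}  bench {b:+.6f}  diff {a - b:+.6f}")
```

On these figures the two daily series differ materially only on 2000-01-06 and 2000-01-07. From 2000-01-10 onward they agree to within about 0.0002 per day, so the whole difference in cumulative returns comes from the first two sessions.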

To account for this difference, benchmark_returns are masked to only include dates found in algorithm_returns' index before risk metrics are calculated (alpha, beta, cumulative returns).

Additionally, extra cash not invested by your algorithm, as well as transaction costs, will have an impact when calculating algorithm_period_return. In order to compare apples to apples (no pun intended), I changed your algorithm to purchase a single share of AAPL at the beginning of the backtest, and I set a minimum trade cost of $0 (your algorithm was using the default, $1).

from zipline.api import set_slippage, set_commission, set_benchmark, symbol, order, get_datetime, record
from zipline.finance.slippage import FixedSlippage
from zipline.finance.commission import PerShare

def initialize(context):
    set_slippage(FixedSlippage(spread=0))
    set_commission(PerShare(min_trade_cost=0, cost=0))
    set_benchmark(symbol('AAPL'))

    context.trade_in=True

def handle_data(context, data):
    print(get_datetime('US/Eastern'))
    print("AAPL vol: %s" % data.current(symbol('AAPL'),'volume'))

    if context.trade_in:
        print("Order AAPL @ %s" % data.current(symbol('AAPL'),'price'))
        order(symbol('AAPL'), 1)
        context.trade_in=False

Then, I backtested it between 2017-1-1 and 2017-9-1 using a capital base of $117 to minimize the impact of extra cash on the returns calculations (the AAPL price was ~$116 on the start date used for the simulation). Here is the command I used:

zipline run -f benchmark_aapl.py --start 2017-1-1 --end 2017-9-1 --capital-base 117 -o benchmark_aapl.pickle

... and here are the results:

[Two screenshots of the backtest results, taken on 2017-09-22 at 4:44 pm and 4:48 pm]

You can see in the results that as the simulation progresses the values reported by algorithm_period_return and benchmark_period_return get closer to each other.

I hope this helps clarify things.

ikfdarba commented 7 years ago

Hello,

Well, you ran exactly what I did (I had also removed the effect of cash as much as possible through the starting capital). The only difference is that AAPL did not have a big drop on the first date of your simulation, so the benchmark and algo returns are quite close (though no bullseye, with -2.3% alpha on a 9-month simulation). Apple apparently dropped 8% on the first day of my simulation, which very much impacted the whole backtest.

In this respect I didn't understand the comment that "benchmark_returns are masked". It seems to me the statistics are computed from the simulation's start date and not from the first date after a trade is issued; otherwise I could not explain the 9% alpha I am getting. Am I wrong? (I did not look through the Zipline code.)
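One way to probe which window the statistics use is to recompute beta by hand on the daily returns implied by the short run, once over all simulation days and once only from 2000-01-10 onward. This is a rough sketch with nine observations, assuming the reported period returns are compounded; the helpers daily and beta are written for this illustration and do not reproduce Zipline's own risk calculation.

```python
# Values copied from the short run in the original post.
algorithm_period_return = [0.000000, -0.001005, -0.018593, -0.068844, -0.124724,
                           -0.028643, 0.008442, 0.043618, 0.069950]
benchmark_period_return = [-0.086538, -0.043269, -0.060096, -0.108173, -0.161635,
                           -0.069712, -0.034231, -0.000577, 0.024615]

def daily(cumulative):
    """Per-day returns implied by a compounded cumulative return series."""
    growth = [1.0] + [1.0 + c for c in cumulative]
    return [growth[i + 1] / growth[i] - 1.0 for i in range(len(cumulative))]

def beta(algo, bench):
    """Slope of algorithm returns on benchmark returns: cov(a, b) / var(b)."""
    n = len(bench)
    mean_a = sum(algo) / n
    mean_b = sum(bench) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(algo, bench))
    var = sum((b - mean_b) ** 2 for b in bench)
    return cov / var

algo_daily = daily(algorithm_period_return)
bench_daily = daily(benchmark_period_return)

# Window 1: every simulation day, including 2000-01-06 and 2000-01-07.
beta_full = beta(algo_daily, bench_daily)

# Window 2: only days on which the position was held for the whole day
# (2000-01-10 onward, i.e. skipping the first two entries).
beta_held = beta(algo_daily[2:], bench_daily[2:])

print(round(beta_full, 3), round(beta_held, 3))
```

For a buy-and-hold of the benchmark asset itself, beta over the held-only window comes out very close to 1 on this sample, while the full window gives a clearly lower value. Comparing such hand-computed figures with perf.beta is one way to check which window the reported statistics correspond to.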

So the whole thing hinges on the first trade being executed on the next trading day, while the benchmark starts one day earlier. I would say this is (arguably) incorrect: to compare the benchmark with the trades accurately, they should start on the same day, which is virtually impossible to implement in day mode. (I guess that if we did the same thing in a minute-mode simulation the discrepancy would be almost gone, as the benchmark would only have one more minute of simulation.)

In day mode there is no way for me to implement a simulation with correct statistics AND correct returns. Even if I shift my trades for the 6th to the 5th so that they are executed on the correct day (i.e. at the correct price), the benchmark will start on the 5th, so the computed statistics will again be slightly incorrect.

In general, it seems quite burdensome to run day-mode simulations because of this next-trading-day feature. I am not looking ahead: my trades are for the 6th (and it is quite important that they be issued on the 6th) and use information up to the close of the 5th. I would like them to be dated the 6th in my feed file, and the simulation also to start on the 6th, so that the benchmark and the trades are correctly matched.

I have now implemented everything in minute mode, making pseudo-minute data by repeating the same price throughout the trading day, and I am computing my own statistics to be sure I know what is going on...
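How the pseudo-minute data was built is not shown in this thread. A minimal sketch of the idea, assuming a regular 390-minute session with bars stamped 09:31 to 16:00 in naive exchange-local time and no early closes, could look like this. The function name expand_daily_to_minutes and the even split of volume are choices made for the illustration.

```python
from datetime import date, datetime, timedelta

BARS_PER_SESSION = 390  # assumed regular session: 09:31 through 16:00

def expand_daily_to_minutes(day, price, volume=0):
    """Return flat (timestamp, open, high, low, close, volume) minute bars.

    Every bar repeats the same price, so the bar on which an order fills
    does not change the fill price within the day.
    """
    first_bar = datetime(day.year, day.month, day.day, 9, 31)
    per_bar_volume = volume // BARS_PER_SESSION  # spread volume evenly
    return [
        (first_bar + timedelta(minutes=i), price, price, price, price, per_bar_volume)
        for i in range(BARS_PER_SESSION)
    ]

bars = expand_daily_to_minutes(date(2000, 1, 6), 95.0, volume=390_000)
print(bars[0])
print(bars[-1])
```

With flat bars like these, an order placed on the first minute fills one minute later at the same price, which is the property the minute-mode workaround relies on.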

Don't get me wrong, I think the library is very good, but the issue above does not seem to me to be solved correctly, and it is quite important that the stats be computed cleanly.