ikfdarba opened this issue 7 years ago
Hello @ikfdarba, thanks for posting this question.
The values reported by algorithm_period_return and benchmark_period_return don't match because both are cumulative returns, and the daily returns used to compute them start on different dates.
Your algorithm starts generating returns after its first transaction is filled. Orders begin filling one bar after they are placed, to guard your algorithm against look-ahead bias. This is why your transaction was filled on 2000-01-07 even though the order was placed on 2000-01-06, and also why your algorithm_period_return values don't start until 2000-01-07.
On the other hand, the benchmark is held from the start of the simulation, so it generates returns from day one.
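A toy model of this timing difference, assuming daily bars and fills at the next bar's close (consistent with the order placed on 2000-01-06 filling at the 2000-01-07 close). It is a simplified stand-in written for illustration, not Zipline's blotter or slippage code, and the prices are made up.

```python
# Hypothetical daily closes; the dates mirror the example, the prices are made up.
bars = [("2000-01-06", 92.0), ("2000-01-07", 96.0), ("2000-01-10", 95.0)]

pending_orders = []  # share quantities ordered on an earlier bar, not yet filled
position = 0
fills = []           # (fill date, shares, fill price)

for i, (date, close) in enumerate(bars):
    # Orders placed on a previous bar fill at this bar's close, so a decision
    # never trades at a price that was already known when it was made.
    for qty in pending_orders:
        position += qty
        fills.append((date, qty, close))
    pending_orders = []

    if i == 0:
        pending_orders.append(1)  # order one share on the first bar

# The benchmark, by contrast, is treated as held from bars[0],
# so it earns a return on every bar, including the first one.
print(fills)
```

The algorithm only holds a position from the second bar onward, so its return series necessarily starts later than the benchmark's.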
To account for this difference, benchmark_returns are masked to only include dates found in algorithm_returns' index before risk metrics are calculated (alpha, beta, cumulative returns).
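The alignment step can be illustrated with pandas. This is a sketch of the idea only, not Zipline's internal implementation; the series, dates, and values are hypothetical, and `reindex` is just one way to express the mask.

```python
import pandas as pd

# Hypothetical daily returns; values are illustrative only.
benchmark_returns = pd.Series(
    [-0.08, 0.02, 0.01, -0.01],
    index=pd.to_datetime(["2000-01-06", "2000-01-07", "2000-01-10", "2000-01-11"]),
)
# The algorithm's returns only begin once its first order has filled.
algorithm_returns = pd.Series(
    [0.0, 0.01, -0.01],
    index=pd.to_datetime(["2000-01-07", "2000-01-10", "2000-01-11"]),
)


def cumulative(returns: pd.Series) -> float:
    """Compound daily returns into a single cumulative return."""
    return float((1.0 + returns).prod() - 1.0)


# Keep only the benchmark dates that also appear in the algorithm's index.
masked_benchmark = benchmark_returns.reindex(algorithm_returns.index)

print(f"unmasked benchmark: {cumulative(benchmark_returns):+.2%}")
print(f"masked benchmark:   {cumulative(masked_benchmark):+.2%}")
print(f"algorithm:          {cumulative(algorithm_returns):+.2%}")
```

Without the mask, the benchmark's first-day return (here -8%) is compounded into every later cumulative value; with it, both series start from the same date.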
Additionally, extra cash not invested by your algorithm, as well as transaction costs, will have an impact when calculating algorithm_period_return. In order to compare apples to apples (no pun intended), I changed your algorithm to purchase a single share of AAPL at the beginning of the backtest, and I set a minimum trade cost of $0 (your algorithm was using the default, $1).
```python
from zipline.api import set_slippage, set_commission, set_benchmark, symbol, order, get_datetime, record
from zipline.finance.slippage import FixedSlippage
from zipline.finance.commission import PerShare


def initialize(context):
    # Fill at the quoted price with no spread, and charge no commission
    # (including no minimum per-trade cost), so costs don't distort returns.
    set_slippage(FixedSlippage(spread=0))
    set_commission(PerShare(min_trade_cost=0, cost=0))
    # Use the traded stock itself as the benchmark.
    set_benchmark(symbol('AAPL'))
    # Buy only once, on the first bar.
    context.trade_in = True


def handle_data(context, data):
    print(get_datetime('US/Eastern'))
    print("AAPL vol: %s" % data.current(symbol('AAPL'), 'volume'))
    if context.trade_in:
        print("Order AAPL @ %s" % data.current(symbol('AAPL'), 'price'))
        order(symbol('AAPL'), 1)  # one share; fills on the next bar
        context.trade_in = False
```
Then, I backtested it between 2017-1-1 and 2017-9-1 using a capital base of $117 to minimize the impact of extra cash on the returns calculations (AAPL's price was ~$116 on the start date used for the simulation). Here is the command I used:
```
zipline run -f benchmark_aapl.py --start 2017-1-1 --end 2017-9-1 --capital-base 117 -o benchmark_aapl.pickle
```
... and here are the results:
...
You can see in the results that as the simulation progresses the values reported by algorithm_period_return and benchmark_period_return get closer to each other.
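A back-of-the-envelope check of why a $117 capital base keeps the cash effect small. It assumes zero commission and zero slippage (matching the settings above) and an entry price of exactly $116, whereas the real fill was only approximately that; the other prices are arbitrary.

```python
capital_base = 117.0
entry_price = 116.0  # approximate price from the text; the actual fill may differ
shares = 1

cash_left = capital_base - shares * entry_price  # uninvested cash, assumed to earn nothing


def returns_at(price):
    """Return (stock return, portfolio return) if the stock moves to `price`."""
    stock_ret = price / entry_price - 1.0
    portfolio_ret = (cash_left + shares * price) / capital_base - 1.0
    return stock_ret, portfolio_ret


for price in (110.0, 116.0, 130.0):
    stock_ret, portfolio_ret = returns_at(price)
    print(f"price {price:6.2f}: stock {stock_ret:+.3%}, portfolio {portfolio_ret:+.3%}")
```

Under these assumptions the portfolio return is the stock return scaled by entry_price / capital_base (about 0.991), so the leftover cash shrinks each return by less than 1%.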
I hope this helps clarify things.
Hello,
Well, you ran exactly what I did (I had also removed the effect of cash as much as possible through the initial capital). The only difference is that AAPL did not have a big drop on the first day of your simulation, so the benchmark and algorithm returns are quite close (though not a bullseye, with -2.3% alpha on a 9-month simulation), whereas AAPL apparently dropped 8% on the first day of my simulation, which heavily affected the whole backtest.
In this respect, I didn't understand the comment about "benchmark_returns are masked". It seems to me that the statistics are computed from the simulation's start date and not from the first date after a trade is issued; otherwise I could not explain the 9% alpha I am getting. Am I wrong? (I did not look through the Zipline code.)
So the whole thing hinges on the first trade being executed on the next trading day while the benchmark starts one day earlier. I would say this is (arguably) incorrect: to compare the benchmark with the trades accurately, both should start on the same day, which is virtually impossible to implement in daily mode. (I guess that in a minute-mode simulation the discrepancy would be (almost) gone, since the benchmark would only have one extra minute of simulation.)
In daily mode there is no way for me to run a simulation with correct statistics AND correct returns: even if I shift my trades for the 6th to the 5th so that they are filled on the correct day (i.e., at the correct price), the benchmark will then start on the 5th, so the computed statistics will again be slightly incorrect.
In general, running daily-mode simulations seems quite burdensome because of this next-trading-day behavior. I am not looking ahead: my trades are for the 6th (and it is quite important that they be executed on the 6th), using information up to the close of the 5th. I would like them to be dated the 6th in my feed file, and the simulation also to start on the 6th, so that the benchmark and the trades are correctly matched.
I have now implemented everything in minute mode, creating pseudo-minute data by repeating the same price throughout the trading day, and I am computing my own statistics to be sure I know what is going on...
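A sketch of that pseudo-minute approach with pandas. Everything here is an assumption rather than something taken from the thread or from Zipline: a regular 09:31-16:00 New York session of 390 one-minute bars (early closes and holidays are not handled), flat OHLC bars equal to the daily close, no volume column (a real minute bundle would also need one), and conversion to UTC as the target timestamp convention.

```python
import pandas as pd


def daily_to_pseudo_minutes(daily_close: pd.Series, tz: str = "America/New_York") -> pd.DataFrame:
    """Expand daily closes into flat one-minute bars covering a regular session.

    Assumes every session runs 09:31-16:00 local time (390 bars); early closes
    are not handled. Each bar's open/high/low/close equal that day's close.
    """
    frames = []
    for day, close in daily_close.items():
        session_start = pd.Timestamp(day).strftime("%Y-%m-%d") + " 09:31"
        minutes = pd.date_range(start=session_start, periods=390, freq="min", tz=tz)
        frames.append(
            pd.DataFrame(
                {"open": close, "high": close, "low": close, "close": close},
                index=minutes,
            )
        )
    # Convert to UTC (assumed target convention for the minute data).
    return pd.concat(frames).tz_convert("UTC")


# Hypothetical daily closes, for illustration only.
daily = pd.Series([92.0, 96.0], index=pd.to_datetime(["2000-01-06", "2000-01-07"]))
bars = daily_to_pseudo_minutes(daily)
print(bars.head())
```

With flat bars, the first minute of each session already trades at that day's close, which is what makes the next-bar fill land on the intended day.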
Don't get me wrong, I think the library is very good, but the issue above does not seem to me to be solved correctly, and it is quite important that the stats be computed cleanly.
Dear Zipline Maintainers,
Before I tell you about my issue, let me describe my environment:
Environment
* Zipline 1.0.2
Description of Issue
I noticed some discrepancies in the computation of the benchmark and the portfolio returns: the benchmark and portfolio returns seem to be shifted...
The first example is the Zipline tutorial buyapple.py. The results of the run are taken from http://www.zipline.io/beginner-tutorial.html and are reproduced below:
One can see here that the benchmark returns start on January 3rd, while those of the algorithm (which, as the code shows, trades every day) start on the 4th. However, if I want to compare the simulation to the benchmark, the two return computations should of course start on the same date.
In more detail:
I ran the following code in daily mode, with the benchmark set to the stock being traded and the capital set to a multiple of the stock price; the stock is traded only once, in the first handle_data call. I would therefore expect the returns of the benchmark and of the trades to be very close:
and ran:
```
zipline run --bundle csv -f C:/Users/dariobiasini/PythonProjects/buyLB.py --start 2000-1-6 --end 2000-1-19 -o C:/Users/dariobiasini/PythonProjects/buyLB_out.pickle --starting-capital 995
```
(I set the starting capital so that cash has minimal influence on the trade returns.)
This is the output, where one can see that the Apple trade is opened on the 6th:
These are the results:
One can see that the Apple trade is opened at the closing price of the 7th (99.5).
And these are the returns:
One can see that the returns of the benchmark and of the trades are significantly different.
I can reconcile the returns with Yahoo Finance data: the benchmark returns are always computed from the closing price of Jan 5 (which seems correct to me for a simulation starting on Jan 6), while the portfolio returns are computed from the closing price of Jan 7. This is simply inconsistent: the two return series don't match, so the alpha and beta statistics cannot be correct.
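The reconciliation described above can be sketched by measuring each series against its own base close. The prices are hypothetical placeholders, not the Yahoo Finance data referred to above. Under these assumptions, on every later date the two growth factors differ by the same constant ratio, close of the 7th / close of the 5th.

```python
# Hypothetical closes keyed by ISO date; only the structure matters, not the values.
closes = {
    "2000-01-05": 100.0,
    "2000-01-06": 92.0,
    "2000-01-07": 96.0,
    "2000-01-10": 95.0,
    "2000-01-11": 97.0,
}


def returns_from_base(prices, base_date):
    """Cumulative return on each later date, measured from the close on base_date."""
    base = prices[base_date]
    # ISO date strings sort chronologically, so string comparison is safe here.
    return {d: p / base - 1.0 for d, p in prices.items() if d > base_date}


benchmark = returns_from_base(closes, "2000-01-05")  # base: close before the first session
portfolio = returns_from_base(closes, "2000-01-07")  # base: fill price on the 7th

# The ratio of growth factors stays fixed: 1 + benchmark = (1 + portfolio) * wedge.
wedge = closes["2000-01-07"] / closes["2000-01-05"]
for d in sorted(portfolio):
    print(d, f"benchmark {benchmark[d]:+.2%}", f"portfolio {portfolio[d]:+.2%}",
          f"implied benchmark {(1 + portfolio[d]) * wedge - 1:+.2%}")
```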
Now if I run a longer simulation:
```
zipline run --bundle csv -f C:/Users/dariobiasini/PythonProjects/buyLB.py --start 2000-1-6 --end 2000-6-6 -o C:/Users/dariobiasini/PythonProjects/buyLB_out.pickle --starting-capital 995
```
I am getting
and
and
So this 9% alpha (!) is entirely due to the initial discrepancy in the returns: here it is easy to see, but in a real simulation it will not be...
Why is the trade opened at the closing price of the 7th when the first day of the simulation is the 6th, and thus, like the benchmark, it should use the close of the 5th?
Could you shed some light? I am missing the logic...
Thank you