Closed ssanderson closed 4 years ago
Thank you @ssanderson ,
I'll set this as a priority. I'm preparing a design proposal taking into account your three suggestions and running a few tests.
I'll work on it this evening and get back to you by eod.
Thanks, Ayoub
Hi Scott,
I've drafted the proposed solution below based on your suggestions (any remarks you might have are more than welcome). I'll be working on it tomorrow evening and this weekend. My two goals are to post a workaround for the IEX issue quickly and to limit as much as possible the changes required in zipline.
Making the benchmark optional is coherent with the fact that zipline is a specialised backtester (it won't impact your need at Quantopian to see performance results in real time). If end users want to analyse their strategy's performance, they can use Alphalens (comparing the returns to those of a specific benchmark instrument or factor).
I can do the following:
```python
@main.command()
@click.option(
    '-bf',
    '--benchmarkfile',
    default=None,
    type=click.File('r'),
    help='The CSV file that contains the benchmark closing prices',
)
```
@samatix I think your proposal for adding a CLI param to support setting a benchmark makes sense. I could imagine that parameter taking a few different forms:

- `--benchmark-file`, pointing at a CSV of returns data.
- `--benchmark-symbol` or `--benchmark-sid` parameters. A sid could be passed through unchanged to `TradingAlgorithm`. The symbol would need to be looked up in the `AssetFinder` via `lookup_symbol`, and then the sid could be passed through.

One note on your current proposal: you suggested that the benchmark would be an argument to `main`, but it would probably be more appropriate to make it an argument to `run`, which is the subcommand that runs algorithms.
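The symbol-to-sid resolution described above can be sketched schematically. This is only an illustration: `SYMBOL_TO_SID` and `resolve_benchmark_sid` are hypothetical stand-ins for zipline's `AssetFinder.lookup_symbol` machinery, used here so the sketch stays self-contained:

```python
# Hypothetical stand-in for AssetFinder.lookup_symbol; real code would
# query the bundle's asset database instead of a hard-coded mapping.
SYMBOL_TO_SID = {'SPY': 8554, 'b': 1}


def resolve_benchmark_sid(benchmark_sid=None, benchmark_symbol=None):
    """Return the sid to forward to TradingAlgorithm as benchmark_sid."""
    if benchmark_sid is not None:
        # --benchmark-sid: pass the sid through unchanged.
        return benchmark_sid
    if benchmark_symbol is not None:
        # --benchmark-symbol: look the symbol up, then pass the sid through.
        return SYMBOL_TO_SID[benchmark_symbol]
    # Neither flag given: the caller falls back to the default loader.
    return None


print(resolve_benchmark_sid(benchmark_symbol='SPY'))  # 8554
```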
Regarding your proposal for getting backtests to work without a benchmark, I think passing a series of 0s (and firing a clear warning) is probably a reasonable short term fix. Longer term, I think a better solution would probably be to integrate the optionality of the benchmark more deeply into the metrics subsystem. My main worry about passing all zeros is that we'll still end up doing a bunch of work to compute meaningless metrics. I'd have to think a bit on what the best path forward to fix that would be though.
@ssanderson , let me share with you the validation results related to the pull request (https://github.com/quantopian/zipline/pull/2642) to get your point of view on the new behavior.
Following your comments, my remaining actions would be to:
Let us consider two instruments, A (a) and B (b), where B is used as the benchmark. Data file for instrument A, a.csv:
```
date,open,high,low,close,adj_close,volume
2020-01-02 00:00:00+00:00,100,100,100,100,100,10000
2020-01-03 00:00:00+00:00,120,120,120,120,120,12000
2020-01-06 00:00:00+00:00,100,100,100,100,100,10000
2020-01-07 00:00:00+00:00,160,160,160,160,160,16000
2020-01-08 00:00:00+00:00,180,180,180,180,180,18000
2020-01-09 00:00:00+00:00,200,200,200,200,200,20000
```
Data file for the benchmark B b.csv:
```
date,open,high,low,close,adj_close,volume
2020-01-02 00:00:00+00:00,100,100,100,100,100,10000
2020-01-03 00:00:00+00:00,90,90,90,90,90,9000
2020-01-06 00:00:00+00:00,120,120,120,120,120,10000
2020-01-07 00:00:00+00:00,140,140,140,140,140,14000
2020-01-08 00:00:00+00:00,160,160,160,160,160,16000
2020-01-09 00:00:00+00:00,180,180,180,180,180,18000
```
```python
import pandas as pd

from zipline.data.bundles import register
from zipline.data.bundles.csvdir import csvdir_equities

start_session2 = pd.Timestamp('2020-01-02', tz='utc')
end_session2 = pd.Timestamp('2020-01-09', tz='utc')

register(
    'csv-xpar-sample',
    csvdir_equities(
        ['daily'],
        '/Users/aennassiri/opensource/zipline',
    ),
    calendar_name='XPAR',
    start_session=start_session2,
    end_session=end_session2,
)
```
The algorithm orders 1,000 shares of A at the beginning of the backtesting period:
```python
from zipline.api import order, symbol
from zipline.finance import commission, slippage


def initialize(context):
    context.stocks = symbol('a')
    context.has_ordered = False
    context.set_commission(commission.NoCommission())
    context.set_slippage(slippage.NoSlippage())


def handle_data(context, data):
    if not context.has_ordered:
        order(context.stocks, 1000)
        context.has_ordered = True
```
Command:
run -f TEST_FOLDER/test_benchmark3.py -b csv-xpar-sample -s 01/01/2020 -e 01/09/2020
Result
Warning: Neither a benchmark file nor a benchmark symbol is provided. Trying to use the default benchmark loader. To use zero as a benchmark, use the flag --no-benchmark
...
ValueError: Please set your IEX_API_KEY environment variable and retry.
Please note that this feature will be deprecated
Comment
If no benchmark option is used, we fall back to the default benchmark loader after raising a warning. The latter checks whether the data already exists in the zipline data folder; if not, it tries to download it from IEX as before. I've changed the code so that users can provide their IEX API token via the IEX_API_KEY environment variable.
This fallback is kept so as not to break backward compatibility for users who put the benchmark data directly in the zipline data folder.
TODO:
I'm going to update the error message:
Command:
run -f TEST_FOLDER/test_benchmark3.py -b csv-xpar-sample -s 01/01/2020 -e 01/09/2020 --benchmark-file TEST_FOLDER/data/daily/b.csv --trading-calendar XPAR
Result
[2020-02-07 10:19:55.904112] INFO: zipline.finance.metrics.tracker: Simulated 6 trading days
first open: 2020-01-02 08:01:00+00:00
last close: 2020-01-09 16:30:00+00:00
algo_volatility algorithm_period_return alpha \
2020-01-02 16:30:00+00:00 NaN 0.000 NaN
2020-01-03 16:30:00+00:00 0.000000 0.000 0.000000
2020-01-06 16:30:00+00:00 0.018330 -0.002 -0.070705
2020-01-07 16:30:00+00:00 0.055083 0.004 0.268001
2020-01-08 16:30:00+00:00 0.048217 0.006 0.310527
2020-01-09 16:30:00+00:00 0.043427 0.008 0.299573
...
Order at trading day 2:
[{'price': 120.0, 'amount': 1000, 'dt': Timestamp('2020-01-03 16:30:00+0000', tz='UTC'), 'sid': Equity(0 [A]), 'order_id': '8b3d018994cf43db960e2943b59f7ef0', 'commission': None}]
File | algo_volatility | algorithm_period_return | alpha | benchmark_period_return | benchmark_volatility | beta | capital_used |
---|---|---|---|---|---|---|---|
2020-01-02 16:30:00+00:00 | 0.0 | 0.0 | 0.0 | ||||
2020-01-03 16:30:00+00:00 | 0.0 | 0.0 | 0.0 | -0.09999999999999998 | 1.1224972160321822 | 0.0 | -120000.0 |
2020-01-06 16:30:00+00:00 | 0.018330302779823376 | -0.0020000000000000018 | -0.07070503597122309 | 0.19999999999999996 | 3.6018513757973594 | -0.004964028776978422 | 0.0 |
2020-01-07 16:30:00+00:00 | 0.05508274964812368 | 0.0040000000000000036 | 0.26800057257371374 | 0.40000000000000013 | 3.0243456592570017 | -0.0006048832358595375 | 0.0 |
2020-01-08 16:30:00+00:00 | 0.048217028714915476 | 0.006000000000000005 | 0.3105268451522164 | 0.6000000000000001 | 2.6367729194171097 | -0.0002895623813476541 | 0.0 |
2020-01-09 16:30:00+00:00 | 0.043427366958536835 | 0.008000000000000007 | 0.34104117177313514 | 0.8 | 2.3608034346685574 | -0.0001915086325657816 | 0.0 |
Comment
The benchmark data from the provided file is correctly loaded.
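As a quick sanity check (not part of the original test run), the `benchmark_period_return` column above can be reproduced with pandas by compounding the daily returns derived from the closes in b.csv:

```python
import pandas as pd

# Closing prices from b.csv (2020-01-02 through 2020-01-09).
closes = pd.Series([100.0, 90.0, 120.0, 140.0, 160.0, 180.0])

# Compound daily returns into a cumulative period return; the first
# session has no prior close, so its return is treated as zero.
period_return = (1 + closes.pct_change().fillna(0)).cumprod() - 1

# Matches the benchmark_period_return column: -0.1, 0.2, 0.4, 0.6, 0.8
print(period_return.round(10).tolist())
```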
Command:
run -f TEST_FOLDER/test_benchmark3.py -b csv-xpar-sample -s 01/01/2020 -e 01/09/2020 --benchmark-symbol b --trading-calendar XPAR
Result
[2020-02-07 10:28:00.235496] INFO: zipline.finance.metrics.tracker: Simulated 6 trading days
first open: 2020-01-02 08:01:00+00:00
last close: 2020-01-09 16:30:00+00:00
algo_volatility algorithm_period_return alpha \
2020-01-02 16:30:00+00:00 NaN 0.000 NaN
2020-01-03 16:30:00+00:00 0.000000 0.000 0.000000
2020-01-06 16:30:00+00:00 0.018330 -0.002 -0.070705
2020-01-07 16:30:00+00:00 0.055083 0.004 0.268001
2020-01-08 16:30:00+00:00 0.048217 0.006 0.310527
2020-01-09 16:30:00+00:00 0.043427 0.008 0.341041
Order at trading day 2:
[{'amount': 1000, 'sid': Equity(0 [A]), 'dt': Timestamp('2020-01-03 16:30:00+0000', tz='UTC'), 'price': 120.0, 'order_id': '18d3e8ab70be4cf392b2f8e044e3680d', 'commission': None}]
Symbol | algo_volatility | algorithm_period_return | alpha | benchmark_period_return | benchmark_volatility | beta | capital_used |
---|---|---|---|---|---|---|---|
2020-01-02 16:30:00+00:00 | 0.0 | 0.0 | 0.0 | ||||
2020-01-03 16:30:00+00:00 | 0.0 | 0.0 | 0.0 | -0.09999999999999998 | 1.1224972160321822 | 0.0 | -120000.0 |
2020-01-06 16:30:00+00:00 | 0.018330302779823376 | -0.0020000000000000018 | -0.07070503597122309 | 0.19999999999999996 | 3.6018513757973594 | -0.004964028776978422 | 0.0 |
2020-01-07 16:30:00+00:00 | 0.05508274964812368 | 0.0040000000000000036 | 0.26800057257371374 | 0.40000000000000013 | 3.0243456592570017 | -0.0006048832358595375 | 0.0 |
2020-01-08 16:30:00+00:00 | 0.048217028714915476 | 0.006000000000000005 | 0.3105268451522164 | 0.6000000000000001 | 2.6367729194171097 | -0.0002895623813476541 | 0.0 |
2020-01-09 16:30:00+00:00 | 0.043427366958536835 | 0.008000000000000007 | 0.34104117177313514 | 0.8 | 2.3608034346685574 | -0.0001915086325657816 | 0.0 |
Comment
The benchmark data for the provided symbol is correctly loaded and matches the results from the test with --benchmark-file.
If the benchmark data symbol is not found, a warning is raised and the default loader is used as a contingency plan:
/Users/aennassiri/opensource/zipline/zipline/utils/run_algo.py:116: UserWarning: Symbol c as a benchmark not found in this bundle. Proceeding with default benchmark loader
"loader" % benchmark_symbol)
[2020-02-07 10:32:49.049269] INFO: Loader: Cache at /Users/aennassiri/.zipline/data/SPY_benchmark.csv does not have data from 1990-01-02 00:00:00+00:00 to 2020-02-07 00:00:00+00:00.
[2020-02-07 10:32:49.049469] INFO: Loader: Downloading benchmark data for 'SPY' from 1989-12-29 00:00:00+00:00 to 2020-02-07 00:00:00+00:00
TODO:
Command:
run -f TEST_FOLDER/test_benchmark3.py -b csv-xpar-sample -s 01/01/2020 -e 01/09/2020 --no-benchmark
Result
Warning: Using zero returns as a benchmark. The risk metrics that requires benchmark returns will not be calculated.
[2020-02-07 10:38:07.174387] INFO: zipline.finance.metrics.tracker: Simulated 6 trading days
first open: 2020-01-02 14:31:00+00:00
last close: 2020-01-09 21:00:00+00:00
algo_volatility algorithm_period_return alpha \
2020-01-02 21:00:00+00:00 NaN 0.000 None
2020-01-03 21:00:00+00:00 0.000000 0.000 None
2020-01-06 21:00:00+00:00 0.018330 -0.002 None
2020-01-07 21:00:00+00:00 0.055083 0.004 None
2020-01-08 21:00:00+00:00 0.048217 0.006 None
2020-01-09 21:00:00+00:00 0.043427 0.008 None
benchmark_period_return benchmark_volatility beta \
2020-01-02 21:00:00+00:00 0.0 None None
2020-01-03 21:00:00+00:00 0.0 [0.0] None
2020-01-06 21:00:00+00:00 0.0 [0.0] None
2020-01-07 21:00:00+00:00 0.0 [0.0] None
2020-01-08 21:00:00+00:00 0.0 [0.0] None
2020-01-09 21:00:00+00:00 0.0 [0.0] None
[{'price': 120.0, 'order_id': 'b607afab7c674f22a303a0a483f00a31', 'amount': 1000, 'sid': Equity(0 [A]), 'commission': None, 'dt': Timestamp('2020-01-03 21:00:00+0000', tz='UTC')}]
Comment
The comparison and reconciliation of the returns, volatility, alpha, and beta can be found in this sheet.
@ssanderson, I'll proceed to the next steps while waiting for your feedback on the suggested implementation:
Thanks Ayoub
Is there currently a way to override the benchmark data, or is this in progress? It seems like this blocks me from getting started unless I change the source code, but I am not familiar with how `get_benchmark_returns` is used.
hey @samatix. Apologies for the delay in responding here. Your proposed changes look pretty reasonable to me. To make sure I understand your proposal, it sounds like you're planning to add three new, mutually-exclusive flags to the `zipline run` command:

- `--benchmark-file`, which should point to a CSV containing benchmark data. I'd expect the implementation of that flag to read the CSV, compute returns from it, and pass them as `benchmark_returns` to `zipline.utils.run_algo._run`.
- `--benchmark-symbol`, which would take a symbol, convert it into a sid using the asset finder that will be used to run the algorithm, and then forward that sid as `benchmark_sid` to `TradingAlgorithm`.
- `--no-benchmark`, which would cause us to pass an array of all 0s to `benchmark_returns` (and maybe, in a future improvement, disable computing metrics that depend on the benchmark).

If none of the above options is passed, your proposal would be to emit a warning and then fall back to trying to load from IEX.
Does that all sound correct?
Assuming it does, I think the only question I have is what the right format should be for data passed via `--benchmark-file`. In your example above, you're passing a CSV with OHLCV columns, but the only thing we actually need from the benchmark is returns. Additionally, if you want to use a benchmark like SPY, you probably want to account for dividends, which means you wouldn't want to calculate returns using raw prices; you'd want to either use adjusted prices, or ensure that the benchmark returns calculation accounts for dividends (we do this internally if you pass a `benchmark_sid`). I wonder if a simpler format might be to require the user to just provide returns directly, in a format like:
```
date,return
2020-01-02 00:00:00+00:00,0.01
2020-01-03 00:00:00+00:00,-0.02
...
```
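For what it's worth, a file in that shape could be read straight into a `benchmark_returns` Series with pandas. This is only a sketch of what the flag's implementation might do, using the column names from the proposed format above:

```python
import io

import pandas as pd

# In-memory copy of the proposed date,return format.
csv_text = (
    "date,return\n"
    "2020-01-02 00:00:00+00:00,0.01\n"
    "2020-01-03 00:00:00+00:00,-0.02\n"
)

# Parse into a tz-aware, date-indexed returns Series.
frame = pd.read_csv(io.StringIO(csv_text), index_col='date', parse_dates=True)
benchmark_returns = frame['return']

print(benchmark_returns.tolist())  # [0.01, -0.02]
```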
@jackmoody11 to answer your question, @samatix has been working on this issue over in https://github.com/quantopian/zipline/pull/2642.
Thank you @ssanderson! Is there a way to use set_benchmark
to override the IEX data or will I need to wait for the PR?
I don't think there's a way that's exposed via the command line (i.e., via `zipline run`). You can however pass `benchmark_returns` as a Series to `zipline.utils.run_algo`, which is what `zipline run` dispatches to internally.
Hi, I read through https://github.com/quantopian/zipline/issues/2480#, which sends me here. I tried changing the benchmarks.py file and commenting out the sections of loader.py as mentioned there and I am still getting the same error. It seems like the benchmark file being returned just has 0.0 for all closing values. Is that what is causing the timeout issue with dividing by zero? Am I missing something? Any help is appreciated.
Hey @ssanderson, apologies as well from my side for the delay to respond to your questions. I was in Spain when the lockdown started at the beginning of this month and had difficulties getting back to Paris. Everything is back to normal now and I'm planning to finalize this topic this weekend.
To answer your questions: yes, your understanding is right. The three options are mutually exclusive, and we fall back to the default behavior of verifying whether the benchmark data exists in the data folder (loading it from IEX if required). The purpose is not to break backward compatibility for people who put the benchmark data directly in the data folder.
Noted on your remarks about passing returns that account for dividends instead of passing OHLCV data. I'll add a quick explanation to the documentation on how to generate the returns from OHLCV data.
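As a sketch of what that documentation note might show (assuming the OHLCV file carries an `adj_close` column, as the sample files above do), the returns can be derived with `pct_change` on the adjusted closes:

```python
import io

import pandas as pd

# A few OHLCV rows in the same shape as the sample b.csv above.
csv_text = (
    "date,open,high,low,close,adj_close,volume\n"
    "2020-01-02 00:00:00+00:00,100,100,100,100,100,10000\n"
    "2020-01-03 00:00:00+00:00,90,90,90,90,90,9000\n"
    "2020-01-06 00:00:00+00:00,120,120,120,120,120,12000\n"
)
ohlcv = pd.read_csv(io.StringIO(csv_text), index_col='date', parse_dates=True)

# Use adjusted closes so dividends are reflected in the returns; drop the
# first row, which has no prior price to compute a return against.
returns = ohlcv['adj_close'].pct_change().iloc[1:]

print(returns.round(6).tolist())  # [-0.1, 0.333333]
```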
I have been able to narrow down this issue to zipline trying to calculate the sharpe ratio and sortino ratio. I commented out the divide in the stats.py file and set the sharpe and sortino ratios to a fixed value, and the error went away. I believe the problem is that when I implemented the fix for the API issue, the benchmark data that is loaded is all zeros, and this causes issues in the divide function in stats.py.
Does anyone know where in the zipline package the functions for sharpe and sortino are called? It seems like it would be better to comment those out in zipline rather than in the stats module. It seems like they are called on a daily basis.
I tried just using @samatix's fork to attempt to work around this issue and was met with this error:
Traceback (most recent call last):
File "/usr/local/bin/zipline", line 5, in <module>
from zipline.__main__ import main
File "/usr/local/lib/python3.5/site-packages/zipline/__init__.py", line 29, in <module>
from .utils.run_algo import run_algorithm
File "/usr/local/lib/python3.5/site-packages/zipline/utils/run_algo.py", line 21, in <module>
from zipline.data.benchmarks import get_benchmark_returns_from_file
ImportError: cannot import name 'get_benchmark_returns_from_file'
Really hoped that'd work. Is there a recommended workaround in the meantime?
@daraul , I used the master branch in my fork for different tests so I don't expect it to be clean. I've used a specific branch for this issue. You can directly use the code from the PR that is pending validation from Quantopian.
If you use release 1.3.0, you can control the benchmark data file (~/.zipline/data/SPY_benchmark.csv). You can put in fixed returns or the SPY returns from IEX (which don't account for dividends).
Here is some sample code to amend this file if you have issues when running the algorithm:
```python
import requests
import pandas as pd
from trading_calendars import get_calendar
from zipline.data.loader import get_benchmark_filename, get_data_filepath

# Dates details
start_date = pd.to_datetime("01/01/2000", utc=True)
end_date = pd.Timestamp.now(tz='UTC')
trading_calendar = get_calendar("XNYS")
dates = trading_calendar.sessions_in_range(start_date, end_date)


# Generating fixed returns
def generate_fixed_benchmark_data(rate=0):
    return pd.DataFrame(rate, index=dates, columns=['close']).sort_index().iloc[1:]


# Generating IEX SPY benchmark data
def generate_iex_data():
    iex_api_key = "YOUR_IEX_API_KEY"
    r = requests.get(
        "https://cloud.iexapis.com/stable/stock/{}/chart/5y?"
        "chartCloseOnly=True&token={}".format("SPY", iex_api_key)
    )
    data = r.json()
    df = pd.DataFrame(data)
    df.index = pd.DatetimeIndex(df['date'])
    df = df['close']
    return df.sort_index().tz_localize('UTC').pct_change(1).iloc[1:]


# Saving the file
filename = get_benchmark_filename("SPY")
# data = generate_iex_data()
# Or
data = generate_fixed_benchmark_data()
data.to_csv(get_data_filepath(filename), header=False)
```
I could have sworn I checked out the `benchmark` branch, @samatix. I'll double check, but shouldn't `pip install git+https://github.com/samatix/zipline.git@benchmark` do the trick?
Just double checked, and I was able to find your changes to `__main__.py` on my system, @samatix.
@daraul , do you still have the original issue that you mentioned? If you encounter any problem, please let me know, so that I can review it and submit a correction with the PR ;)
Yep, I've still got the same error, @samatix! I even tried `zipline --help` and got the error again:
Traceback (most recent call last):
File "/usr/local/bin/zipline", line 5, in <module>
from zipline.__main__ import main
File "/usr/local/lib/python3.5/site-packages/zipline/__init__.py", line 29, in <module>
from .utils.run_algo import run_algorithm
File "/usr/local/lib/python3.5/site-packages/zipline/utils/run_algo.py", line 21, in <module>
from zipline.data.benchmarks import get_benchmark_returns_from_file
ImportError: cannot import name 'get_benchmark_returns_from_file'
Forgive me, @samatix. I forgot I was copying an old patched `benchmarks.py` file into my docker container. I removed that and it works now.
Hi @samatix,
I've been tracking this PR; thank you for the great work here. I've been trying to install zipline afresh with this update, but no luck. I wanted to use a custom calendar, possibly editing an existing one, and the data bundle ingestion would be a custom one. I'm using a Conda Python 3.5 environment on Windows 10, and `pip install` from git seems to throw a C++ 14.0 VCRedist error. The 4th validation test in this PR is actually what I need to set up. Could you please assist, and let me know if I can furnish any further info to resolve this?
@richafrank I am running zipline 1.4.0 now, and I am trying to modify the benchmark returns. Do you have any more documentation or examples of how to specify the new benchmark options? If I want to use --no-benchmark, do I pass this to the zipline.run_algorithm command, or someplace else? Not sure whether I should open a new issue here; just trying to understand how to get this to work.
Hi @marlowequart,
`--no-benchmark` can be provided as an option to the `zipline run` command. The `zipline.run_algorithm` function also accepts a `benchmark_returns` parameter, which defaults to `None`, also meaning no benchmark, but you could pass a returns series if you want.
It looks like `run_algorithm` is missing documentation for `benchmark_returns`, if you want to PR that.
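For illustration, a caller could construct such a returns series with pandas. This is only a sketch: the `run_algorithm` call is shown commented out so the snippet stays self-contained, and `freq='B'` (business days) is a simplification standing in for a real trading calendar:

```python
import pandas as pd

# A flat 0.01%-per-day benchmark over the backtest window; freq='B'
# (business days) is a stand-in for an actual exchange calendar.
sessions = pd.date_range('2020-01-02', '2020-01-09', freq='B', tz='UTC')
benchmark_returns = pd.Series(0.0001, index=sessions)

# from zipline import run_algorithm
# result = run_algorithm(
#     start=sessions[0],
#     end=sessions[-1],
#     initialize=initialize,
#     capital_base=100000,
#     benchmark_returns=benchmark_returns,
# )

print(len(benchmark_returns))  # 6 business days from Jan 2 to Jan 9, 2020
```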
Hello @richafrank,
I'm running Zipline 1.4.0, and in `zipline.run_algorithm` I specified `benchmark_returns=None`. But I still get the error below:
_RunAlgoError: No ``benchmark_spec`` was provided, and ``zipline.api.set_benchmark`` was not called in ``initialize``.
How can we run the backtester without setting any benchmark?
Thanks,
Thanks @playxback . It looks like the check is inverted as reported in https://github.com/quantopian/zipline/issues/2761! We'll fix that...
Background
Zipline currently requires two special data inputs for simulations: "Benchmark Returns", which are used to calculate the "Alpha" and "Beta" metrics, among other things, and "Treasury Curves", which were at one time used as the "Risk Free Rate", which was part of the Sharpe Ratio calculation.
Since these inputs are required by all simulations, we implicitly fetch them from third party API sources if they're not provided by users. We get treasury data from the US Federal Reserve's API, and we get benchmarks from IEX.
Problems
Implicitly fetching benchmarks and treasuries causes many problems:
Implicitly fetching means that running simulations requires an internet connection. We try to make this less painful by caching downloaded results and re-using them when possible, but this is only a partial fix, and it means that many users don't notice the implicit download until it starts causing mysterious problems.
The APIs we fetch from sometimes fail, leading to confusing behavior for users and spurious bug reports for Zipline maintainers.
The APIs we fetch from sometimes change in incompatible ways, which breaks older versions of Zipline. This is currently the case for the IEX API we use to fetch benchmarks, resulting in issues like:
Our default benchmark is US-centric. We default to using SPY as the benchmark, which only makes sense in the US (and even then, only if you also have historical dividends for SPY, which many users don't).
Proposed Solution
I think we should remove these implicit dependencies from Zipline. Treasuries we should just remove, since they're not actually used anymore. Figuring out what to do with benchmarks is a bit trickier.
Treasuries
Removing treasuries is relatively straightforward because we no longer actually use them. A quick scan of our GitHub issues turns up these issues that should be fixed by the removal:
I've opened a PR at https://github.com/quantopian/zipline/pull/2626 to finally remove all traces of the treasury subsystem.
Benchmarks
Benchmarks are a bit trickier. The benchmark is used in the calculation of the "alpha" and "beta" metrics, and many users are generally interested in comparing the returns of their strategy against a particular benchmark (often an ETF or index of some kind). We also don't currently have a way to specify a benchmark from the command line, or to define a benchmark asset for a particular bundle.
I think there are a few things we could do to improve the situation here:
- We could add the ability to define a benchmark explicitly when running Zipline via the CLI. We already have the ability to do this internally, but there's no supported way to control the benchmark via the CLI or via an extension. I think this is necessary pretty much no matter what.
- Make the benchmark optional. Making the benchmark optional would result in `alpha`, `beta`, and any other benchmark-dependent risk metrics not being populated in zipline's output. The tricky thing here is to do this in a way that doesn't result in performance degradation when running with a benchmark. I think we should either do this or make the benchmark asset required.
- (Short Term) We can fix our IEX API calls for benchmark data to use the updated APIs. This doesn't fix the systemic maintenance issues associated with the benchmark, but it would at least fix Zipline being straight-up broken for many people, which is its current status. I think the main challenge here is that IEX now requires an API token to work at all, which means we need to provide some mechanism for the user to pass in their API token.