scrtlabs / catalyst

An Algorithmic Trading Library for Crypto-Assets in Python
http://enigma.co
Apache License 2.0

Missing data when backtesting #294

Open Dan733 opened 6 years ago

Dan733 commented 6 years ago

Dear Catalyst Maintainers,

Before I tell you about my issue, let me describe my environment:

Environment

aiodns==1.1.1
aiohttp==3.0.1
alembic==0.9.9
async-timeout==2.0.1
attrdict==2.0.0
attrs==17.4.0
bcolz==0.12.1
bleach==2.1.3
boto3==1.6.18
botocore==1.9.18
Bottleneck==1.2.1
cchardet==2.1.1
ccxt==1.12.30
certifi==2018.1.18
chardet==3.0.4
click==6.7
colorama==0.3.9
contextlib2==0.5.5
cycler==0.10.0
cyordereddict==1.0.0
Cython==0.28.1
cytoolz==0.9.0.1
decorator==4.2.1
docutils==0.14
empyrical==0.2.2
enigma-catalyst==0.5.8
entrypoints==0.2.3
eth-abi==1.0.0
eth-account==0.1.0a2
eth-hash==0.1.1
eth-keyfile==0.5.1
eth-keys==0.2.0b3
eth-rlp==0.1.0
eth-utils==1.0.1
hexbytes==0.1.0
html5lib==1.0.1
idna==2.6
idna-ssl==1.0.1
intervaltree==2.1.0
ipykernel==4.8.2
ipython==6.2.1
ipython-genutils==0.2.0
ipywidgets==7.1.2
jedi==0.11.1
Jinja2==2.10
jmespath==0.9.3
jsonschema==2.6.0
jupyter==1.0.0
jupyter-client==5.2.3
jupyter-console==5.2.0
jupyter-core==4.4.0
kiwisolver==1.0.1
Logbook==1.3.0
lru-dict==1.1.6
lxml==4.2.1
Mako==1.0.7
MarkupSafe==1.0
matplotlib==2.2.2
mistune==0.8.3
multidict==4.1.0
multipledispatch==0.5.0
nbconvert==5.3.1
nbformat==4.4.0
networkx==2.1
notebook==5.4.1
numexpr==2.6.4
numpy==1.14.2
pandas==0.19.2
pandas-datareader==0.6.0
pandocfilters==1.4.2
parso==0.1.1
patsy==0.5.0
pickleshare==0.7.4
prompt-toolkit==1.0.15
pycares==2.3.0
pycryptodome==3.5.1
pyfolio==0.8.0
Pygments==2.2.0
pyparsing==2.2.0
python-dateutil==2.7.2
python-editor==1.0.3
pytz==2018.3
pywinpty==0.5.1
pyzmq==17.0.0
qtconsole==4.3.1
redo==1.6
requests==2.18.4
requests-file==1.4.3
requests-ftp==0.3.1
requests-toolbelt==0.8.0
rlp==0.6.0
s3transfer==0.1.13
scikit-learn==0.19.1
scipy==1.0.1
seaborn==0.8.1
Send2Trash==1.5.0
simplegeneric==0.8.1
six==1.11.0
sortedcontainers==1.5.9
SQLAlchemy==1.2.5
statsmodels==0.8.0
tables==3.4.2
terminado==0.8.1
testpath==0.3.1
toolz==0.9.0
tornado==5.0.1
traitlets==4.3.2
urllib3==1.22
wcwidth==0.1.7
web3==4.0.0b13
webencodings==0.5.1
widgetsnbextension==3.1.4
wrapt==1.10.11
yarl==1.1.0

Now that you know a little about me, let me tell you about the issue I am having:

Description of Issue

I encounter the error below when running a backtest on freshly ingested daily data from Bitfinex. Ingesting btc_eur specifically does not solve the problem.

catalyst.exchange.exchange_errors.PricingDataNotLoadedError: Missing data for bitfinex btc_eur in date range [2017-05-19 00:00:00+00:00 - 2017-07-01 00:00:00+00:00] Please run: catalyst ingest-exchange -x bitfinex -f daily -i btc_eur. See catalyst documentation for details.

Here is how you can reproduce this issue on your machine:

Reproduction Steps

  1. Ingest Bitfinex daily data
  2. Run a backtest on data ranging from 2017 to 2018
  3. If it matters, I am rebalancing every 30 days, using 1-day interval data ...

What steps have you taken to resolve this already?

ingesting a fresh bundle, specifically ingesting btc_eur

Anything

Sincerely, Daniel

lenak25 commented 6 years ago

Hi @Dan733 ,

I tried to follow your reproduction steps but was not able to reproduce the issue. The ingestion procedure is incremental, meaning that it will not necessarily create a "fresh bundle". Could you please clean your data using the catalyst clean-exchange command, re-ingest it, and try again?

Thanks, Lena

Dan733 commented 6 years ago

Hi Lena,

I have cleaned the data using catalyst clean-exchange and also manually deleted the bundles in .catalyst\data\exchanges\bitfinex. Re-running on my newest bundle, I receive this error:

[2018-04-09 07:38:21.022950] WARNING: run_algo: Catalyst is currently in ALPHA. It is going through rapid development and it is subject to errors. Please use carefully. We encourage you to report any issue on GitHub: https://github.com/enigmampc/catalyst/issues
[2018-04-09 07:38:21.022950] INFO: run_algo: Catalyst version 0.5.8
[2018-04-09 07:38:24.022997] INFO: run_algo: running algo in backtest mode
[2018-04-09 07:38:26.991686] INFO: exchange_algorithm: initialized trading algorithm in backtest mode
Traceback (most recent call last):
  File "jt-momentum-daily-bitfinex.py", line 261, in <module>
    algo_namespace='simple_universe')
  File "C:\catalyst-venv\lib\site-packages\catalyst\utils\run_algo.py", line 560, in run_algorithm
    stats_output=stats_output
  File "C:\catalyst-venv\lib\site-packages\catalyst\utils\run_algo.py", line 342, in _run
    overwrite_sim_params=False,
  File "C:\catalyst-venv\lib\site-packages\catalyst\exchange\exchange_algorithm.py", line 373, in run
    data, overwrite_sim_params
  File "C:\catalyst-venv\lib\site-packages\catalyst\exchange\exchange_algorithm.py", line 330, in run
    data, overwrite_sim_params
  File "C:\catalyst-venv\lib\site-packages\catalyst\algorithm.py", line 724, in run
    for perf in self.get_generator():
  File "C:\catalyst-venv\lib\site-packages\catalyst\gens\tradesimulation.py", line 224, in transform
    for capital_change_packet in every_bar(dt):
  File "C:\catalyst-venv\lib\site-packages\catalyst\gens\tradesimulation.py", line 137, in every_bar
    handle_data(algo, current_data, dt_to_use)
  File "C:\catalyst-venv\lib\site-packages\catalyst\utils\events.py", line 216, in handle_data
    dt,
  File "C:\catalyst-venv\lib\site-packages\catalyst\utils\events.py", line 235, in handle_data
    self.callback(context, data)
  File "C:\catalyst-venv\lib\site-packages\catalyst\exchange\exchange_algorithm.py", line 351, in handle_data
    super(ExchangeTradingAlgorithmBacktest, self).handle_data(data)
  File "C:\catalyst-venv\lib\site-packages\catalyst\algorithm.py", line 473, in handle_data
    self._handle_data(self, data)
  File "jt-momentum-daily-bitfinex.py", line 89, in handle_data
    frequency='1D')).values
  File "catalyst\_protocol.pyx", line 120, in catalyst._protocol.check_parameters.__call__.assert_keywords_and_call
  File "catalyst\_protocol.pyx", line 646, in catalyst._protocol.BarData.history
  File "C:\catalyst-venv\lib\site-packages\catalyst\exchange\exchange_data_portal.py", line 96, in get_history_window
    ffill))
  File "C:\catalyst-venv\lib\site-packages\redo\__init__.py", line 162, in retry
    return action(*args, **kwargs)
  File "C:\catalyst-venv\lib\site-packages\catalyst\exchange\exchange_data_portal.py", line 70, in _get_history_window
    ffill)
  File "C:\catalyst-venv\lib\site-packages\catalyst\exchange\exchange_data_portal.py", line 312, in get_exchange_history_window
    algo_end_dt=self._last_available_session,
  File "C:\catalyst-venv\lib\site-packages\catalyst\exchange\exchange_bundle.py", line 898, in get_history_window_series_and_load
    data_frequency=data_frequency,
  File "C:\catalyst-venv\lib\site-packages\catalyst\exchange\exchange_bundle.py", line 1006, in get_history_window_series
    end_dt=end_dt
catalyst.exchange.exchange_errors.PricingDataNotLoadedError: Missing data for bitfinex btc_eur in date range [2017-05-19 00:00:00+00:00 - 2017-07-01 00:00:00+00:00] Please run:catalyst ingest-exchange -x bitfinex -f daily -i btc_eur. See catalyst documentation for details.

lenak25 commented 6 years ago

Thanks for the update, @Dan733. Trading of btc_eur started on Bitfinex on November 22nd, 2017, so you should not expect data for this pair before that date.

Dan733 commented 6 years ago

Thanks Lena,

In my code, I am using the example universe selector which is given in the example algorithm, 'Simple Universe', which is supposed to filter out pairs that have not started trading yet. Given here:

def universe(context, lookback_date, current_date):
    # get all the pairs for the given exchange
    json_symbols = get_exchange_symbols(context.exchange)
    # convert into a DataFrame for easier processing
    df = pd.DataFrame.from_dict(json_symbols).transpose().astype(str)
    df['base_currency'] = df.apply(lambda row: row.symbol.split('_')[1],
                                   axis=1)
    df['market_currency'] = df.apply(lambda row: row.symbol.split('_')[0],
                                     axis=1)

    # Filter all the pairs to get only the ones for a given base_currency
    df = df[df['base_currency'] == context.base_currency]

    # Filter all pairs to ensure that pair existed in the current date range
    df = df[df.start_date < lookback_date]
    df = df[df.end_daily >= current_date]

    context.coins = symbols(*df.symbol)  # convert all the pairs to symbols
    return df.symbol.tolist()

However, this is not catching 'btc_eur' despite its November start date. If I print df['start_date'] for btc_eur I get this result:

Name: start_date, dtype: object
btceur    2017-05-19 00:00:00+00:00

Perhaps btc_eur is being given the wrong start date when packaged in the bundle?
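
For anyone following along, here is a minimal standard-library sketch of why the date filter lets btc_eur through, and one possible guard. It is not Catalyst code: `tradable_pairs`, `bundle_metadata` and `KNOWN_LISTING_DATES` are hypothetical names, the `end_daily` value and the lookback/current dates are made up for illustration, and only the two btc_eur start dates come from this thread.

```python
from datetime import datetime, timezone


def utc(year, month, day):
    """Build a timezone-aware UTC datetime."""
    return datetime(year, month, day, tzinfo=timezone.utc)


# Symbol metadata as the bundle reports it. The start_date is the value
# printed in this thread; end_daily is a made-up placeholder.
bundle_metadata = {
    'btc_eur': {'start_date': utc(2017, 5, 19), 'end_daily': utc(2018, 4, 8)},
}

# Hypothetical override table of listing dates confirmed by other means
# (here, the date given by the maintainer in this thread).
KNOWN_LISTING_DATES = {'btc_eur': utc(2017, 11, 22)}


def tradable_pairs(metadata, lookback_date, current_date, overrides=None):
    """Apply the same two date conditions as the universe() filter."""
    overrides = overrides or {}
    selected = []
    for symbol, info in sorted(metadata.items()):
        reported = info['start_date']
        # Trust the later of the reported and the independently known date
        start = max(reported, overrides.get(symbol, reported))
        if start < lookback_date and info['end_daily'] >= current_date:
            selected.append(symbol)
    return selected


lookback, current = utc(2017, 6, 1), utc(2017, 7, 1)
print(tradable_pairs(bundle_metadata, lookback, current))
print(tradable_pairs(bundle_metadata, lookback, current, KNOWN_LISTING_DATES))
```

With the reported start date alone, btc_eur passes the filter for a mid-2017 window. With the override applied, it is excluded. An override table like this is only a stopgap until the bundle metadata itself is corrected.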

lenak25 commented 6 years ago

You are absolutely correct, we will check this. Thanks for reporting.

Dan733 commented 6 years ago

Thanks Lena,

Though I don't have a list handy, it's worth noting that I think a number of bitfinex pairs might have this issue, as I've been running into this exception with multiple pairs over different periods.
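
One way to build such a list would be to scan saved backtest logs for the error text. The sketch below assumes every occurrence follows the same wording as the PricingDataNotLoadedError message quoted earlier in this thread; `affected_pairs` and `MISSING_DATA` are hypothetical names, not part of Catalyst.

```python
import re

# Pattern modelled on the PricingDataNotLoadedError text quoted in this thread
MISSING_DATA = re.compile(
    r"Missing data for (?P<exchange>\S+) (?P<symbol>\S+) in date range "
    r"\[(?P<start>.+?) - (?P<end>.+?)\]"
)


def affected_pairs(log_text):
    """Return sorted, de-duplicated (exchange, symbol, start, end) tuples."""
    found = {
        (m['exchange'], m['symbol'], m['start'], m['end'])
        for m in MISSING_DATA.finditer(log_text)
    }
    return sorted(found)


sample = (
    "catalyst.exchange.exchange_errors.PricingDataNotLoadedError: Missing data "
    "for bitfinex btc_eur in date range [2017-05-19 00:00:00+00:00 - "
    "2017-07-01 00:00:00+00:00] Please run: catalyst ingest-exchange "
    "-x bitfinex -f daily -i btc_eur."
)
for row in affected_pairs(sample):
    print(row)
```

Running this over the concatenated output of several backtests would give one row per distinct pair and date range, which could then be attached to the issue.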

lenak25 commented 6 years ago

OK, thanks. If you have more information on the matter, please share.

TheHooper commented 6 years ago

I got the same bug

TheHooper commented 6 years ago

After removing the call data.current(context.asset, 'last_traded'), everything works. So this function seems to be the key.
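
If removing the call is not an option, a different approach (not the one described above) is to guard the lookup and skip the asset when data is missing. The sketch below is an assumption-laden illustration: `current_or_none` is a hypothetical helper, `MissingData` and `FakeBarData` are stand-ins so the example runs without Catalyst, and the price value is made up. In a real algorithm one would presumably pass the PricingDataNotLoadedError class seen in the traceback as `missing_errors`.

```python
def current_or_none(data, asset, field, missing_errors=(Exception,)):
    """Return data.current(asset, field), or None if the lookup raises."""
    try:
        return data.current(asset, field)
    except missing_errors:
        return None


class MissingData(Exception):
    """Stand-in for Catalyst's PricingDataNotLoadedError (hypothetical here)."""


class FakeBarData:
    """Tiny stand-in for the `data` object, used only to exercise the helper."""

    def __init__(self, values):
        self._values = values

    def current(self, asset, field):
        try:
            return self._values[(asset, field)]
        except KeyError:
            raise MissingData("Missing data for {}".format(asset))


# Made-up value, for illustration only
data = FakeBarData({('eth_btc', 'price'): 0.05})
print(current_or_none(data, 'eth_btc', 'price', (MissingData,)))
print(current_or_none(data, 'btc_eur', 'last_traded', (MissingData,)))
```

Passing a narrow exception type matters here: catching every exception would also hide unrelated bugs in the algorithm.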