scrtlabs / catalyst

An Algorithmic Trading Library for Crypto-Assets in Python
http://enigma.co
Apache License 2.0

Ingest Data on v0.3.1 and Data History #44

Closed avn3r closed 7 years ago

avn3r commented 7 years ago

Dear Catalyst Maintainers,

Before I tell you about my issue, let me describe my environment:

Environment

Description of Issue

  1. Ingested data is not being loaded for many markets. I have only looked at Poloniex minute data, but here is a list of pairs that currently do not work with version 0.3.1:

    • 'bcn_btc', 'burst_btc', 'dgb_btc', 'doge_btc', 'emc2_btc', 'pink_btc', 'sc_btc'
    • Out of a total of 69 pairs in the Poloniex BTC market, 7 pairs are not working.
  2. Data history is not accurate: prices do not match the exchange's prices for the same date and time. This may be because data from the 3 exchanges is now being aggregated, but that is not the only issue. Data values are also spiky/jumpy. For example, a price jumps straight from 1.4e-6 to 1.6e-6 over 30 minutes, when the expected behavior (per the exchange's candle data) is a gradual increase from 1.4e-6 to 1.43e-6 to 1.5e-6 to 1.55e-6 to 1.6e-6. These are just hypothetical numbers, but I have seen this behavior in most currencies since the update.

These two errors did NOT exist on version 0.2.X
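One way to make the second point measurable is to line up the closes returned by Catalyst against candles pulled directly from the exchange and flag the timestamps where they disagree. The following is only a sketch under assumptions: `mismatched_candles` and the 1% tolerance are hypothetical, and the numbers are the made-up values from the description above, not real market data.

```python
import pandas as pd


def mismatched_candles(catalyst_close, exchange_close, rel_tol=0.01):
    """Return the rows where two close-price series differ by more than rel_tol.

    Hypothetical helper, not part of Catalyst.
    """
    # Align both series on their shared timestamps and drop unmatched rows.
    both = pd.concat(
        {"catalyst": catalyst_close, "exchange": exchange_close}, axis=1
    ).dropna()
    rel_diff = (both["catalyst"] - both["exchange"]).abs() / both["exchange"]
    return both[rel_diff > rel_tol]


# Made-up values mirroring the example in the description.
idx = pd.date_range("2017-10-01 00:00", periods=5, freq="10min", tz="UTC")
exchange = pd.Series([1.40e-6, 1.43e-6, 1.50e-6, 1.55e-6, 1.60e-6], index=idx)
catalyst = pd.Series([1.40e-6, 1.40e-6, 1.40e-6, 1.40e-6, 1.60e-6], index=idx)

bad = mismatched_candles(catalyst, exchange)
print(bad)
```

With these inputs the first and last candles agree, and the three in between are flagged, which is the flat-then-jump pattern described above.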

Here is how you can reproduce this issue on your machine:

Reproduction Steps

  1. Run the following code. The commented-out lines in universe() (lines 64-70 of the original script) filter out the markets that give errors. Uncomment them to remove those markets from the universe.

Code

import pandas as pd
from catalyst import run_algorithm
from catalyst.exchange.exchange_utils import get_exchange_symbols

from catalyst.api import (
    symbols,
)

def initialize(context):
    context.i = -1
    context.base_currency = 'btc'

def handle_data(context, data):
    lookback = 60 * 24 * 7  # one week of minute bars: 60 minutes * 24 hours * 7 days
    context.i += 1
    if context.i < lookback:
        return

    today = context.blotter.current_dt.strftime('%Y-%m-%d %H:%M:%S')

    try:
        # update universe every day
        new_day = 60 * 24
        if not context.i % new_day:
            context.universe = universe(context, today)

        # get data every 30 minutes
        minutes = 30
        if not context.i % minutes and context.universe:
            for coin in context.coins:
                pair = str(coin.symbol)

                # ohlcv data
                open = data.history(coin, 'open', lookback, '1m').ffill().bfill().resample('30T').first()
                high = data.history(coin, 'high', lookback, '1m').ffill().bfill().resample('30T').max()
                low = data.history(coin, 'low', lookback, '1m').ffill().bfill().resample('30T').min()
                close = data.history(coin, 'price', lookback, '1m').ffill().bfill().resample('30T').last()
                volume = data.history(coin, 'volume', lookback, '1m').ffill().bfill().resample('30T').sum()

                print(today, pair, close[-1])

    except Exception as e:
        print(e)

def analyze(context=None, results=None):
    pass

def universe(context, today):
    json_symbols = get_exchange_symbols('poloniex')
    poloniex_universe_df = pd.DataFrame.from_dict(json_symbols).transpose().astype(str)
    poloniex_universe_df['base_currency'] = poloniex_universe_df.apply(lambda row: row.symbol.split('_')[1],
                                                                       axis=1)
    poloniex_universe_df['market_currency'] = poloniex_universe_df.apply(lambda row: row.symbol.split('_')[0],
                                                                         axis=1)
    poloniex_universe_df = poloniex_universe_df[poloniex_universe_df['base_currency'] == context.base_currency]
    poloniex_universe_df = poloniex_universe_df[poloniex_universe_df.symbol != 'gas_btc']

    # Markets currently not working on Catalyst 0.3.1
    # 2017-01-01
    # poloniex_universe_df = poloniex_universe_df[poloniex_universe_df.symbol != 'bcn_btc']
    # poloniex_universe_df = poloniex_universe_df[poloniex_universe_df.symbol != 'burst_btc']
    # poloniex_universe_df = poloniex_universe_df[poloniex_universe_df.symbol != 'dgb_btc']
    # poloniex_universe_df = poloniex_universe_df[poloniex_universe_df.symbol != 'doge_btc']
    # poloniex_universe_df = poloniex_universe_df[poloniex_universe_df.symbol != 'emc2_btc']
    # poloniex_universe_df = poloniex_universe_df[poloniex_universe_df.symbol != 'pink_btc']
    # poloniex_universe_df = poloniex_universe_df[poloniex_universe_df.symbol != 'sc_btc']
    date = str(today).split(' ')[0]

    poloniex_universe_df = poloniex_universe_df[poloniex_universe_df.start_date < date]
    context.coins = symbols(*poloniex_universe_df.symbol)
    print(len(poloniex_universe_df))
    return poloniex_universe_df.symbol.tolist()

if __name__ == '__main__':
    start_date = pd.to_datetime('2017-01-01', utc=True)
    end_date = pd.to_datetime('2017-10-15', utc=True)

    performance = run_algorithm(start=start_date, end=end_date,
                                capital_base=10000.0,
                                initialize=initialize,
                                handle_data=handle_data,
                                analyze=analyze,
                                exchange_name='poloniex',
                                data_frequency='minute',
                                base_currency='btc',
                                live=False,
                                live_graph=False,
                                algo_namespace='test')
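
The five per-field resample calls in handle_data can also be written as a single aggregation once the minute bars are in one DataFrame. This is a minimal sketch on synthetic data, assuming the columns are named open/high/low/close/volume; it does not call Catalyst, and how the frame is obtained from data.history is left out on purpose.

```python
import numpy as np
import pandas as pd

# Synthetic one-minute bars standing in for the data.history() output.
idx = pd.date_range("2017-01-01", periods=60, freq="1min", tz="UTC")
price = pd.Series(np.arange(60, dtype=float) + 100.0, index=idx)
minute_bars = pd.DataFrame(
    {
        "open": price,
        "high": price + 0.5,
        "low": price - 0.5,
        "close": price,
        "volume": 1.0,
    }
)

# One pass: each column gets the aggregation that fits an OHLCV candle.
candles_30m = minute_bars.resample("30min").agg(
    {"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}
)
print(candles_30m)
```

Note that the original script forward-fills and back-fills every field before resampling. For volume this copies a neighboring bar's volume into each gap, so the 30-minute sums can be overstated wherever minute bars are missing.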

Error

[2017-10-24 19:06:46.909017] INFO: exchange_bundle: pricing data for [u'bcn_btc'] not found in range 2017-01-01 05:30:00+00:00 to 2017-01-08 05:30:00+00:00, updating the bundles.
    [====================================]  Fetching poloniex minute candles: :  100%
Pricing data open for trading pairs bcn_btc trading on exchange poloniex since 2014-05-20 00:00:00+00:00 is unavailable. The bundle data is either out-of-date or has not been loaded yet. Please ingest data using the command `catalyst ingest-exchange -x poloniex -f minute -i bcn_btc`. See catalyst documentation for details.

Since it crashes on the bcn_btc pair, all other pairs in that 30-minute cycle are skipped without printing their values.
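
The remaining pairs are skipped because the try/except in handle_data wraps the whole loop, so the first exception ends the cycle. As a possible workaround while the missing data is investigated, the exception can be caught per pair. A minimal sketch, where `collect_latest_closes` and `fake_fetch` are hypothetical stand-ins for the data.history call and are not Catalyst APIs:

```python
def collect_latest_closes(pairs, fetch_close):
    """Fetch one value per pair, isolating a failure to the pair that raised."""
    closes, failures = {}, {}
    for pair in pairs:
        try:
            closes[pair] = fetch_close(pair)
        except Exception as exc:  # broad on purpose, as in the original script
            failures[pair] = str(exc)
    return closes, failures


def fake_fetch(pair):
    """Hypothetical fetcher that fails for one pair, like bcn_btc in the log."""
    if pair == "bcn_btc":
        raise ValueError("pricing data unavailable")
    return 1.0


closes, failures = collect_latest_closes(
    ["eth_btc", "bcn_btc", "xmr_btc"], fake_fetch
)
print(closes)
print(failures)
```

This only keeps the loop going; it does not address why the bundle has no data for the failing pairs.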

Sincerely, avn3r

lacabra commented 7 years ago

I believe both issues reported here have been resolved with commit 3d88d6a2c707adfb4e3791773e1ace942f62f8aa in release 0.3.3. Re-open if that's not the case.