scrtlabs / catalyst

An Algorithmic Trading Library for Crypto-Assets in Python
http://enigma.co
Apache License 2.0

Steady increase in CPU and memory use #310

Open sam31415 opened 6 years ago

sam31415 commented 6 years ago

Hi,

Sorry for the issue spamming... Check this issue for my configuration.

I have noticed that my algorithms tend to increase their CPU and memory use little by little until they crash the virtual instance they are running on. I decided to make a little test with a very simple algorithm, included below. I just let it run on a Google Cloud f1-micro instance (the smallest one). The CPU use is plotted below [CPU usage plot]. You can see that it starts with very modest use and then steadily increases, with occasional drops. In the end, the algorithm crashed with a request timeout from ccxt, which I believe is unrelated to the present issue.

Has anybody noticed similar behavior? This is an issue because the algorithms I trade display the same behavior and crash once they have eaten up all the memory. The increase in memory use with the simple algorithm below is much flatter, however. My current solution is to restart them once per day, which works but is not ideal.
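To quantify the growth, something like the following could be dropped next to handle_data (just a sketch; it assumes the optional psutil package is installed, which the algorithm below does not use):

import os
import psutil

_process = psutil.Process(os.getpid())

def log_memory(label=''):
    # Resident set size of the current process, in MiB.
    rss_mib = _process.memory_info().rss / (1024.0 * 1024.0)
    print('{} RSS: {:.1f} MiB'.format(label, rss_mib))

# For example, call log_memory('minute {}'.format(context.i)) once per
# iteration from handle_data and plot the values afterwards.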

I include the algorithm below. It does nothing but very typical tasks: it selects a universe by volume, then fetches some price data and performs some computations on it (it doesn't order anything). I see no reason why such an algorithm would lead to a steady increase in CPU load.

Best,

Samuel

Algorithm:

import os
import catalyst as ctl
import numpy as np
import pandas as pd
import time
import tempfile
from scipy.stats import moment
from statsmodels.nonparametric.smoothers_lowess import lowess
import matplotlib.pyplot as plt
pd.set_option('mode.chained_assignment', 'raise')
from catalyst.api import (
    symbols, 
    order_target_percent,
    order_target_value,
    order_target,
    order,
    symbol,
    record,
    cancel_order,
    get_open_orders,
    set_long_only,
    set_max_leverage,
    set_slippage
)
from logbook import Logger
from datetime import timedelta, datetime
strptime = datetime.strptime
strftime = datetime.strftime
import pytz

from catalyst.finance.slippage import VolumeShareSlippage, FixedSlippage
from catalyst.exchange.utils.exchange_utils import get_exchange_symbols
from catalyst.utils.paths import ensure_directory

from catalyst import run_algorithm
from copy import copy

# Parameters
LIVE = True          # False to backtest
PAPERTRADE = True    # False to place real orders
RUN_LOCALLY = False  # True forces paper trading
if RUN_LOCALLY:
    PAPERTRADE = True
NAMESPACE = os.path.basename(__file__)[:-3]  # script name without '.py', e.g. 'CryptoTrendIII-Polo-dev'

MIN_VOLUME = 300 # Volume filter 

log = Logger(NAMESPACE)

def initialize(context):
    c = context
    c.i = -1  # minute counter
    c.i_prev = -1 # The previous counter, to make sure we don't miss trades.
    c.exchange = 'poloniex'
    c.base_currency = 'btc'
    set_long_only()
    set_max_leverage(1.0)
    c.coins = []

    c.set_commission(maker=0.0015, taker=0.0025)
    c.set_slippage(spread=0.007)

    # Variable to detect the first run of the algo
    c.first_run = True

def handle_data(context, data):
    c = context
    one_day_in_minutes = 1440  # 60 * 24 assumes data_frequency='minute'
    c.i_prev = copy(c.i)
    # Determines the minute counter from the actual time, to trade always at the same time.
    c.i = int((data.current_dt - datetime.strptime('2018-03-01 00:00', 
                                               '%Y-%m-%d %H:%M').replace(tzinfo=pytz.UTC)).total_seconds() // 60)
    c.j = c.i//10

    # update universe everyday at 00hUTC.
    if (c.i // one_day_in_minutes) != (c.i_prev // one_day_in_minutes) or c.first_run:
        c.first_run = False

        # Fetches and prints the universe of tradeable pairs
        c.coins = get_universe(context, data, MIN_VOLUME)

        if c.coins:
            # Computes the volatility of the pairs, for position sizing
            price_data = data.history(c.coins, 'price', bar_count= 900, frequency="5T")
            # Log returns
            log_ret = np.log(np.divide(price_data, price_data.shift()))
            # Computes the volatility of the pairs
            c.std_pairs = log_ret.std(axis = 0)

    # If we reached 10 minutes and the universe is not empty
    if (c.i // 10) != (c.i_prev // 10) and c.coins and not c.first_run:

        # Price data
        price_data = data.history(c.coins, 'price', bar_count= 300, frequency="5T")
        # Log returns
        log_ret = np.log(np.divide(price_data, 
                                   price_data.shift()))
        # Coin weights, inverse volatility
        coin_weights = np.divide(1, c.std_pairs)

    record(
        cash = context.portfolio.cash,
        leverage = context.account.leverage,
        rel_PL = context.portfolio.portfolio_value/context.portfolio.starting_cash - 1,
        prices = data.current(c.coins, 'price'),
        volume = data.current(c.coins, 'volume'),
    )

def analyze(context=None, results=None):   

    # Plot the portfolio and asset data.
    ax1 = plt.subplot(211)
    np.log(results.portfolio_value).plot(ax=ax1)
    ax1.set_ylabel('Log portfolio value')

    ax3 = plt.subplot(212, sharex=ax1)
    results[['leverage', 'alpha', 'beta']].plot(ax=ax3)
    ax3.set_ylabel('Leverage ')

    plt.legend(loc=3)

    # Show the plot.
    plt.gcf().set_size_inches(18, 15)
    plt.tight_layout()
    plt.show()

def get_universe(context, data, min_volume):
    # Sets the current universe of trading pairs
    c = context
    lookback_days = 15  # days
    lookback_volume_days = 3

    # current date & time in each iteration formatted into a string
    now = data.current_dt
    current_date, current_time = now.strftime('%Y-%m-%d %H:%M:%S').split(' ')  # don't shadow the time module
    lookback_date = now - timedelta(days=lookback_days)
    # keep only the date as a string, discard the time
    lookback_date = lookback_date.strftime('%Y-%m-%d %H:%M:%S').split(' ')[0]

    # get all the pairs for the given exchange
    json_symbols = get_exchange_symbols(c.exchange)
    # convert into a DataFrame for easier processing
    df = pd.DataFrame.from_dict(json_symbols).transpose().astype(str)
    df['base_currency'] = df.apply(lambda row: row.symbol.split('_')[1],axis=1)
    df['market_currency'] = df.apply(lambda row: row.symbol.split('_')[0],axis=1)

    # Filter all the pairs to get only the ones for a given base_currency
    df = df[df['base_currency'] == c.base_currency]

    # Filter all the pairs to ensure that pair existed in the current date range
    df = df[df.start_date < lookback_date]

    # Convert all the pairs to symbols
    coins = symbols(*df.symbol)

    # Fetches volume data
    price_volume_data = data.history(coins, ['price','volume'], bar_count= lookback_volume_days, frequency="1d")
    # Average volume
    volu_coins = np.multiply(price_volume_data.volume, 
                                price_volume_data.price).mean(axis = 0)
    # Selects the coins with enough volume
    coins_new = set(volu_coins[volu_coins > 1.5*min_volume].index)
    coins_keep = set(volu_coins[volu_coins > min_volume].index).intersection(set(c.coins))

    coins = list(coins_new.union(coins_keep))
    return coins

if __name__ == '__main__':

    if LIVE:
        if PAPERTRADE:
            alg_name = NAMESPACE + '-paper'
        else:
            alg_name = NAMESPACE + '-live'
    else:
        alg_name = NAMESPACE + '-test'

    if LIVE:
        run_algorithm(
            capital_base = 0.03,
            initialize = initialize,
            handle_data = handle_data,
            analyze = analyze,
            exchange_name = 'poloniex',
            live = True,
            algo_namespace = alg_name,
            base_currency = 'btc',
            live_graph = False,
            simulate_orders = PAPERTRADE,
            #stats_output = NAMESPACE + '-st_output.txt',
            #output = NAMESPACE + '-output.txt'
        )

    else:
        folder = os.path.join(
            tempfile.gettempdir(), 'catalyst', NAMESPACE
        )
        ensure_directory(folder)

        timestr = time.strftime('%Y%m%d-%H%M%S')
        out = os.path.join(folder, '{}.p'.format(timestr))

        run_algorithm(
            capital_base = 0.1,
            data_frequency = 'minute',
            initialize = initialize,
            handle_data = handle_data,
            analyze = analyze,
            exchange_name = 'poloniex',
            algo_namespace = alg_name,
            base_currency = 'btc',
            start = pd.to_datetime('2017-10-18', utc=True),
            end = pd.to_datetime('2017-10-19', utc=True),
            output = out 
        )
        log.info('saved perf stats: {}'.format(out))
lenak25 commented 6 years ago

@sam31415, thanks for reporting this and opening an issue here as well. Adding a reference to the forum report. Have you been experiencing this phenomenon when running not only on the entire universe?

sam31415 commented 6 years ago

Hi. What do you mean by "running not only on the entire universe"? I originally observed this phenomenon with the algorithm I trade, and managed to reproduce it with the algorithm above, although the growth is slower with the latter. The original algorithm does a few more computations and, obviously, some ordering logic. I haven't run more extensive tests.

lenak25 commented 6 years ago

I meant the number of trading pairs used. Your algorithm above seems to run on all the exchange pairs (=entire universe), so I was wondering whether this behavior was observed using a small number of pairs.

sam31415 commented 6 years ago

Well, the function get_universe considers all the pairs on the exchange, and returns the ones with sufficiently high volume. Only those are then used in the later computations. I did not try with a small number of pairs, but it would be easy to modify the algorithm above to try.
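For example, the whole daily volume scan could be skipped by pinning the universe to a few hard-coded pairs (just a sketch, the pair names are only illustrative):

from catalyst.api import symbols

# Hypothetical test modification: replace the volume-based selection with a
# small, fixed universe so that only a handful of pairs are ever used.
def get_universe(context, data, min_volume):
    return symbols('eth_btc', 'ltc_btc', 'xmr_btc')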

Anyway I don't see how using a large universe would lead to the observed behavior: the function get_universe is run once per day. Then the algorithm only works with something like 10 pairs.

sam31415 commented 6 years ago

Just to give an update on this issue, using a large enough instance (for instance, a standard one for 1-3 algos) avoids the crashes. The CPU and memory use shows the same kind of seesaw shape over two time scales, but stays constant over sufficiently long time periods.
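In case anyone wants to dig further into the sawtooth pattern, a rough way to check whether the drops line up with Python's garbage collector is to log GC statistics alongside the minute counter (again just a sketch, not part of the algorithm above):

import gc

def log_gc_stats(context):
    # Hypothetical diagnostic: once per hour of algorithm time, log how many
    # objects the garbage collector is tracking and how many unreachable
    # objects a full collection finds.
    if context.i % 60 == 0:
        tracked = len(gc.get_objects())
        unreachable = gc.collect()
        print('minute {}: {} tracked objects, {} unreachable found'.format(
            context.i, tracked, unreachable))

If the tracked-object count keeps growing even right after a full collection, that would point at something holding on to references rather than at the collector itself.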