performances - Githubissues

rhodan commented 8 years ago

I ran a small test: a simple buy-and-hold backtest over one year.

On daily data, 0.8.3 is faster than the other frameworks I compared.

Sadly, on minute data 0.8.3 is about ten times slower than Quantopian.

          zipline 0.7.0   zipline 0.8.3   Quantopian
daily     10.23 sec       5.26 sec        8 sec
minute    57.12 sec       244 sec         25 sec
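
For reference, the ratios implied by the table can be checked with a few lines of Python. This is only a sketch of the arithmetic: the dictionary keys and the `slowdown` helper are names made up for this illustration, and the numbers are the single-run timings reported above.

```python
# Timings in seconds, copied from the table above (one run each).
timings = {
    "daily": {"0.7.0": 10.23, "0.8.3": 5.26, "Quantopian": 8.0},
    "minute": {"0.7.0": 57.12, "0.8.3": 244.0, "Quantopian": 25.0},
}


def slowdown(frequency, candidate, baseline):
    """Return how many times longer `candidate` took than `baseline`."""
    row = timings[frequency]
    return row[candidate] / row[baseline]


for freq in ("daily", "minute"):
    for baseline in ("0.7.0", "Quantopian"):
        ratio = slowdown(freq, "0.8.3", baseline)
        print(f"{freq}: 0.8.3 took {ratio:.2f}x the time of {baseline}")
```

A ratio below 1 means 0.8.3 was faster than the baseline; a ratio above 1 means it was slower.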

ssanderson commented 8 years ago

@rhodan where are you running zipline for this comparison? Running two different algorithms with different data sources on different machines is bound to produce radically different performance. The Quantopian site uses the latest zipline under the hood, so I'm not sure there's much that's actionable in that comparison.

More interesting is the claim that zipline 0.8.3 is slower than 0.7.0. I suspect that performance there is still dominated by reads from whatever data source you're using, but I'd be interested to see profiling data. Also noteworthy is the fact that @jbredeche is currently working on a performance-oriented branch that overhauls Zipline's internal data loading mechanisms.
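
One way to collect the profiling data mentioned above is the standard library's `cProfile`. The following is a minimal sketch, not a zipline-specific recipe: `profile_call` is a helper invented for this example, and `run_backtest` is a hypothetical stand-in for whatever call drives the backtest (for instance, a function that wraps `algo.run(data)`).

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, report_text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(20)
    return result, buffer.getvalue()


def run_backtest():
    # Hypothetical stand-in workload; replace with the real backtest call.
    return sum(i * i for i in range(100000))


result, report = profile_call(run_backtest)
print(report)
```

Running the same profile against both zipline versions and comparing the top entries would show whether the time goes into data reads or into the simulation loop.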

Closing this as an issue since I don't think there's anything concrete to be done here.

rhodan commented 8 years ago

I ran the same algo for all of those tests.

For the minute test, I generated CSV minute data with the code below:

import datetime
import pytz
import pandas as pd
from zipline.finance.trading import TradingEnvironment

start = datetime.datetime(2015, 1, 1, 0, 0, 0, 0, pytz.utc)
end = datetime.datetime(2015, 11, 5, 0, 0, 0, 0, pytz.utc)

TE = TradingEnvironment()
df = pd.DataFrame(TE.minutes_for_days_in_range(start, end))
df.columns = ['Date']
df.index = pd.to_datetime(df.pop('Date'), utc=True)
df.index = df.index.tz_localize('UTC').tz_convert('US/Eastern')
df['Open'] = 100
df['High'] = 100
df['Low'] = 100
df['Close'] = 100
df['Volume'] = 100
df['Adj Close'] = 100

df.to_csv("c:\\us_min_random.csv")

Then I built a data Panel from the CSV:

import pandas as pd
data1 = pd.read_csv("c:\\us_min_random.csv", index_col=['Date'], usecols=['Date', 'Open','High','Low','Close','Volume','Adj Close'])
data1.columns = ['open','high','low','close','volume','price']
data1.index = data1.index.to_datetime().tz_localize('UTC')
data1 = data1.dropna()

data = {'AAPL' : data1.to_dict(),
        'MSFT' : data1.to_dict()}

data = pd.Panel(data)

Then I ran a simple 'buy and hold' algo on both 0.7.0 and 0.8.3, timing each run with time.time():

import time
import pytz
from datetime import datetime
from zipline.api import order, record, symbol
from zipline.algorithm import TradingAlgorithm

starttime = time.time()
start = datetime(2015, 1, 1, 0, 0, 0, 0, pytz.utc)
end = datetime(2015, 11, 5, 0, 0, 0, 0, pytz.utc)

def initialize(context):
    context.count = 0

def handle_data(context, data):
    context.count += 1
    if context.count == 2:
        order(symbol('AAPL'), 20)
        order(symbol('MSFT'), 20)
    record(AAPL=data[symbol('AAPL')].price,MSFT=data[symbol('MSFT')].price)

from zipline.utils.factory import create_simulation_parameters
sim_params = create_simulation_parameters(
    start=start,
    end=end,
    data_frequency="minute",
    emission_rate="minute",
)

algo = TradingAlgorithm(initialize=initialize, handle_data=handle_data,
                        data_frequency='minute', sim_params=sim_params,
                        capital_base=500000)
perf = algo.run(data)

endtime = time.time()
print (endtime - starttime)
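
The script above measures a single run, and its timer starts before the simulation parameters and the algorithm object are created, so setup time is included in the result. A slightly more robust measurement, sketched here with the standard library's `timeit`, repeats only the call of interest and reports the best time. `time_runs` is a helper made up for this example, and `run_backtest` is a hypothetical stand-in for a function that calls `algo.run(data)`.

```python
import timeit


def time_runs(func, repeats=3):
    """Call func `repeats` times; return each wall-clock duration in seconds."""
    return timeit.repeat(func, number=1, repeat=repeats)


def run_backtest():
    # Hypothetical stand-in workload; replace with the real backtest call.
    return sum(i * i for i in range(100000))


durations = time_runs(run_backtest)
print("best of %d runs: %.3f sec" % (len(durations), min(durations)))
```

Taking the minimum over several runs reduces noise from other processes on the machine, which matters when comparing versions whose timings differ by a small factor.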

I know zipline is an open source framework, and I'm not in a position to criticize anything. I respect everyone's efforts here. I just want you to know that it is slow at minute frequency.

quantopian / zipline

performances #882