quantopian / zipline

Zipline, a Pythonic Algorithmic Trading Library
https://www.zipline.io
Apache License 2.0

performances #882

Closed. rhodan closed this issue 8 years ago.

rhodan commented 8 years ago

I ran a small test: just a buy-and-hold algorithm for a year.

On daily data, 0.8.3 is faster than any of the other frameworks.

Sadly, though, 0.8.3 is about ten times slower than Quantopian at minute frequency.

Frequency   zipline 0.7.0   zipline 0.8.3   Quantopian
daily       10.23 sec       5.26 sec        8 sec
minute      57.12 sec       244 sec         25 sec

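Taking the table at face value, the relative slowdowns can be checked with a little arithmetic. This is only a sketch of that calculation: the timings are the ones reported above, and the Quantopian column was measured on Quantopian's own servers rather than the local machine, so the ratios are rough.

```python
# Wall-clock seconds reported in the table above.
timings = {
    "daily": {"zipline 0.7.0": 10.23, "zipline 0.8.3": 5.26, "Quantopian": 8.0},
    "minute": {"zipline 0.7.0": 57.12, "zipline 0.8.3": 244.0, "Quantopian": 25.0},
}


def slowdown(freq: str, subject: str, baseline: str) -> float:
    """How many times longer `subject` took than `baseline` at the given frequency."""
    return timings[freq][subject] / timings[freq][baseline]


if __name__ == "__main__":
    for freq in ("daily", "minute"):
        for baseline in ("zipline 0.7.0", "Quantopian"):
            ratio = slowdown(freq, "zipline 0.8.3", baseline)
            print(f"{freq}: 0.8.3 vs {baseline}: {ratio:.2f}x")
```

At minute frequency 0.8.3 takes about 9.8x as long as Quantopian and about 4.3x as long as 0.7.0, while on daily data it needs roughly half the time of 0.7.0.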
ssanderson commented 8 years ago

@rhodan where are you running zipline to perform this comparison? Running two different algorithms with different data sources on different machines is bound to produce radically different performance. The Quantopian site is using the latest zipline under the hood, so I'm not sure there's much that's actionable about that comparison.

More interesting is the claim that zipline 0.8.3 is slower than 0.7.0. I suspect that performance there is still dominated by reads from whatever data source you're using, but I'd be interested to see profiling data. Also noteworthy is the fact that @jbredeche is currently working on a performance-oriented branch that overhauls Zipline's internal data loading mechanisms.
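Profiling data of the kind requested here can be collected with the standard-library `cProfile` and `pstats` modules. The sketch below profiles a hypothetical stand-in, `run_backtest`, whose `load_bar` helper plays the role of a per-bar data read; both names are illustrative only and are not zipline APIs. With zipline installed, the same helper could presumably wrap the real run, e.g. `profile_call(algo.run, data)`.

```python
import cProfile
import io
import pstats


def load_bar(i: int) -> float:
    """Hypothetical stand-in for reading one bar from a data source."""
    return float(i % 100)


def run_backtest(n_bars: int) -> float:
    """Hypothetical stand-in for algo.run(data): visits every bar once."""
    total = 0.0
    for i in range(n_bars):
        total += load_bar(i)
    return total


def profile_call(func, *args, top: int = 10) -> str:
    """Run func(*args) under cProfile; return the `top` entries sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args)
    finally:
        profiler.disable()
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(top)
    return buf.getvalue()


if __name__ == "__main__":
    print(profile_call(run_backtest, 100_000))
```

Sorting by cumulative time shows which call trees dominate the run; if reads from the data source dominate, as suspected here, they would sit near the top of the report.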

Closing this as an issue since I don't think there's anything concrete to be done here.

rhodan commented 8 years ago

I ran the same algorithm for all of those tests.

For the minute test, I generated synthetic minute data as a CSV with the code below:

import datetime
import pytz
import pandas as pd
from zipline.finance.trading import TradingEnvironment  # zipline 0.8.x API

start = datetime.datetime(2015, 1, 1, 0, 0, 0, 0, pytz.utc)
end = datetime.datetime(2015, 11, 5, 0, 0, 0, 0, pytz.utc)

# One timestamp per market minute between start and end, from zipline's calendar
TE = TradingEnvironment()
minutes = pd.DatetimeIndex(TE.minutes_for_days_in_range(start, end))
if minutes.tz is None:
    minutes = minutes.tz_localize('UTC')  # localize only if not already tz-aware

# Constant dummy values; the test only exercises zipline's per-bar processing
df = pd.DataFrame(index=minutes.tz_convert('US/Eastern'))
df.index.name = 'Date'
for col in ['Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']:
    df[col] = 100

df.to_csv("c:\\us_min_random.csv")
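The minute run has to push far more bars through `handle_data` than the daily run. A rough count, assuming a 390-minute regular session and treating every weekday in the date range as a trading day (exchange holidays and early closes are ignored, so this slightly overstates the real total):

```python
from datetime import date, timedelta

# Assumption: 09:31-16:00 US/Eastern regular session, no early closes.
MINUTES_PER_SESSION = 390


def count_weekdays(first: date, last: date) -> int:
    """Count Monday-Friday dates in [first, last], ignoring exchange holidays."""
    days = 0
    current = first
    while current <= last:
        if current.weekday() < 5:  # 0=Monday ... 4=Friday
            days += 1
        current += timedelta(days=1)
    return days


if __name__ == "__main__":
    days = count_weekdays(date(2015, 1, 1), date(2015, 11, 5))
    print(f"daily bars per asset:  {days}")
    print(f"minute bars per asset: {days * MINUTES_PER_SESSION}")
```

That gives about 221 daily bars versus roughly 86,000 minute bars per asset, so any fixed per-bar overhead is multiplied by about 390 in the minute run.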

Then I built a data panel from the CSV:

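import pandas as pd

data1 = pd.read_csv("c:\\us_min_random.csv", index_col=['Date'],
                    usecols=['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close'])
# Rename to lowercase field names; 'Adj Close' becomes the 'price' field
data1.columns = ['open', 'high', 'low', 'close', 'volume', 'price']
# Timestamps were written in US/Eastern with UTC offsets; parse them back as UTC
data1.index = pd.to_datetime(data1.index, utc=True)
data1 = data1.dropna()

# Panel: items = assets, major_axis = timestamps, minor_axis = fields
# (pd.Panel was removed in pandas 1.0; this targets the pandas of the zipline 0.8 era)
data = pd.Panel({'AAPL': data1, 'MSFT': data1})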

Then I ran a simple buy-and-hold algorithm on both 0.7.0 and 0.8.3, timing each full run with time.time():

import time
from datetime import datetime

import pytz
from zipline.algorithm import TradingAlgorithm
from zipline.api import order, record, symbol
from zipline.utils.factory import create_simulation_parameters

starttime = time.time()
start = datetime(2015, 1, 1, 0, 0, 0, 0, pytz.utc)
end = datetime(2015, 11, 5, 0, 0, 0, 0, pytz.utc)

def initialize(context):
    context.count = 0  # number of bars seen so far

def handle_data(context, data):
    # Buy and hold: place both orders once, on the second bar
    context.count += 1
    if context.count == 2:
        order(symbol('AAPL'), 20)
        order(symbol('MSFT'), 20)
    record(AAPL=data[symbol('AAPL')].price, MSFT=data[symbol('MSFT')].price)

sim_params = create_simulation_parameters(
    start=start,
    end=end,
    data_frequency="minute",
    emission_rate="minute",
)

algo = TradingAlgorithm(initialize=initialize, handle_data=handle_data,
                        data_frequency='minute', sim_params=sim_params,
                        capital_base=500000)
perf = algo.run(data)  # `data` is the Panel built above

endtime = time.time()
print(endtime - starttime)  # elapsed wall-clock seconds
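A single wall-clock measurement is sensitive to run-to-run noise. A steadier alternative is the standard-library `timeit.repeat`, taking the fastest of several runs. In this sketch `run_once` is a hypothetical stand-in for constructing the `TradingAlgorithm` and calling `algo.run(data)`:

```python
import timeit


def run_once() -> None:
    """Hypothetical stand-in for one full backtest (build the algorithm, call run)."""
    sum(i * i for i in range(10_000))


def best_of(func, repeat: int = 5) -> float:
    """Return the fastest of `repeat` single-call timings, in seconds."""
    # timeit.repeat uses time.perf_counter by default (monotonic, high resolution).
    timings = timeit.repeat(func, repeat=repeat, number=1)
    return min(timings)


if __name__ == "__main__":
    print(f"best of 5: {best_of(run_once):.4f} s")
```

Taking the minimum rather than the mean filters out runs slowed by unrelated system activity; for a backtest that already takes minutes, a small `repeat` keeps the total time manageable.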

I know zipline is an open-source project, and I'm not in a position to criticize anything; I respect all the effort people put in here. I just want you to know that it is slow at minute frequency.