Closed rhodan closed 8 years ago
@rhodan where are you running zipline for this comparison? Running two different algorithms with different data sources on different machines is bound to produce radically different performance. The Quantopian site uses the latest zipline under the hood, so I'm not sure there's much that's actionable in that comparison.
More interesting is the claim that zipline 0.8.3 is slower than 0.7.0. I suspect that performance there is still dominated by reads from whatever data source you're using, but I'd be interested to see profiling data. Also noteworthy is the fact that @jbredeche is currently working on a performance-oriented branch that overhauls Zipline's internal data loading mechanisms.
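For anyone who wants to gather the kind of profiling data mentioned above, a minimal sketch using the standard library's cProfile might look like the following. `profile_call` and `busy_work` are hypothetical names used only for illustration, Python 3 is assumed, and the stand-in workload would be replaced by the actual backtest call (for example a function that wraps `algo.run(data)`).

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=15, **kwargs):
    """Run func under cProfile and return (result, report_text).

    The report lists the `top` entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()


def busy_work(n):
    # Hypothetical stand-in workload; swap in the real backtest call.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    _, report = profile_call(busy_work, 100000)
    print(report)
```

Sorting by cumulative time tends to show quickly whether the run is dominated by data loading or by the simulation loop itself.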
Closing this as an issue since I don't think there's anything concrete to be done here.
I ran the same algo for those tests.
For the minute-frequency test, I generated CSV minute data with the code below:
import datetime
import pytz
import pandas as pd
from zipline.finance.trading import TradingEnvironment
start = datetime.datetime(2015, 1, 1, 0, 0, 0, 0, pytz.utc)
end = datetime.datetime(2015, 11, 5, 0, 0, 0, 0, pytz.utc)
TE = TradingEnvironment()
df = pd.DataFrame(TE.minutes_for_days_in_range(start, end))
df.columns = ['Date']
df.index = pd.to_datetime(df.pop('Date'), utc=True)
df.index = df.index.tz_localize('UTC').tz_convert('US/Eastern')
df['Open'] = 100
df['High'] = 100
df['Low'] = 100
df['Close'] = 100
df['Volume'] = 100
df['Adj Close'] = 100
df.to_csv("c:\\us_min_random.csv")
Then I built a data panel from the CSV:
import pandas as pd
data1 = pd.read_csv("c:\\us_min_random.csv", index_col=['Date'], usecols=['Date', 'Open','High','Low','Close','Volume','Adj Close'])
data1.columns = ['open','high','low','close','volume','price']
data1.index = data1.index.to_datetime().tz_localize('UTC')
data1 = data1.dropna()
data = {'AAPL' : data1.to_dict(),
'MSFT' : data1.to_dict()}
data = pd.Panel(data)
Then I ran a simple 'buy and hold' algo on both 0.7.0 and 0.8.3 and timed each run:
import time
import pytz
from datetime import datetime
from zipline.api import order, record, symbol
from zipline.algorithm import TradingAlgorithm
from zipline.utils.factory import load_bars_from_yahoo
starttime = time.time()
start = datetime(2015, 1, 1, 0, 0, 0, 0, pytz.utc)
end = datetime(2015, 11, 5, 0, 0, 0, 0, pytz.utc)
def initialize(context):
    context.count = 0
def handle_data(context, data):
    context.count += 1
    if context.count == 2:
        order(symbol('AAPL'), 20)
        order(symbol('MSFT'), 20)
    record(AAPL=data[symbol('AAPL')].price, MSFT=data[symbol('MSFT')].price)
from zipline.utils.factory import create_simulation_parameters
sim_params = create_simulation_parameters(
    start=start,
    end=end,
    data_frequency="minute",
    emission_rate="minute",
)
algo = TradingAlgorithm(initialize=initialize, handle_data=handle_data, data_frequency='minute',
                        sim_params=sim_params, capital_base=500000)
perf = algo.run(data)
endtime = time.time()
print (endtime - starttime)
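Note that the timer in the script above starts before the function definitions and the algorithm setup, so the printed figure covers setup as well as the simulation. A hedged sketch of timing only a single callable, repeated a few times to smooth out noise, is below. `time_call` is a hypothetical helper, not part of zipline, and the lambda is a stand-in for whatever wraps the backtest run.

```python
import time


def time_call(func, repeat=3):
    """Call func() `repeat` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(repeat):
        started = time.perf_counter()
        func()
        durations.append(time.perf_counter() - started)
    return durations


if __name__ == "__main__":
    # Stand-in workload; replace with a function that builds and runs the algo.
    durations = time_call(lambda: sum(range(1000000)), repeat=3)
    print("best: %.4fs, worst: %.4fs" % (min(durations), max(durations)))
```

Comparing the best of several runs per version is usually a fairer basis for a claim like "0.8.3 is slower than 0.7.0" than a single measurement.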
I know zipline is an open-source framework, and I am not in a position to criticize anything. I respect all the effort of the people here. I just want you to know that it is slow at minute frequency.
My test was small: just buy and hold for a year.
At daily frequency, 0.8.3 is faster than any other framework.
Sadly, at minute frequency, 0.8.3 is ten times slower than Quantopian's site.