zipline-live / zipline

Zipline-Live, a Pythonic Algorithmic Trading Library
http://www.zipline-live.io/
Apache License 2.0

Enable Pipeline in live trading (daily) #40

Open · pbharrin opened this issue 7 years ago

pbharrin commented 7 years ago

Pipeline is one of the key components of Zipline, allowing calculations to be done over thousands of equities. (The current universe of North American equities available on Quantopian is over 8,000.) However, there is not yet a working version of Pipeline in zipline-live.

The initial testing of zipline-live is done with Interactive Brokers (IB) as the broker. Receiving live updates for 8k+ tickers from IB would be prohibitively expensive: the default maximum number of market data lines from IB is 100.

There is a straightforward path to enabling Pipeline with Interactive Brokers for daily data, and that is through Zipline bundles. Each day of trading would require the following steps (a minimal sketch of the loop follows the list):

  1. historical data is loaded from IB into a Zipline bundle
  2. trading decisions are made based on the data in the bundle, and orders are routed to IB
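
Roughly, the daily loop could look like the sketch below. This assumes a custom bundle named ib-daily has already been registered (a registration sketch follows the next paragraph); the algorithm body is a placeholder, and zipline-live's broker arguments would be added for the actual live order routing.

```python
import pandas as pd

from zipline import run_algorithm
from zipline.api import order_target_percent, symbol
from zipline.data.bundles import ingest


def initialize(context):
    # Placeholder universe; in practice this would come out of Pipeline.
    context.asset = symbol('AAPL')


def handle_data(context, data):
    # Placeholder decision; under zipline-live the resulting orders are
    # routed to IB through the broker integration.
    order_target_percent(context.asset, 0.05)


# Step 1: refresh the 'ib-daily' bundle from IB's historical data.
ingest('ib-daily', show_progress=True)

# Step 2: trade today's session off the data in the bundle.
today = pd.Timestamp.utcnow().normalize()
run_algorithm(
    start=today,
    end=today,
    initialize=initialize,
    handle_data=handle_data,
    capital_base=100000,
    data_frequency='daily',
    bundle='ib-daily',
)
```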

Users are given the freedom and flexibility to control which exchanges and tickers they want included in the universe. This raises the question: does the user need to know all the tickers for each exchange a priori? Wouldn't hard-coding the tickers introduce survivorship bias? To properly address these questions, the data loader can first download a list of all the tickers available on an exchange, then get historical data for each of those tickers from IB. In this scheme the user only needs to supply a list of the exchanges they wish to trade (see the sketch below).
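
Something along these lines could be wired into Zipline's bundle machinery. This is only a sketch: list_exchange_tickers and fetch_ib_daily_history are hypothetical helpers standing in for the exchange-listing download and the IB historical-data requests.

```python
import pandas as pd

from zipline.data.bundles import register


def make_ib_bundle(exchanges):
    """Build a bundle ingest function for the given list of exchanges."""

    def ingest(environ, asset_db_writer, minute_bar_writer, daily_bar_writer,
               adjustment_writer, calendar, start_session, end_session,
               cache, show_progress, output_dir):
        # Hypothetical helper: returns every ticker currently listed on the
        # exchange, so the universe is refreshed each run rather than hard-coded.
        tickers = [t for ex in exchanges for t in list_exchange_tickers(ex)]

        metadata = pd.DataFrame({
            'symbol': tickers,
            'start_date': start_session,
            'end_date': end_session,
            'auto_close_date': end_session,
            'exchange': 'IB',
        })

        def bars():
            for sid, ticker in enumerate(tickers):
                # Hypothetical helper wrapping IB's historical data requests;
                # assumed to return an OHLCV DataFrame indexed by session.
                df = fetch_ib_daily_history(ticker, start_session, end_session)
                yield sid, df[['open', 'high', 'low', 'close', 'volume']]

        asset_db_writer.write(equities=metadata)
        daily_bar_writer.write(bars(), show_progress=show_progress)

    return ingest


# The user only supplies the exchanges they wish to trade.
register('ib-daily', make_ib_bundle(['NYSE', 'NASDAQ']), calendar_name='NYSE')
```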

The above scheme naturally separates two concerns: determining the universe of tickers, and populating the bundle with historical data for that universe.

If a user wanted only a handful of known tickers, those could be hard-coded or read from a file. If a user wanted to use another broker, Populate_Bundle_With_Historical_Data would simply need to be replaced. Populate_Bundle_With_Historical_Data should probably be an abstract class, with the IB implementation in Populate_Bundle_With_Historical_Data_From_IB or something along those lines (a sketch follows).
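
One possible shape for that abstraction, purely as a sketch (the class names mirror the ones suggested above; the method names and bodies are placeholders):

```python
from abc import ABC, abstractmethod


class Populate_Bundle_With_Historical_Data(ABC):
    """Writes historical daily bars for a universe of tickers into a bundle."""

    @abstractmethod
    def fetch_daily_bars(self, ticker, start, end):
        """Return an OHLCV DataFrame for one ticker over [start, end]."""

    def populate(self, tickers, start, end, daily_bar_writer):
        # Broker-agnostic part: iterate the universe and hand bars to the writer.
        daily_bar_writer.write(
            (sid, self.fetch_daily_bars(ticker, start, end))
            for sid, ticker in enumerate(tickers)
        )


class Populate_Bundle_With_Historical_Data_From_IB(Populate_Bundle_With_Historical_Data):
    def fetch_daily_bars(self, ticker, start, end):
        # Placeholder: wrap IB's historical data requests here,
        # staying within IB's pacing limits.
        raise NotImplementedError
```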

pbharrin commented 7 years ago

A solid source of ticker data (a list of tickers for a given exchange) is Eoddata.com.

fredfortier commented 7 years ago

If you are trading equities, why do you feel the need to evaluate exchange-specific data in Pipeline? Are you interested in price differences between exchanges for a given security? Or are you trying to include equities not currently in the quantopian-quandl bundle?

pbharrin commented 7 years ago

I am not looking to do anything across exchanges; the reason for pulling a listing of the tickers on a given exchange would be to ensure that the data (new symbols and delistings) stays up to date. I would like to trade other equities that are not in the quantopian-quandl bundle, such as those listed on the LSE.

Your question raises a good point: do we need to do ANY coding, or could we just take the quantopian-quandl bundle and trade off that?

fredfortier commented 7 years ago

From my perspective as a trader, I'm fine with using the quantopian-quandl bundle to narrow down my universe. Before initiating a trade, I use the DataPortal methods (presumably routed to IB in zipline-live) to obtain the most up-to-date data about the targeted securities.
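
As a concrete illustration of that per-trade check, something like the following inside handle_data would do, using the standard BarData accessors (the asset and the signal are placeholders):

```python
from zipline.api import order_target_percent, symbol


def initialize(context):
    # Placeholder asset; in practice the candidates come out of Pipeline,
    # screened against the bundle.
    context.asset = symbol('AAPL')


def handle_data(context, data):
    # data.current / data.history go through the DataPortal, which
    # zipline-live can back with live broker data, so the final decision
    # uses up-to-date prices even though the screen ran on bundle data.
    if not data.can_trade(context.asset):
        return
    last_price = data.current(context.asset, 'price')
    recent = data.history(context.asset, 'price', bar_count=20, frequency='1d')
    if last_price > recent.mean():  # placeholder signal
        order_target_percent(context.asset, 0.05)
```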

Consequently, insofar as I'm comfortable with the number of equities included in quantopian-quandl, Pipeline should just work in live trading. The situation would be different if securities were somehow priced differently between the bundle and the broker, but I don't believe that condition will ever apply to equity markets.

I find this separation of concerns useful in ensuring that the universe stays consistent between live and backtesting modes of execution. I would be worried about possible discrepancies if Pipeline data were obtained from different sources depending on the mode of execution; the bundles would have to contain an identical universe, which might be hard to achieve.

If I wanted to expand the universe, I could create a custom bundle using Zipline's existing bundle machinery. For example, one could create some kind of ib-bundle. As you pointed out, the IB API has data limits; one may be able to work around them by fetching data in chunks (see the sketch below), or multiple users could contribute to a single bundle.
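
The chunking idea could look something like this; fetch is a stand-in for whatever function actually makes the IB historical-data call, so only the date-splitting logic is shown.

```python
import pandas as pd


def fetch_history_in_chunks(ticker, start, end, fetch, chunk_days=365):
    """Fetch a long daily history as a series of smaller requests.

    `fetch(ticker, chunk_start, chunk_end)` is assumed to return an OHLCV
    DataFrame; splitting the date range keeps each request within IB's
    duration and pacing limits.
    """
    frames = []
    chunk_start = start
    while chunk_start <= end:
        chunk_end = min(chunk_start + pd.Timedelta(days=chunk_days - 1), end)
        frames.append(fetch(ticker, chunk_start, chunk_end))
        chunk_start = chunk_end + pd.Timedelta(days=1)
    return pd.concat(frames).sort_index()
```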

It is my understanding that Pipeline can only load one bundle per algorithm, so one would have to take deliberate steps to merge data into the existing quantopian-quandl universe. From this perspective, perhaps creating a new bundle is easier. That said, this problem does not directly pertain to the live mode of execution.