Create state for live execution

tibkiss commented 7 years ago

In backtests and in Quantopian's live trading, the algo's initialize() method is called only once, at the beginning of execution. Live trading must not diverge from backtesting in this respect (i.e. initialize() is called only once). It is also important to implement a robust live-trading base that survives network outages and restarts.

Therefore, state must be introduced to the live-trading code path that can store the actual state (the context variable, visible in initialize() and handle_data()) on disk. The logic to implement is simple:

During startup, check whether a state file (pickle?) exists for the algo:
- If not: call the algo's initialize() and store the context in the state file
- Else: load the context from the state file
Save the state file after every call to handle_data().
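The startup logic above can be sketched with the standard pickle module. This is a minimal sketch, assuming one pickle file per algo and a context whose attributes are all picklable; `load_or_initialize`, `save_state`, `run_live`, and the `SimpleNamespace` context are hypothetical stand-ins, not zipline-live's actual API:

```python
import os
import pickle
from types import SimpleNamespace


def save_state(state_path, context):
    """Write the context atomically so a crash mid-write cannot corrupt the state file."""
    tmp_path = state_path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(context, f)
    os.replace(tmp_path, state_path)


def load_or_initialize(state_path, initialize):
    """Return the persisted context if the state file exists, else run initialize() once."""
    if os.path.exists(state_path):
        # Restart: reuse the stored context, do NOT call initialize() again.
        with open(state_path, "rb") as f:
            return pickle.load(f)
    # First start: call the user's initialize() and persist the result right away.
    context = SimpleNamespace()
    initialize(context)
    save_state(state_path, context)
    return context


def run_live(state_path, initialize, handle_data, bars):
    """Hypothetical live loop: persist the context after every handle_data() call."""
    context = load_or_initialize(state_path, initialize)
    for data in bars:
        handle_data(context, data)
        save_state(state_path, context)
    return context
```

Writing to a temporary file and swapping it in with `os.replace` is a design choice aimed at the "survives restarts" requirement: a crash during the write leaves the previous state file intact.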

pbharrin commented 7 years ago

https://docs.python.org/3.4/library/pickle.html

pbharrin commented 7 years ago

I can take this. It is a simple task, but it requires a number of classes to be added.

tibkiss commented 7 years ago

Thanks for taking this!

Besides the algo's 'context' field, we need to consider including portfolio-related items in this state, as not all of the required fields can be derived from IB's API:

zipline.protocol.Positions: last_sale_price and last_sale_date
zipline.protocol.Portfolio: starting_cash, start_date
The generic stats (the perf result of TradingAlgorithm) might also be helpful to store and extend each day.
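One way to picture such a state record, as a sketch only: `LiveState`, `PositionExtras`, `append_daily_stats`, and the `daily_stats` list are hypothetical names, and the fields are simply the ones listed above, not necessarily what zipline-live ended up persisting:

```python
import pickle
from dataclasses import dataclass, field
from datetime import date, datetime


@dataclass
class PositionExtras:
    # Per-position fields the broker API does not give back (zipline.protocol.Positions).
    last_sale_price: float
    last_sale_date: datetime


@dataclass
class LiveState:
    context_vars: dict                               # user variables from `context`
    starting_cash: float                             # zipline.protocol.Portfolio.starting_cash
    start_date: date                                 # zipline.protocol.Portfolio.start_date
    positions: dict = field(default_factory=dict)    # symbol -> PositionExtras
    daily_stats: list = field(default_factory=list)  # one perf record appended per day


def append_daily_stats(state, record):
    """Extend the stored perf results by one day's record."""
    state.daily_stats.append(record)
```

Because these are plain module-level dataclasses, the whole record round-trips through `pickle` without custom hooks.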

What do you think?

pbharrin commented 7 years ago

I have the logic for this working; however, not everything can be serialized with pickle. When I try to pickle the context, I get the following error: TypeError: can't pickle LRU objects. LRU refers to this external data structure.

pbharrin commented 7 years ago

TradingCalendar's _minute_to_session_label_cache attribute is of type LRU: https://github.com/zipline-live/zipline/blob/live/zipline/utils/calendars/trading_calendar.py#L110

tibkiss commented 7 years ago

@pbharrin : I would not pickle cache objects; there should be logic to (re-)load the missing values.
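A sketch of that approach using Python's `__getstate__`/`__setstate__` pickling hooks: the cache is left out of the pickle and rebuilt lazily on first use after loading. `CachedCalendar` and its minute-to-session mapping (390 minutes per session) are hypothetical and far simpler than zipline's `TradingCalendar`; the plain dict stands in for the unpicklable `LRU` object:

```python
import pickle


class CachedCalendar:
    """Hypothetical object holding a cache that should not be pickled."""

    def __init__(self, sessions):
        self.sessions = sessions
        self._label_cache = {}  # stand-in for an LRU object that pickle rejects

    def session_label(self, minute):
        # Recompute on a cache miss, so an empty cache is always safe.
        if minute not in self._label_cache:
            self._label_cache[minute] = self.sessions[minute // 390]
        return self._label_cache[minute]

    def __getstate__(self):
        # Persist everything except the cache.
        state = self.__dict__.copy()
        del state["_label_cache"]
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._label_cache = {}  # refilled lazily by session_label()
```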

pbharrin commented 7 years ago

I have the first version of this working; I just need to figure out a few details.
I think this should be a command line option, so a user can specify where to store the file.
What should be the default behavior if no filename is specified? Not store the state? Store it in a default filename?

tibkiss commented 7 years ago

I'd opt for always storing the state to a file whose filename is generated from the application's daily start time.

That way we will accumulate more files over time, but it would help with debugging.
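Combining the command-line idea with a timestamped default could look roughly like this. Both `--state-file` and `--algo-name` are hypothetical flags sketched with `argparse`, not options the zipline CLI is known to provide:

```python
import argparse
from datetime import datetime


def default_state_filename(algo_name, started_at):
    """One file per daily start, e.g. buy_and_hold_20170825_093000.state."""
    return f"{algo_name}_{started_at:%Y%m%d_%H%M%S}.state"


def parse_args(argv=None, now=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--algo-name", default="algo")
    # Hypothetical flag: an explicit path overrides the generated name.
    parser.add_argument("--state-file", default=None,
                        help="where to store state; defaults to a timestamped name")
    args = parser.parse_args(argv)
    if args.state_file is None:
        args.state_file = default_state_filename(args.algo_name, now or datetime.now())
    return args
```

The `%Y%m%d_%H%M%S` format sorts lexicographically in time order, which keeps the accumulated files easy to browse when debugging.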

What do you think?

bartosh commented 7 years ago

Can we make this feature optional?

tibkiss commented 7 years ago

How do you imagine this being optional?

We cannot call initialize() on every start (i.e. daily), because that's not how Q's live trading works... Am I missing something here?

bartosh commented 7 years ago

Frankly speaking, I consider saving state an unneeded complication. We already have state: the broker account state (available cash, open orders, etc.). Inventing another one will just complicate the implementation. You seem to be of a different opinion, hence my suggestion to at least make this optional.

tibkiss commented 7 years ago

I don't think the state (i.e. the context variable) can be derived from broker data. What a broker provides is positions (in its own representation) and account info, whereas context stores all kinds of user-specified variables. Even the simplest algorithms (e.g. buy_and_hold) make use of context.
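To make this concrete, here is a pure-Python mock in the spirit of a buy-and-hold algo. `order` is a stub that only records calls, and `simulate_day` is a hypothetical stand-in for one daily start of the live runner. If `initialize()` is rerun on each start, the `has_ordered` flag is reset and the algo buys again:

```python
from types import SimpleNamespace

orders = []  # records (asset, amount) instead of sending anything to a broker


def order(asset, amount):
    orders.append((asset, amount))


def initialize(context):
    context.asset = "AAPL"  # a real zipline algo would use symbol("AAPL")
    context.has_ordered = False


def handle_data(context, data):
    if not context.has_ordered:
        order(context.asset, 100)
        context.has_ordered = True


def simulate_day(context=None):
    """One daily start; initialize() runs only when no context was carried over."""
    if context is None:
        context = SimpleNamespace()
        initialize(context)
    handle_data(context, data=None)
    return context
```

Calling `simulate_day()` twice places two orders; carrying the context over (`ctx = simulate_day(); simulate_day(ctx)`) places only one, which matches backtest behaviour.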

As I would like to run algorithms that run on Quantopian (backtest and live) without any modification, this is a necessary change.

mellertson commented 7 years ago

Hello all!

In regard to the "saving state" conversation: before using Zipline I used QuantConnect (a competitor). The servers were upgraded regularly, and when they were, the system shut down any running algos. Depending on how the algos were programmed, some needed their "state" to be saved before being shut down; otherwise they needed to go through a "warm-up" period before they could live trade again.

The particular algorithm I'm referring to, coded by myself and my colleague, used a Markov model to do pattern recognition. The Markov model needs to see a few days of data before it can begin to make accurate predictions, thus requiring its state to be saved.

QuantConnect didn't provide any local storage, so we weren't able to save its state. It created problems for the algo, so much so that we jumped ship to Zipline (and we're very happy we did!).

I'm just thinking out loud here, but in my opinion the ability for an algo to save its state could be important, depending on how the algorithm is coded and what technology it uses. I already have some well-tested code that uses the Peewee ORM library to serialize data to a MySQL database. Peewee can also serialize to a SQLite file on local disk.
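For comparison with the pickle-file approach, here is a sketch of SQLite-backed state using only the standard library's `sqlite3` module. This is a swapped-in stand-in, not the Peewee wrapper mentioned above; the `SqliteStateStore` class and its one-row-per-algo table layout are assumptions:

```python
import pickle
import sqlite3


class SqliteStateStore:
    """Hypothetical key-value store: one pickled context blob per algo name."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS algo_state (algo TEXT PRIMARY KEY, blob BLOB)")

    def save(self, algo, context):
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO algo_state (algo, blob) VALUES (?, ?)",
                (algo, pickle.dumps(context)))

    def load(self, algo):
        row = self.conn.execute(
            "SELECT blob FROM algo_state WHERE algo = ?", (algo,)).fetchone()
        return pickle.loads(row[0]) if row else None
```

Keeping several algos' states in one file is the main difference from a per-algo pickle file; the serialized payload is still a pickle, so the same picklability limits apply.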

If you think it would make the development effort easier, I'd be happy to contribute my Peewee wrapper library. Or I could lend a hand with code to serialize to a pickle file, if that's the preferred method.

Either way, my intention isn't to make things more complicated. Please let me know how I can help.

rtntdeck commented 7 years ago

I came here after Quantopian's announcement yesterday regarding live trading. I'm glad there will be a mechanism for continuing to live trade.

This particular issue of state and the context variable(s) has become an area of great interest to me over the last week as I worked through some of Q's live paper trading nuances, specifically the divergence between what is displayed as "Positions" and the actual orders/fills.

I agree that state should be kept and that the context variables are the place to do that. However, I also believe that @bartosh makes an interesting point about the broker (whoever it is) being the source of truth for certain components of state. Perhaps the right optionality is that cash, positions, and trades (orders/fills) can (I would argue should) be optionally populated from the broker data at the start of the day.
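A sketch of that split, with hypothetical `BrokerSnapshot`/`RestoredState` types and a hypothetical `refresh_from_broker` helper: user variables come from the persisted state, while cash and positions are overwritten from the broker at the start of the day:

```python
from dataclasses import dataclass, field


@dataclass
class BrokerSnapshot:
    cash: float
    positions: dict  # symbol -> share count, as reported by the broker


@dataclass
class RestoredState:
    user_vars: dict  # from the pickled context
    cash: float = 0.0
    positions: dict = field(default_factory=dict)


def refresh_from_broker(state, snapshot):
    """Keep user variables from disk; treat the broker as truth for cash and positions."""
    state.cash = snapshot.cash
    state.positions = dict(snapshot.positions)
    return state
```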

tibkiss commented 7 years ago

@rtntdeck : Most of the broker-provided info is already populated through context.portfolio: https://github.com/zipline-live/zipline/blob/d7f0cff4fa322a72cb4a80208585739d2d2d73a2/zipline/gens/brokers/ib_broker.py#L377-L443

Unfortunately, you cannot load all the trades using the IB API; only the active positions are available.

rtntdeck commented 7 years ago

@tibkiss that is good news. Thank you for the follow up and contributions.

fredfortier commented 7 years ago

@tibkiss I'd like to at least float the idea of a logical data separation. While instantiating Portfolio from the broker data simplifies state management, it also binds the brokerage account to a single algorithm. This may be a better approach overall. However, it forces users to create an account partition per algorithm. I believe that this is hard to do with some brokers (e.g. RobinHood).

Here is a proposed logical breakdown for consideration:

- Algorithm: Portfolio => positions, capital used, portfolio value, pnl, returns, start date
- Broker: Account, Portfolio => starting cash, cash
- Performance tracker: Orders, Transactions, performance metrics

Essentially, this allows multiple algorithms to run against the same brokerage account without mangling positions and performance metrics. Since cash and margin data still come from the brokerage account, algorithms only spend what they can afford. Algorithms could also have a logical spending cap. Pickling the portfolio and perf_tracker objects after handle_data() would be a crude way to preserve the full state.
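A rough sketch of the per-algorithm bookkeeping this proposal implies. `AlgoLedger`, `spending_cap`, `can_buy`, and `record_fill` are hypothetical names: each algo tracks its own positions and spending, while the buy check still consults the cash the broker reports for the shared account:

```python
from dataclasses import dataclass, field


@dataclass
class AlgoLedger:
    """Hypothetical per-algorithm book kept alongside a shared broker account."""
    name: str
    spending_cap: float
    spent: float = 0.0
    positions: dict = field(default_factory=dict)  # symbol -> shares owned by this algo

    def can_buy(self, cost, broker_cash):
        # The purchase must fit both the account's real cash and this algo's logical cap.
        return cost <= broker_cash and self.spent + cost <= self.spending_cap

    def record_fill(self, symbol, shares, cost):
        self.spent += cost
        self.positions[symbol] = self.positions.get(symbol, 0) + shares
```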

I'm simply suggesting an alternative. I'm not sure which approach is best for most users.

tibkiss commented 7 years ago

@fredfortier : I appreciate the idea, but I have two concerns: 1) It deviates from Q's original API. At Q the only source of truth is your broker, and Q does not support multiple algos running against the same account (something we don't enforce at the moment). 2) If you persist your position state and then manually adjust a position size, the result is an inconsistency. Handling such cases would mean manually adjusting your pickled state (which sounds very error-prone).
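The inconsistency in point 2 can at least be detected at startup. A minimal sketch, assuming positions are plain `{symbol: shares}` dicts (a hypothetical simplification of zipline's position objects), of a `reconcile` helper that lists every symbol where the pickled state and the broker disagree:

```python
def reconcile(persisted, broker):
    """Return {symbol: (persisted_qty, broker_qty)} for every mismatched position."""
    symbols = set(persisted) | set(broker)
    return {
        s: (persisted.get(s, 0), broker.get(s, 0))
        for s in sorted(symbols)
        if persisted.get(s, 0) != broker.get(s, 0)
    }
```

A non-empty result could be used to refuse loading the state, rather than silently trading on stale positions.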

As this idea has little to do with the persistence of the state variable, I'd suggest moving this discussion out of this thread.

tibkiss commented 7 years ago

Quick update on this issue: a while ago I extended @pbharrin's state persistence branch and managed to load/store the context successfully by introducing a blacklist of variables that should not be pickled.
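A minimal sketch of the blacklist idea, assuming the context's attributes live in its `__dict__`. `STATE_BLACKLIST`, its entries, `dump_context`, and `load_context` are hypothetical examples, not the actual list or functions used in the branch:

```python
import pickle
from types import SimpleNamespace

# Hypothetical blacklist: attribute names that must never be pickled.
STATE_BLACKLIST = {"trading_calendar", "broker"}


def dump_context(context):
    """Pickle only the context attributes that are not blacklisted."""
    kept = {k: v for k, v in vars(context).items() if k not in STATE_BLACKLIST}
    return pickle.dumps(kept)


def load_context(blob, context):
    """Restore the saved attributes onto an existing (freshly built) context object."""
    for key, value in pickle.loads(blob).items():
        setattr(context, key, value)
    return context
```

Blacklisted attributes are expected to be recreated by the runner on startup, which is why `load_context` writes onto an existing context instead of returning a brand-new one.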

The last challenge was to persist and load the scheduled functions, which is difficult due to their complexity. My current workaround is to extend the API with a define_schedule() function that is called every day. This approach is not the nicest (it deviates from Q's API), but it works reasonably well.
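The define_schedule() workaround can be sketched as follows: initialize() runs once over the algo's lifetime, while define_schedule() runs on every start, so scheduled functions are rebuilt from code instead of being unpickled. The `start_algo` runner, the list-based scheduler, and passing `schedule_function` as a parameter are all assumptions made to keep the sketch self-contained:

```python
import os
import pickle
from types import SimpleNamespace


def start_algo(state_path, initialize, define_schedule):
    """Hypothetical daily start-up: initialize() once ever, define_schedule() every start."""
    schedule = []  # stand-in for the scheduler; never persisted

    def schedule_function(func, rule):
        schedule.append((func, rule))

    if os.path.exists(state_path):
        with open(state_path, "rb") as f:
            context = pickle.load(f)
    else:
        context = SimpleNamespace()
        initialize(context)
        with open(state_path, "wb") as f:
            pickle.dump(context, f)
    # Scheduled functions are re-registered from code on each start.
    define_schedule(context, schedule_function)
    return context, schedule
```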

Our friends at Quantopian hinted that it is worthwhile to try doing the scheduling in the before_trading_start() function, something I'll be experimenting with next week.

Once this problem is worked out, I need to implement matching the state to the algo (if the algo changes, state loading should not happen).
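One possible way to do that matching, sketched under the assumption that "the algo changed" means "its source text changed": store a SHA-256 hash of the source next to the pickled context and refuse to load on a mismatch. The function names are hypothetical:

```python
import hashlib
import pickle


def algo_fingerprint(source):
    """Hash of the algorithm's source text; any edit changes it."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def dump_state(context_vars, source):
    return pickle.dumps({"fingerprint": algo_fingerprint(source), "context": context_vars})


def load_state_if_matching(blob, source):
    """Return the saved context only if it was produced by this exact algo source."""
    saved = pickle.loads(blob)
    if saved["fingerprint"] != algo_fingerprint(source):
        return None  # algo changed: skip loading, caller falls back to initialize()
    return saved["context"]
```

Hashing the raw text is deliberately strict: even a comment change invalidates the state. A looser notion of "same algo" would need a different fingerprint.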

@pbharrin : As I have invested a reasonable amount of time here, I'd be happy to take over this task from you and bring it to completion. What do you think?

pbharrin commented 7 years ago

@tibkiss yes, please take this over.

fredfortier commented 7 years ago

I totally agree. I merely wanted to initiate the debate, as I have a feeling that it will become a concern as adoption picks up.

IMHO, concern #2 is more of a feature than a bug. I don't believe that users should be allowed to play god with their algorithm, especially when it's done by accident. For example, I once made a cash deposit to my RobinHood account, which drastically skewed the performance statistics of the associated algorithm. If they want to change the parameters of an algorithm, users should relaunch it to keep performance statistics sane and accurate.

More on this in a separate thread when this issue enters the spotlight.

On Fri, Aug 25, 2017 at 10:20 AM, Peter Harrington wrote:

@tibkiss yes, please take this over.


tibkiss commented 7 years ago

First public version is here: https://github.com/zipline-live/zipline/pull/53

Reviews are welcome!

tibkiss commented 7 years ago

Delivered with: https://github.com/zipline-live/zipline/pull/53

zipline-live / zipline

Create state for live execution #6