nautechsystems / nautilus_trader

A high-performance algorithmic trading platform and event-driven backtester
https://nautilustrader.io
GNU Lesser General Public License v3.0

OrderBook L1/L2/L3 #199

Closed cjdsellers closed 3 years ago

cjdsellers commented 3 years ago

An early version of the OrderBook feature has been pushed to the develop branch.

The bids and asks in the order book are represented as lists of (Price, Quantity) tuples, in each case sorted from the top of the book down. I feel this is a more usable representation than the lists of [float, float] currently returned from CCXT. There is some overhead (~0.5μs / 500ns) to construct each object - would that be acceptable for most users?

So a user can subscribe to an order book by symbol. OrderBook object snapshots will then be passed into on_order_book from the DataEngine. Optionally, a user can specify a timedelta interval if they only require periodic snapshots (along with an optional start_delay timedelta if they don't want those snapshots to occur at floored times, i.e. at the very start of a minute or second).
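For illustration, here is a minimal sketch of how a handler might consume these snapshots given the representation above; the symbol, subscription call and parameter names in the comment are indicative only, not necessarily the final API:

```python
# Indicative subscription (hypothetical names, not necessarily the final API):
# self.subscribe_order_book(
#     "BTC/USDT",
#     interval=timedelta(seconds=5),     # optional: periodic snapshots only
#     start_delay=timedelta(seconds=1),  # optional: offset from floored times
# )

def on_order_book(order_book):
    # Bids and asks are lists of (Price, Quantity) tuples, best level at index 0.
    best_bid_price, best_bid_qty = order_book.bids[0]
    best_ask_price, best_ask_qty = order_book.asks[0]
    return best_ask_price - best_bid_price  # the spread
```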

Some things to consider are how to specify the level and depth required. Separate levels would require separate socket streams for L2 and L3. I'm currently thinking the depth will be the maximum any strategy has currently subscribed to for that symbol.

Requesting any comments on the feature.

cjdsellers commented 3 years ago

I've now refactored the OrderBook to represent bids and asks as lists of (Decimal, Decimal) which is more general and has less overhead.

It may be valuable to also represent them in the original [[float, float]]?

jpmediadev commented 3 years ago

Thank you! Great work.

> It may be valuable to also represent them in the original [[float, float]]?

Yes, for me it would be valuable.

Do I understand correctly that the CCXT order book formation algorithm is used?

cjdsellers commented 3 years ago

> Thank you! Great work.
>
> It may be valuable to also represent them in the original [[float, float]]?
>
> Yes, for me it would be valuable.
>
> Do I understand correctly that the CCXT order book formation algorithm is used?

That's correct, at this stage it's all coming off CCXT so best bid and ask are at index 0.

We'll need to include OrderBook data for the backtests too. The most obvious way I can think of doing it is adding the snapshots to the tick stream which will really then become a data stream. These classes may need a Data base class with a timestamp and overridden comparison methods so sorting can occur correctly.
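A minimal sketch of what such a Data base class could look like, purely illustrative and not the final design (the attribute name and class names are placeholders):

```python
from functools import total_ordering

@total_ordering
class Data:
    """Base class for timestamped data objects so a merged stream can be sorted."""

    def __init__(self, timestamp):
        self.timestamp = timestamp

    def __eq__(self, other):
        return self.timestamp == other.timestamp

    def __lt__(self, other):
        return self.timestamp < other.timestamp

# Quote ticks, trade ticks and order book snapshots would all derive from Data,
# so the backtest data stream is simply:
# data_stream = sorted(quote_ticks + trade_ticks + book_snapshots)
```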

Thoughts on this?

ian-wazowski commented 3 years ago

Great work! finally!! Thank you!

> We'll need to include OrderBook data for the backtests too. The most obvious way I can think of doing it is adding the snapshots to the tick stream which will really then become a data stream. These classes may need a Data base class with a timestamp and overridden comparison methods so sorting can occur correctly.

It seems to be very good.

cjdsellers commented 3 years ago

I'm just refactoring to get the specified level and depth working. Holding off on a 'raw' [float, float] order book option for the moment.

cjdsellers commented 3 years ago

> Great work! finally!! Thank you!
>
> We'll need to include OrderBook data for the backtests too. The most obvious way I can think of doing it is adding the snapshots to the tick stream which will really then become a data stream. These classes may need a Data base class with a timestamp and overridden comparison methods so sorting can occur correctly.
>
> It seems to be very good.

What's your use-case for the order book? Do you stream or look at interval snapshots?

ian-wazowski commented 3 years ago

> Great work! finally!! Thank you!
>
> We'll need to include OrderBook data for the backtests too. The most obvious way I can think of doing it is adding the snapshots to the tick stream which will really then become a data stream. These classes may need a Data base class with a timestamp and overridden comparison methods so sorting can occur correctly.
>
> It seems to be very good.

I have plans to develop two trading algorithms, one for directional trading and one for arbitrage-based market making.

I think both will need order book snapshots to calculate VPIN or order book imbalance.

Very few exchanges support L3 order book streaming, so I will mainly use streaming L2 snapshots.

Also, I think it is unnecessary and expensive (L1/L2 cache misses, memcpy overhead) to regenerate the order book object every time order book data comes in. According to my research, the approach hummingbot takes seems to be the best practice among HFT teams: implement an apply method for diffs, which reduces the overhead by roughly 15-20%.
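For context, here is a minimal sketch of the kind of top-of-book imbalance calculation mentioned above, computed from an L2 snapshot; the function name, depth parameter and values are made up for illustration:

```python
def order_book_imbalance(bids, asks, depth=5):
    """Top-of-book imbalance in [-1, 1] from (price, size) levels, best level first."""
    bid_vol = sum(size for _, size in bids[:depth])
    ask_vol = sum(size for _, size in asks[:depth])
    total = bid_vol + ask_vol
    return 0.0 if total == 0 else (bid_vol - ask_vol) / total

# Example L2 snapshot, best bid/ask at index 0:
bids = [(50_000.0, 1.2), (49_999.5, 0.8), (49_999.0, 2.0)]
asks = [(50_000.5, 0.6), (50_001.0, 1.1), (50_002.0, 0.4)]
print(order_book_imbalance(bids, asks))  # positive -> more resting bid volume
```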

cjdsellers commented 3 years ago

For the snapshot intervals, would you require the start delay, or have the intervals naturally floored to the second/minute etc.?

I agree the current implementation is not optimal; it was a first pass. I'm concerned about passing a mutable OrderBook all over the system, though. I'll look into the apply-method approach.

cjdsellers commented 3 years ago

Pushed a refactoring. Interval snapshots are now working, specified as an int number of seconds. Open to suggestions on whether we need finer-grained snapshot intervals than 1 second.

The depth is now being passed to _watch_order_book as a limit, however it's not limiting the number of bids and asks for some reason - I'll get to the bottom of that next session.

Also kwargs are now possible for exchange specific options.

cjdsellers commented 3 years ago

OK, I'm thinking we definitely need to improve the efficiency of this OrderBook. I think @ian-wazowski is correct: the object shouldn't be regenerated on every change, just the diffs applied in this "apply method" style. Then on stream or interval updates only a reference to the updated object will be passed.

So I'll probably add bids and asks methods which return a vector[OrderBookEntry], which Python can interpret as a list. If there's a threading issue we can always apply a lock under the hood, however with the GIL and the cooperative multitasking of the event loop this probably won't be needed. I've written a very efficient lock in Cython before, so the added safety possibly won't cause much overhead anyway - we'll test that later.
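A rough pure-Python illustration of that direction, with placeholder names only (this is not the Cython/vector implementation):

```python
class OrderBook:
    """Constructed once per symbol and mutated in place; consumers receive the same reference."""

    def __init__(self, symbol):
        self.symbol = symbol
        self._bids = {}  # price -> quantity
        self._asks = {}

    def apply_diff(self, side, price, quantity):
        # A zero quantity deletes the level, otherwise the level is inserted/updated.
        levels = self._bids if side == "buy" else self._asks
        if quantity == 0:
            levels.pop(price, None)
        else:
            levels[price] = quantity

    def bids(self):
        # Best (highest) bid first, as a list of (price, quantity) tuples.
        return sorted(self._bids.items(), reverse=True)

    def asks(self):
        # Best (lowest) ask first.
        return sorted(self._asks.items())
```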

cjdsellers commented 3 years ago

Given we're pulling this off CCXT, I'll keep OrderBookEntry to double types for the price and quantity too. Then on OrderBook we can have bids_as_decimals and asks_as_decimals if that's how a user prefers them - which avoids creating all of those Decimal objects on every update.
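A small sketch of the lazy conversion being described, where Decimal objects are only created when the user asks for them (the helper name and values here are illustrative):

```python
from decimal import Decimal

# Internally the entries stay as doubles (price, quantity)...
bids = [(50_000.0, 1.2), (49_999.5, 0.8)]

# ...and are only converted when decimal-style access is requested.
def as_decimals(levels):
    return [(Decimal(str(price)), Decimal(str(qty))) for price, qty in levels]

print(as_decimals(bids)[0])  # (Decimal('50000.0'), Decimal('1.2'))
```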

cjdsellers commented 3 years ago

Ok, so another refactoring pass of the OrderBook about to be pushed to develop.

The object is just constructed once and thereafter updated.

Currently I can't see a way to update only the differences when using CCXT, however I think that's what's going on under the hood there anyway. It's coercing the bids and asks coming off CCXT's _watch_order_book into memoryviews of double[:, :] via np.array.

Taking the typed memoryview over the resulting array is apparently a zero-copy operation: https://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html?highlight=array#cpython-array-module.
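Roughly, the coercion of the incoming data looks like this (a simplified Python-level sketch with made-up values; the actual code does this in Cython against double[:, :] memoryviews):

```python
import numpy as np

# Bids as they come off CCXT: nested lists of [price, size] floats, best bid first.
raw_bids = [[50_000.0, 1.2], [49_999.5, 0.8], [49_999.0, 2.0]]

bids = np.array(raw_bids, dtype=np.float64)  # shape (depth, 2); viewable as double[:, :] in Cython
best_bid_price = bids[0, 0]
best_bid_size = bids[0, 1]
```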

The bids and asks can be accessed either as raw doubles or converted to Decimal values via the as-decimals accessors mentioned above.

Timestamp is now just a long Unix time too.

Performance still needs to be measured however judging by the timestamps in the logs this is MUCH faster than the last version.

When we have our own optimized adapters which don't rely on CCXT Pro I think this can be improved further.

Any thoughts or suggestions, guys?

cjdsellers commented 3 years ago

Also the depth limit is working now.

jpmediadev commented 3 years ago

Maybe it makes sense to look at a solution like this - https://github.com/0b01/tectonicdb

Optimised algos: https://quant.stackexchange.com/questions/3783/what-is-an-efficient-data-structure-to-model-order-book

cjdsellers commented 3 years ago

> Maybe it makes sense to look at a solution like this - https://github.com/0b01/tectonicdb
>
> Optimised algos: https://quant.stackexchange.com/questions/3783/what-is-an-efficient-data-structure-to-model-order-book

Thanks I'll check these out!

cjdsellers commented 3 years ago

I'm looking into this more closely. The above are great links.

Right now the OrderBook simply isn't efficient or functional enough.

I'm leaning towards implementing something in C++ as Cython can bind to this directly, rather than going down the path of bringing Rust into the codebase at this stage - with all of the build tooling and binding that would entail (that's a longer term vision).

There's also the issue of implementing backtest support for the OrderBook too. My own intuition is to add it to the data stream of quote and trade ticks and process it all sorted on timestamps.

Thoughts?

cjdsellers commented 3 years ago

Given how critical the order book is I've decided to give it a go in Rust.

I pushed a work in progress to develop. There's still a lot of implementation to go, plus figuring out the C bindings for Cython.

If anyone has Rust experience feel free to get involved!

cjdsellers commented 3 years ago

So Rust is now integrated into the project.

Crates will be located in the project root lib folder.

cbindgen will automatically manage the FFI boundary by generating the necessary .h headers when crates are built, for struct definitions marked #[repr(C)] and for functions declared like so:

#[no_mangle]
pub extern "C" fn

On the Python side it's fairly easy to write the header definitions in a .pxd, and then wrap them with a standard .pyx. I think this has an advantage over some of the other methods people are using to bind with Python, as it keeps all calls to Rust in C land.

The CI pipeline now installs Rust, and build.py calls cargo build --release on all crates located in lib, with the output static libraries linked by Cython during compilation of the C extensions.

The order book itself still needs some work on the algorithms and functionality; I was more focused on getting something basic set up and all the tooling in place.

jpmediadev commented 3 years ago

I find this interesting. I've downloaded a book on Rust...

cjdsellers commented 3 years ago

I recommend The Rust Programming Language by Steve Klabnik and Carol Nichols and also Programming Rust: Fast, Safe Systems Development by Jim Blandy and Jason Orendorff.

jpmediadev commented 3 years ago

Thanks, I'll start with Jim Blandy and Jason Orendorff.

cjdsellers commented 3 years ago

What sort of functionality do we need on the OrderBook?

Right now we have;

@scoriiu

cjdsellers commented 3 years ago

The above was functionality on the Rust-implemented OrderBook. We're currently back to a limited 'placeholder', which can only have snapshots applied.

Once integrations are available which can access deltas only, the OrderBook will begin to iterate with improved functionality and performance.

cjdsellers commented 3 years ago

Efficient order book diff methods are now available, including add, update and delete for OrderBook-specific Order objects.

Internal bid and ask Ladders are maintained with behavior dependent on order book level (1/2/3).
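For intuition only, here is a toy, self-contained sketch of order-level (L3) ladder maintenance via add/update/delete; the class, method signatures and values are illustrative and not the actual nautilus_trader Order/Ladder API:

```python
from collections import defaultdict

class BidLadder:
    """Toy L3 bid ladder: individual orders keyed by id, aggregated into price levels."""

    def __init__(self):
        self._orders = {}  # order_id -> (price, size)

    def add(self, order_id, price, size):
        self._orders[order_id] = (price, size)

    def update(self, order_id, price, size):
        self._orders[order_id] = (price, size)

    def delete(self, order_id):
        self._orders.pop(order_id, None)

    def levels(self):
        # Aggregate order sizes into (price, total_size) levels, best bid first.
        totals = defaultdict(float)
        for price, size in self._orders.values():
            totals[price] += size
        return sorted(totals.items(), reverse=True)

ladder = BidLadder()
ladder.add("o-1", 50_000.0, 1.0)
ladder.add("o-2", 50_000.0, 0.5)
ladder.update("o-1", 50_000.0, 0.7)
ladder.delete("o-2")
print(ladder.levels())  # [(50000.0, 0.7)]
```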

ian-wazowski commented 3 years ago

@cjdsellers @jpmediadev @limx0

I would like to suggest adding a new field, local_timestamp_ns, to the order book, TradeTick and QuoteTick.

local_timestamp_ns would be the timestamp at which the data arrived at the trading machine (your machine), while the existing timestamp_ns is the timestamp at the very moment the tick data was created at the exchange.

The purpose of this is to check and record the network latency (end-to-end latency: local_timestamp_ns - timestamp_ns) between the trading machine and the exchange.
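A tiny worked example of the proposed latency measurement (the values are made up):

```python
timestamp_ns = 1_609_459_200_000_000_000        # tick created at the exchange
local_timestamp_ns = 1_609_459_200_000_350_000  # tick arrived at the trading machine

end_to_end_latency_ns = local_timestamp_ns - timestamp_ns
print(end_to_end_latency_ns)  # 350_000 ns == 350 microseconds
```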

How about this?

Any suggestions are welcome.

limx0 commented 3 years ago

@ian-wazowski +1 from me (I spoke to @cjdsellers about a similar idea), although naming wise I would prefer something like exchange_timestamp_ns or remote_timestamp_ns (for other data services that provide a timestamp but are not an exchange) to refer to the exchange server - local_timestamp_ns reads as "my local timezone" to me.

Just need to ensure we have some flexibility on both timestamps - some people will purchase data from an exchange and will only have the exchange timestamp, others might be recording data and (may) only have a local timestamp. We should be careful to handle this in the system (which timestamps you are using) if it is added.

This would be a nice stepping stone into https://github.com/nautechsystems/nautilus_trader/issues/286

cjdsellers commented 3 years ago

I think this is a very useful and necessary feature, and reduces some timestamping ambiguity.

I propose all values of timestamp_ns correspond to when clock.timestamp_ns was called by the system (as is currently the case).

The Event types are OK, as they have both a timestamp_ns and then another descriptive timestamp of when the event occurred.

I propose that the additional timestamp be added to the Data base class, and that we call it origin_timestamp_ns.

Thoughts?

cjdsellers commented 3 years ago

Moved this discussion to #288

Closing this in favor of opening separate OrderBook related issues.