Closed cjdsellers closed 3 years ago
I've now refactored the `OrderBook` to represent bids and asks as lists of `(Decimal, Decimal)`, which is more general and has less overhead.

It may be valuable to also represent them in the original `[[float, float]]` form?
Thank you! Great work.

> It may be valuable to also represent them in the original `[[float, float]]` form?

Yes, for me it would be valuable.

Do I understand correctly that the CCXT order book formation algorithm is used?
> Do I understand correctly that the CCXT order book formation algorithm is used?
That's correct, at this stage it's all coming off CCXT, so best bid and ask are at index 0.

We'll need to include `OrderBook` data for the backtests too. The most obvious way I can think of doing it is adding the snapshots to the tick stream, which will really then become a data stream. These classes may need a `Data` base class with a timestamp and overridden comparison methods so sorting can occur correctly.

Thoughts on this?
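The idea above could be sketched roughly as follows. This is a minimal illustration only, assuming hypothetical class names (`Data`, `QuoteTick`, `OrderBookSnapshot`), not the actual API: a shared base class carries the timestamp and comparison methods, so heterogeneous data sorts correctly into one stream.

```python
# Hypothetical sketch of a `Data` base class enabling a time-sorted data stream.
import functools

@functools.total_ordering
class Data:
    def __init__(self, timestamp_ns: int):
        self.timestamp_ns = timestamp_ns

    def __eq__(self, other):
        return self.timestamp_ns == other.timestamp_ns

    def __lt__(self, other):
        # Overridden comparison so sorting occurs on time, regardless of subtype
        return self.timestamp_ns < other.timestamp_ns

class QuoteTick(Data):
    pass

class OrderBookSnapshot(Data):
    pass

# Ticks and snapshots interleave correctly when sorted on timestamp
stream = sorted([OrderBookSnapshot(2_000), QuoteTick(1_000), QuoteTick(3_000)])
```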
Great work, finally!! Thank you!

> We'll need to include `OrderBook` data for the backtests too. The most obvious way I can think of doing it is adding the snapshots to the tick stream which will really then become a data stream. These classes may need a `Data` base class with a timestamp and overridden comparison methods so sorting can occur correctly.

It seems to be very good.
I'm just refactoring to get the specified `level` and `depth` working. Holding off on a 'raw' `[float, float]` order book option for the moment.
What's your use-case for the order book? Do you stream or look at interval snapshots?
I have plans to develop two trading algorithms: one for directional trading and one for arbitrage-based market making. I think both should have order book snapshots to calculate VPIN or order book imbalance. Very few exchanges support L3 order book streaming, so I will mainly use streaming snapshots (L2).

I also think it is very unnecessary and expensive (L1/L2 cache misses, memcpy overhead) to regenerate the order book object every time order book data comes in. According to my research, the approach Hummingbot takes seems to be the best practice among HFT teams (implementing an apply method reduces overhead by around 15–20%).
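A minimal sketch of this "apply diffs" style (all names here are illustrative, not the real implementation): one long-lived book object is mutated in place, rather than being regenerated on every incoming message.

```python
# Hypothetical sketch: mutate one book in place instead of rebuilding it.
class SimpleBook:
    def __init__(self):
        self.bids = {}  # price -> volume
        self.asks = {}

    def apply(self, side, diffs):
        # diffs: iterable of (price, volume); a volume of 0 deletes the level
        for price, volume in diffs:
            if volume == 0:
                side.pop(price, None)
            else:
                side[price] = volume

book = SimpleBook()
book.apply(book.bids, [(100.0, 5.0), (99.5, 2.0)])  # initial snapshot
book.apply(book.bids, [(99.5, 0), (100.5, 1.0)])    # later diff, no rebuild
```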
For the snapshot intervals, would you require the start delay or have the intervals naturally floored to the number of seconds/minutes etc?
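The two options could look something like this (function and parameter names are assumptions for illustration): either floor the next snapshot time to a whole interval boundary, or offset it by a start delay.

```python
# Hypothetical sketch: floored snapshot intervals vs. a start delay.
def next_snapshot_ns(now_ns: int, interval_s: int, start_delay_ns: int = 0) -> int:
    interval_ns = interval_s * 1_000_000_000
    floored = (now_ns // interval_ns) * interval_ns  # start of current interval
    return floored + interval_ns + start_delay_ns    # next boundary (+ optional delay)
```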
I agree the current implementation is not optimal; it was a first pass. I'm concerned about passing a mutable `OrderBook` all over the system though. I'll look into the apply-method approach.
Pushed a refactoring. Interval snapshots are now working, specified as an `int` of seconds. Open to suggestions on whether we need finer-grained snapshot intervals than 1 second?

The `depth` is now being passed to `_watch_order_book` as a `limit`, however it's not limiting the number of bids and asks for some reason - will get to the bottom of that next session.

Also, `kwargs` are now possible for exchange-specific options.
OK, I'm thinking we definitely need to improve the efficiency of this `OrderBook`. I think @ian-wazowski is correct: the object shouldn't be regenerated on every change, just the diffs applied in this "apply method" style. Then on stream or interval updates, just the reference for that updated object will be passed.

So I'll probably add `bids` and `asks` methods which will return a `vector[OrderBookEntry]`, which Python can interpret as a `list`. If there's a threading issue then we can always apply a lock under the hood, however with the GIL and the cooperative multi-tasking of the event loop this probably won't be needed. I've written a really efficient lock in Cython before, so possibly the added safety won't cause much overhead anyway - we'll test that later.

Given we're pulling this off CCXT, I'll keep `OrderBookEntry` to `double` types for the price and quantity too. Then on `OrderBook` we can have `bids_as_decimals` and `asks_as_decimals` if that's how a user prefers them - which avoids creating all of those `Decimal` objects on every update.
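The "Decimals on demand" idea might look like the sketch below (the function name is an assumption): the book keeps raw doubles internally and only materializes `Decimal` objects when a user explicitly asks for them.

```python
# Hypothetical sketch: convert raw double levels to Decimal pairs lazily,
# avoiding Decimal construction on every book update.
from decimal import Decimal

def as_decimals(levels):
    # levels: list of [price, qty] floats as stored internally
    return [[Decimal(str(price)), Decimal(str(qty))] for price, qty in levels]

snapshot = as_decimals([[100.0, 5.0], [99.5, 2.0]])
```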
OK, so another refactoring pass of the `OrderBook` is about to be pushed to `develop`.

The object is just constructed once and thereafter updated. Currently I can't see a way to only update the differences when using CCXT, however I think that's what's going on under the hood there anyway. It's coercing the bids and asks coming off CCXT `_watch_order_book` into memoryviews of `double[:, :]` via `np.array`. This is apparently a zero-copy coercion: https://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html?highlight=array#cpython-array-module
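For illustration, the coercion looks roughly like this. One caveat worth noting: building the ndarray from CCXT's Python lists does copy the data once; the zero-copy part is the Cython memoryview taken over the resulting buffer.

```python
# Illustration of coercing CCXT-style lists into a double buffer.
import numpy as np

bids = [[100.0, 5.0], [99.5, 2.0]]      # shape of data off _watch_order_book
arr = np.array(bids, dtype=np.float64)  # one contiguous 2-D double buffer
# In Cython: cdef double[:, :] view = arr  # zero-copy view over that buffer
```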
The bids and asks can be accessed as any of:

- the raw memoryview `double[:, :]` via `order_book.bids_c()`, `order_book.asks_c()`;
- `[[float, float]]` for Python-land via `order_book.bids()` and `order_book.asks()`;
- `[[decimal.Decimal, decimal.Decimal]]` via `order_book.bids_as_decimals()` and `order_book.asks_as_decimals()`.

The timestamp is now just a `long` Unix time too.
Performance still needs to be measured however judging by the timestamps in the logs this is MUCH faster than the last version.
When we have our own optimized adapters which don't rely on CCXT Pro I think this can be improved further.
Any thoughts or suggestions, guys?
Also the depth limit is working now.
Maybe it makes sense to look at a solution like this: https://github.com/0b01/tectonicdb
Optimised algos: https://quant.stackexchange.com/questions/3783/what-is-an-efficient-data-structure-to-model-order-book
Thanks I'll check these out!
I'm looking into this more closely. The above are great links.
Right now the `OrderBook` simply isn't efficient or functional enough. I'm leaning towards implementing something in C++, as Cython can bind to this directly, rather than going down the path of bringing Rust into the codebase at this stage - with all of the build tooling and binding that would entail (that's a longer-term vision).

There's also the issue of implementing backtest support for the `OrderBook` too. My own intuition is to add it to the data stream of quote and trade ticks and process it all sorted on timestamps.
Thoughts?
Given how critical the order book is I've decided to give it a go in Rust.
I pushed a work in progress to `develop`. Still a lot of implementation to go, and figuring out the C bindings for Cython.
If anyone has Rust experience feel free to get involved!
So Rust is now integrated into the project.

Crates will be located in the project root `lib` folder. `cbindgen` will automatically manage the FFI boundary by generating the necessary `.h` headers when crates are built, for those struct definitions with `#[repr(C)]` and for functions declared like so:

```rust
#[no_mangle]
pub extern "C" fn
```

On the Python side it's fairly easy to write the header definitions in a `.pxd`, and then wrap it with a standard `.pyx`. I think this has an advantage over some of the other methods people are using to bind with Python, as it keeps all calls to Rust in C land.
The CI pipeline now installs Rust, and `build.py` calls `cargo build --release` on all crates located in `lib`, with the output static libraries linked by Cython during the C extension compilation.
The order book itself still needs some work on the algorithms and functionality. I was more focused on getting something basic set up and all the tooling in place.
I find this interesting. I downloaded a book on Rust...
I recommend *The Rust Programming Language* by Steve Klabnik and Carol Nichols, and also *Programming Rust: Fast, Safe Systems Development* by Jim Blandy and Jason Orendorff.
Thanks, I'll start with Jim Blandy and Jason Orendorff.
What sort of functionality do we need on the `OrderBook`?

Right now we have;
@scoriiu The above was functionality on the Rust-implemented `OrderBook`. We're currently back to a limited 'placeholder', which can only have snapshots applied.

Once integrations are available which can access deltas only, the `OrderBook` will begin to iterate with improved functionality and performance.
Efficient order book diff methods are now available, including `add`, `update` and `delete` for `OrderBook`-specific `Order` objects. Internal bid and ask `Ladder`s are maintained, with behavior dependent on order book level (1/2/3).
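A price `Ladder` supporting those diff methods could be sketched as below. This is a simplified illustration under assumed semantics (levels keyed by price, bids sorted best-first), not the actual implementation.

```python
# Hypothetical sketch of a Ladder with add/update/delete diff methods.
class Ladder:
    def __init__(self, is_bid: bool):
        self.is_bid = is_bid  # bids rank best-first by descending price
        self._levels = {}     # price -> volume

    def add(self, price: float, volume: float) -> None:
        # Accumulate volume at an existing or new level
        self._levels[price] = self._levels.get(price, 0.0) + volume

    def update(self, price: float, volume: float) -> None:
        # Replace the volume at a level outright
        self._levels[price] = volume

    def delete(self, price: float) -> None:
        self._levels.pop(price, None)

    def top(self):
        # Best level: highest price for bids, lowest for asks
        if not self._levels:
            return None
        best = max(self._levels) if self.is_bid else min(self._levels)
        return best, self._levels[best]

bids = Ladder(is_bid=True)
bids.add(100.0, 5.0)
bids.add(99.5, 2.0)
bids.update(100.0, 3.0)
bids.delete(99.5)
```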
@cjdsellers @jpmediadev @limx0

I would like to suggest adding a new field `local_timestamp_ns` to the order book, `TradeTick`, and `QuoteTick`. The `local_timestamp_ns` would be the timestamp at which the data arrived at the trading machine (your machine), while the existing one (`timestamp_ns`) is the timestamp at the very moment the tick data was created on the exchange.

The purpose of this is to check and record the network latency (end-to-end latency: `local_timestamp_ns` - `timestamp_ns`) between the trading machine and the exchange.

How about this? Any suggestions are welcomed.
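The measurement the proposal enables is just the difference of the two fields, both as Unix-epoch nanoseconds (the helper name is illustrative):

```python
# Minimal sketch of the end-to-end latency calculation proposed above.
def end_to_end_latency_ns(timestamp_ns: int, local_timestamp_ns: int) -> int:
    # exchange creation time -> arrival time at the trading machine
    return local_timestamp_ns - timestamp_ns
```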
@ian-wazowski +1 from me (I spoke to @cjdsellers about a similar idea), although naming-wise I would prefer something like `exchange_timestamp_ns` or `remote_timestamp_ns` (for other data services that provide a timestamp but are not an exchange) to refer to the exchange server - `local_timestamp_ns` reads as "my local timezone" to me.

We just need to ensure we have some flexibility on both timestamps - some people will purchase data from an exchange and will only have the exchange timestamp, others might be recording data and may only have a local timestamp. We should be careful to handle this in the system (i.e. which timestamps you are using) if it is added.

This would be a nice stepping stone into https://github.com/nautechsystems/nautilus_trader/issues/286
I think this is a very useful and necessary feature, and it reduces some timestamping ambiguity.

I propose all values of `timestamp_ns` correspond to when `clock.timestamp_ns` was called by the system (as is currently the case). The `Event` types are OK, as they have both a `timestamp_ns` and then another descriptive timestamp of when the event occurred.

I propose that the additional timestamp needs to be on the `Data` base class, and we call it `origin_timestamp_ns`.

Thoughts?
Moved this discussion to #288.

Closing this in favor of opening separate `OrderBook`-related issues.
An early version of the `OrderBook` feature has been pushed to the `develop` branch.

The bids and asks in the order book are represented as lists of `(Price, Quantity)` tuples, in each case sorted from top to bottom. I feel this is a more usable representation than the lists of `[float, float]` which is what's currently returned from CCXT? There is some overhead (~0.5μs / 500ns) to construct each object - which would probably be OK for most users?

So a user can subscribe to an order book by symbol. `OrderBook` object snapshots will then be passed into `on_order_book` from the `DataEngine`. Optionally, a user can specify a `timedelta` interval if they only require periodic snapshots (also with an optional start-delay `timedelta` if they don't want those snapshots to occur at floored times, i.e. the very start of a minute or second).

Some things to consider are how to specify the level and depth required. Separate levels would then require separate socket streams for L2 and L3. I'm currently thinking depth will be the maximum any strategy has currently subscribed to for that symbol.

Requesting any comments on the feature.
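For anyone wanting to check the quoted ~0.5μs-per-object construction overhead on their own machine, a rough measurement could look like this (the `Decimal` pair here merely stands in for a `(Price, Quantity)` tuple; the real types differ):

```python
# Rough benchmark of per-pair construction cost, as a stand-in for the
# (Price, Quantity) tuples discussed above.
import timeit
from decimal import Decimal

n = 100_000
total_s = timeit.timeit(lambda: (Decimal("100.25"), Decimal("3.50")), number=n)
per_call_us = total_s / n * 1e6  # microseconds per constructed pair
```

Actual numbers will vary by machine and Python version, so treat the ~0.5μs figure as indicative only.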