nkaz001 / hftbacktest

A high-frequency trading and market-making backtesting tool in Python and Rust, which accounts for limit orders, queue positions, and latencies, utilizing full tick data for trades and order books, with real-world crypto market-making examples for Binance Futures
MIT License
1.78k stars 357 forks source link

Read event data in parallel to backtest #124

Open garyttierney opened 4 weeks ago

garyttierney commented 4 weeks ago

Remove the read_data() calls within the backtest implementations and replace them with recv() calls on a lock-free queue. This avoids the pause that happened previously when a backtest reaches the end of the current periods data and begins loading the next file. With this model, the data for the next period should be available by the time the previous one finishes.

There are still a couple of improvements that need to be made here:

There are also a few peculiarities in the implementation like having peek so initialize_data can be trivially implemented, I'd like to see about restructuring this.

Remaining todo items

nkaz001 commented 4 weeks ago

I conducted a quick test and obtained the following results:

I have some concerns about whether this is the right direction. The approach introduces an additional copy. Moreover, the original daily loading method was designed to handle multiple days of data within limited memory by loading data one by one. With the new suggestion, there is a risk of exceeding memory capacity if data consumption doesn't keep pace with how quickly it is enqueued into the bus.

nkaz001 commented 4 weeks ago

Without the strategy implementation, the test uses only elapse(100ms). I will include a test with more intensive data, such as BTCUSDT.

garyttierney commented 4 weeks ago

I have some concerns about whether this is the right direction. The approach introduces an additional copy.

That is simply a limitation of the bus API in use. Replacing this with a ring-buffer is on the todo list above and gets the readers back to zero-copy and good cache coherence.

With the new suggestion, there is a risk of exceeding memory capacity if data consumption doesn't keep pace with how quickly it is enqueued into the bus.

The queue is a fixed size, so there's no risk of exceeding memory capacity. Although it should be loading incrementally by copying chunks of Events out of the file at a time, also on the todo list.

Without the strategy implementation, the test uses only elapse(100ms). I will include a test with more intensive data, such as BTCUSDT.

Can you share this test? It'd be useful to put in a benchmark as I iterate.

nkaz001 commented 4 weeks ago

Even though the queue implementation is lock-free, doesn't introducing an atomic value to check items in the producer/consumer potentially trigger cache invalidation, adding another layer of overhead?

I used the Rust version of the grid trading backtest example. It would be beneficial to have two benchmarks: one with and one without the strategy implementation. Using the BTCUSDT data from the here provided to ensure the benchmarks are aligned.

garyttierney commented 4 weeks ago

Working on replacing the bus with a ring buffer that eliminates the copies now. I think we can get away with very little ato mic usage on x64, references: