nkaz001 / hftbacktest

A high-frequency trading and market-making backtesting and trading bot in Python and Rust, which accounts for limit orders, queue positions, and latencies, utilizing full tick data for trades and order books, with real-world crypto market-making examples for Binance Futures
MIT License

Add an argument for snapshot buffer #22

Closed richwomanbtc closed 1 year ago

richwomanbtc commented 1 year ago

Problem

The buffer size for the order book snapshot in tardis.py was hard-coded and too small to handle recent data. The issue was discovered while processing data from 2023-06-01. Here's a code snippet that reproduces it:

import numpy as np
from hftbacktest.data.utils import tardis

data = tardis.convert(['BTCUSDT_trades_20230601.csv.gz', 'BTCUSDT_book_20230601.csv.gz'])
np.savez('btcusdt_20230601.npz', data=data)

Running this code raises:

    107         ss_bid_rn += 1
    108     else:
--> 109         ss_ask[ss_ask_rn] = [
    110             4,
    111             int(cols[2]),
    112             int(cols[3]),
    113             -1,
    114             float(cols[6]),
    115             float(cols[7])
    116         ]
    117         ss_ask_rn += 1
    118 else:

IndexError: index 1000 is out of bounds for axis 0 with size 1000
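The failure mode can be reproduced in isolation: the converter preallocates a fixed-size NumPy array for snapshot rows, and writing past its length raises exactly this IndexError. A minimal sketch (the array shape, row layout, and values here are illustrative, not the actual tardis.py code):

```python
import numpy as np

# Illustrative: preallocate a 1000-row snapshot buffer, like the hard-coded
# size in tardis.py (columns: event, exch_ts, local_ts, side, price, qty).
ss_ask = np.empty((1000, 6), dtype=np.float64)
ss_ask_rn = 0

try:
    # Simulate a snapshot with more than 1000 ask levels.
    for level in range(1001):
        ss_ask[ss_ask_rn] = [4, 0, 0, -1, 26805.0 + level * 0.1, 1.0]
        ss_ask_rn += 1
except IndexError as e:
    print(e)  # index 1000 is out of bounds for axis 0 with size 1000
```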

Solution

This PR introduces an additional parameter, ss_buffer_size, to the tardis.convert function, allowing users to set a custom snapshot buffer size so that larger data sets can be handled. The modified call is:

data = tardis.convert(['BTCUSDT_trades_20230601.csv.gz', 'BTCUSDT_book_20230601.csv.gz'], buffer_size=1_000_000_000, ss_buffer_size=2000)
np.savez('btcusdt_20230601.npz', data=data)

Changes

Added a new parameter ss_buffer_size to the function tardis.convert.
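The change follows a simple pattern: the snapshot arrays are sized from a keyword argument instead of a literal. A sketch of the shape of the change (the parameter names match the PR, but the default values, array layout, and function body here are illustrative, not the actual implementation):

```python
import numpy as np

def convert(input_files, buffer_size=100_000_000, ss_buffer_size=1000):
    # Illustrative only: the real tardis.convert parses the CSV files into a
    # data array governed by buffer_size; here we only show the snapshot
    # buffers, which are now sized by the new ss_buffer_size argument
    # instead of a hard-coded 1000 rows.
    ss_bid = np.empty((ss_buffer_size, 6), dtype=np.float64)
    ss_ask = np.empty((ss_buffer_size, 6), dtype=np.float64)
    # ... parsing loop fills ss_bid / ss_ask exactly as before ...
    return ss_bid, ss_ask

bid, ask = convert([], ss_buffer_size=2000)
print(bid.shape)  # (2000, 6)
```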

Testing

The code has been tested with recent larger data sets and confirmed to work as expected.

nkaz001 commented 1 year ago

Thank you for your PR! Were you able to identify the issue with the Binance Futures data on Tardis.dev? AFAIK, the snapshot of Binance Futures only has a depth of 1000. https://binance-docs.github.io/apidocs/futures/en/#order-book

Could you please verify how many levels are present in the snapshot data?

richwomanbtc commented 1 year ago

It seems that the data from tardis.dev is not what we expected.

I created a dataframe from the npz file produced by the modified tardis.convert:

import numpy as np
import pandas as pd
from hftbacktest.data.utils.snapshot import create_last_snapshot

btcusdt = np.load("notebook/btcusdt_20230601.npz")["data"]
ss = create_last_snapshot(btcusdt, tick_size=0.1, lot_size=0.001)
df = pd.DataFrame(ss, columns=["event", "exch_timestamp", "local_timestamp", "side", "price", "qty"])
bid_df = df[df["side"] == 1]
bid_df.nunique()

The result is

event                  1
exch_timestamp         1
local_timestamp        1
side                   1
price              25382
qty                 2927
dtype: int64

The bid side of the snapshot contains 25382 distinct price levels.

Also, bid_df looks like this:

       event  exch_timestamp  local_timestamp  side    price     qty
0        4.0    1.685664e+15             -1.0   1.0  26805.2  24.294
1        4.0    1.685664e+15             -1.0   1.0  26805.1   2.142
2        4.0    1.685664e+15             -1.0   1.0  26805.0   0.102
3        4.0    1.685664e+15             -1.0   1.0  26804.9   5.121
4        4.0    1.685664e+15             -1.0   1.0  26804.8   1.726
...      ...             ...              ...   ...      ...     ...
25377    4.0    1.685664e+15             -1.0   1.0    600.0   1.730
25378    4.0    1.685664e+15             -1.0   1.0    560.0  63.310
25379    4.0    1.685664e+15             -1.0   1.0    557.0   7.017
25380    4.0    1.685664e+15             -1.0   1.0    556.9   0.211
25381    4.0    1.685664e+15             -1.0   1.0    556.8   0.575

This contains prices that are far too low. I don't fully understand their implementation, but it seems the incremental L2 feed has no snapshots of its own, and the snapshot data has greater depth than the REST API provides.

Also, Tardis.dev provides data from exchanges other than Binance, so hard-coding the depth could cause nasty problems anyway.
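One way to sidestep any fixed cap, whatever the exchange, would be to grow the snapshot buffer on demand rather than sizing it up front. A sketch of that alternative (not what this PR implements, and note the real converter may be Numba-compiled, where reallocation is less straightforward):

```python
import numpy as np

def append_row(buf, n, row):
    # Append `row` at index `n`, doubling the buffer when it is full.
    # Returns the (possibly reallocated) buffer.
    if n >= len(buf):
        grown = np.empty((len(buf) * 2, buf.shape[1]), dtype=buf.dtype)
        grown[:len(buf)] = buf
        buf = grown
    buf[n] = row
    return buf

ss_ask = np.empty((1000, 6), dtype=np.float64)
for i in range(2500):  # more rows than the initial 1000-row capacity
    ss_ask = append_row(ss_ask, i, [4, 0, 0, -1, 26805.0 + i * 0.1, 1.0])
print(len(ss_ask))  # 4000 (doubled twice: 1000 -> 2000 -> 4000)
```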

nkaz001 commented 1 year ago

Thank you for looking into this. It appears that they are restoring the complete order book snapshot from another source, possibly a redundant server.