joytrust0609 commented 2 months ago

Hi @nkaz001, I am new to this framework and plan to move my backtest to it.

Can you give an example that imports L2 snapshot data and runs the rest of the backtest? I have read through all the samples here, and DiffOrderBookSnapshot looks relevant, but I cannot find an example that uses it.
One suggestion: the workflow is tidy and clean, but a skeleton of the framework would be useful for beginners.

best, justin

nkaz001 commented 2 months ago

Which exchange are you targeting? If the exchange provides incremental L2 market depth, it should be used. If not, could you provide sample data or a link to the page where I can download the sample data? This will allow me to make an example and run tests.

joytrust0609 commented 2 months ago

Here is a sample. The tick size is 5.0 and the multiplier is also 5. The fee is a fixed 3.0 dollars per trade, with no rebate.

401002410_20240909.zip

nkaz001 commented 2 months ago

The following snippet may help you. Please ensure the conversion result is correct. Additionally, the data appears to be aggregated, as the change in turnover and trade quantity do not align. Furthermore, no trade side is provided, and there seems to be an issue with the timestamps. If you are using this data, the backtesting results could significantly deviate from live trading. You should carefully verify the backtesting result.

import polars as pl
from hftbacktest.data.utils.difforderbooksnapshot import (
    DiffOrderBookSnapshot,
    UNCHANGED,
    CHANGED,
    INSERTED,
    IN_THE_BOOK_DELETION,
    OUT_OF_BOOK_DELETION_BELOW,
    OUT_OF_BOOK_DELETION_ABOVE
)
import numpy as np

df = pl.read_csv('401002410_20240909.csv')
# cast() returns a new DataFrame, so the result has to be reassigned.
df = df.cast({'RecvTime': pl.Int64, 'ExchTime': pl.Int64})

diff = DiffOrderBookSnapshot(5, 5, 1)

for row in df.iter_rows():
    local_ts = row[1]
    exch_ts = row[2]

    trade_px = row[3]
    trade_qty = row[4]

    print(f'ev: INITIATOR_SIDE? | TRADE_EVENT, px: {row[3]}, qty: {row[4]}')

    bid_px = np.asarray([row[7], row[11], row[15], row[19], row[23]])
    ask_px = np.asarray([row[8], row[12], row[16], row[20], row[24]])
    bid_qty = np.asarray([row[9], row[13], row[17], row[21], row[25]])
    ask_qty = np.asarray([row[10], row[14], row[18], row[22], row[26]])

    bid, ask, bid_del, ask_del = diff.snapshot(bid_px, bid_qty, ask_px, ask_qty)

    for entry in bid:
        if entry[2] == INSERTED or entry[2] == CHANGED:
            print(f'ev: BUY_EVENT | DEPTH_EVENT, px: {entry[0]}, qty: {entry[1]}')

    for entry in ask:
        if entry[2] == INSERTED or entry[2] == CHANGED:
            print(f'ev: SELL_EVENT | DEPTH_EVENT, px: {entry[0]}, qty: {entry[1]}')

    for entry in bid_del:
        print(f'ev: BUY_EVENT | DEPTH_EVENT, px: {entry[0]}, qty: 0')

    for entry in ask_del:
        print(f'ev: SELL_EVENT | DEPTH_EVENT, px: {entry[0]}, qty: 0')
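
For readers without the library at hand, the idea behind diffing consecutive snapshots can be sketched in plain Python. This is only an illustration of the principle, not how DiffOrderBookSnapshot is implemented; the helper name `diff_levels` and the sample prices are made up. It also ignores the difference between a level being deleted and a level merely moving outside the 5-level window, which the IN_THE_BOOK_DELETION and OUT_OF_BOOK_DELETION_* constants suggest the library does track.

```python
def diff_levels(prev, curr):
    """Compare two {price: qty} snapshots of one side of the book.

    Returns (updates, deletions):
      updates   - (price, qty) pairs for levels that are new or changed
      deletions - prices that were in the previous snapshot but are gone now
    """
    updates = [(px, qty) for px, qty in sorted(curr.items()) if prev.get(px) != qty]
    deletions = sorted(px for px in prev if px not in curr)
    return updates, deletions


# Hypothetical bid side, tick size 5.0.
prev_bids = {100.0: 10.0, 95.0: 4.0, 90.0: 7.0}
curr_bids = {100.0: 6.0, 95.0: 4.0, 85.0: 2.0}

updates, deletions = diff_levels(prev_bids, curr_bids)

for px, qty in updates:
    print(f'ev: BUY_EVENT | DEPTH_EVENT, px: {px}, qty: {qty}')
for px in deletions:
    # A removed level is reported as a depth event with quantity 0.
    print(f'ev: BUY_EVENT | DEPTH_EVENT, px: {px}, qty: 0')
```

The unchanged level at 95.0 produces no event, which is the point of feeding diffs rather than full snapshots.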

joytrust0609 commented 2 months ago

Thanks a lot for the snippet, it helps me a lot! After reading the snippet and data.utils.binancefutures, I still have two questions.

Q1. What should the data stream look like? For example, the previous tick shows ap1 100, av1 10. Then a buy trade happens (px 100, lot 5), and a new sell limit order is sent to the exchange (px 100, lot 5). Finally, the current tick again shows ap1 100, av1 10. The lot size is 1. Is it:
"ev: BUY_EVENT | TRADE_EVENT, px 100, qty 5"
"ev: SELL_EVENT | DEPTH_EVENT, px 100, qty 5" or "ev: SELL_EVENT | DEPTH_EVENT, px 100, qty 10"

or just: "ev: SELL_EVENT | DEPTH_EVENT, px 100, qty 10"

or: "ev: SELL_EVENT | DEPTH_SNAPSHOT_EVENT, px 100, qty 10"

Q2: row 122 in data.utils.binancefutures:
`
tmp[row_num] = (
    DEPTH_EVENT | BUY_EVENT,
    exch_timestamp,
    local_timestamp,
    float(px),
    float(qty),
    0,
    0,
    0
)
`
What do the 0s stand for?

nkaz001 commented 2 months ago

Q1) You need to specify the final quantity. But if the exchange provides full tick-by-tick data, it should be:
ev: BUY_EVENT | TRADE_EVENT, exch_ts: T1, px 100, qty 5
ev: SELL_EVENT | DEPTH_EVENT, exch_ts: T1, px 100, qty 5
ev: SELL_EVENT | DEPTH_EVENT, exch_ts: T2, px 100, qty 10

Q2) The last three fields are unused in Level-2 data. Please see https://hftbacktest.readthedocs.io/en/latest/data.html#format
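
As a rough illustration of both answers, the three events from Q1 could be laid out as rows of a NumPy structured array. The field names below (`order_id`, `ival`, `fval` for the three unused trailing fields) reflect my reading of the linked format page and should be checked against it. The flag values, timestamps, and latency are placeholders for this sketch; in real code, import the flags from hftbacktest instead of defining them.

```python
import numpy as np

# Assumed row layout; the linked format page is the authoritative definition.
event_dtype = np.dtype([
    ('ev', 'u8'),
    ('exch_ts', 'i8'),
    ('local_ts', 'i8'),
    ('px', 'f8'),
    ('qty', 'f8'),
    ('order_id', 'u8'),  # unused for Level-2 data
    ('ival', 'i8'),      # unused for Level-2 data
    ('fval', 'f8'),      # unused for Level-2 data
])

# Placeholder flag values for illustration only.
DEPTH_EVENT = 1
TRADE_EVENT = 2
BUY_EVENT = 1 << 29
SELL_EVENT = 1 << 28

# Hypothetical timestamps: T1 for the trade, T2 for the new limit order.
T1, T2 = 1_000, 2_000
LATENCY = 50  # hypothetical feed latency between exchange and local time

events = np.array([
    # The buy trade takes 5 lots at 100 ...
    (BUY_EVENT | TRADE_EVENT, T1, T1 + LATENCY, 100.0, 5.0, 0, 0, 0.0),
    # ... which leaves 5 lots on the ask at 100 ...
    (SELL_EVENT | DEPTH_EVENT, T1, T1 + LATENCY, 100.0, 5.0, 0, 0, 0.0),
    # ... and the new sell limit order brings the level back to 10.
    (SELL_EVENT | DEPTH_EVENT, T2, T2 + LATENCY, 100.0, 10.0, 0, 0, 0.0),
], dtype=event_dtype)

print(events[['exch_ts', 'px', 'qty']])
```

Note that each depth row carries the resulting quantity at the price level, not the change in quantity.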

joytrust0609 commented 1 month ago

I checked the simulation result and found that the rebuilt snapshot does not match the corresponding original snapshot. I don't know whether I need to add the trade stream to get a correct simulation, because if I ignore the trade stream and input only the snapshot-diff stream (the result of the DiffOrderBookSnapshot function), the rebuilt snapshot matches.

nkaz001 commented 1 month ago

TRADE_EVENT isn't related to the market depth. There should be no difference in the snapshot with or without the TRADE_EVENT. TRADE_EVENT should not be input to the DiffOrderBookSnapshot function.
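
One way to check this on your own data is to rebuild the book from the event stream twice, once with the trade events and once without, and compare the results. The sketch below is plain Python with a hypothetical `rebuild_asks` helper and simplified `(kind, px, qty)` tuples in place of the framework's event rows; it is not the framework's own book-building code.

```python
def rebuild_asks(events):
    """Rebuild the ask side as {price: qty} from (kind, px, qty) tuples."""
    book = {}
    for kind, px, qty in events:
        if kind != 'depth':
            continue  # trade events carry no book state, so they are skipped
        if qty == 0:
            book.pop(px, None)  # quantity 0 removes the level
        else:
            book[px] = qty      # otherwise qty is the new total at this level
    return book


# The Q1 sequence: a 5-lot trade, then the ask at 100 goes 10 -> 5 -> 10.
with_trades = [
    ('trade', 100.0, 5.0),
    ('depth', 100.0, 5.0),
    ('depth', 100.0, 10.0),
]
depth_only = [e for e in with_trades if e[0] == 'depth']

print(rebuild_asks(with_trades))
print(rebuild_asks(depth_only))
```

If the two rebuilt books differ on real data, the trade events are probably being applied to the depth somewhere in the conversion, for example by subtracting the trade quantity from the level.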

nkaz001 / hftbacktest

Need example using DiffOrderBookSnapshot #137
