nautechsystems / nautilus_trader

A high-performance algorithmic trading platform and event-driven backtester
https://nautilustrader.io
GNU Lesser General Public License v3.0

How to handle orderbook snapshots #1701

Open VeraLyu opened 3 weeks ago

VeraLyu commented 3 weeks ago

Bug Report

Order book depth snapshots (one per 100ms) are fed into the backtest, but the order book output in the strategy does not reflect the order book state in the trace.

Expected Behavior

I used OrderBook L2_MBP depth snapshots (triggered every 100ms). My trace:

```
ADAUSDT,1717600727800,1,1,b,SNAP,0.46170,208927,1
ADAUSDT,1717600727800,1,1,b,SNAP,0.46180,287249,1
ADAUSDT,1717600727800,1,1,b,SNAP,0.46190,217676,1
ADAUSDT,1717600727800,1,1,b,SNAP,0.46200,190538,1
ADAUSDT,1717600727800,1,1,b,SNAP,0.46210,69403,1
ADAUSDT,1717600727800,1,1,a,SNAP,0.46220,40649,1
ADAUSDT,1717600727800,1,1,a,SNAP,0.46230,98235,1
ADAUSDT,1717600727800,1,1,a,SNAP,0.46240,215218,1
ADAUSDT,1717600727800,1,1,a,SNAP,0.46250,229831,1
ADAUSDT,1717600727800,1,1,a,SNAP,0.46260,224554,1
ADAUSDT,1717600727915,1,1,b,SNAP,0.46180,308664,1
ADAUSDT,1717600727915,1,1,b,SNAP,0.46190,231775,1
ADAUSDT,1717600727915,1,1,b,SNAP,0.46200,189917,1
ADAUSDT,1717600727915,1,1,b,SNAP,0.46210,85427,1
ADAUSDT,1717600727915,1,1,b,SNAP,0.46220,110877,1
ADAUSDT,1717600727915,1,1,a,SNAP,0.46230,73556,1
ADAUSDT,1717600727915,1,1,a,SNAP,0.46240,160102,1
ADAUSDT,1717600727915,1,1,a,SNAP,0.46250,220399,1
ADAUSDT,1717600727915,1,1,a,SNAP,0.46260,208158,1
ADAUSDT,1717600727915,1,1,a,SNAP,0.46270,145751,1
```

I feed these snapshots into my strategy and just log the data in the on_order_book function; I expect the order book to reflect my data.

Actual Behavior

In the output log:

```
instrument: ADAUSDT-PERP.BINANCE
sequence: 1
ts_last: 1717600727915000000
count: 110
╭──────────┬────────┬──────────╮
│ bids     │ price  │ asks     │
├──────────┼────────┼──────────┤
│          │ 0.4624 │ [160102] │
│          │ 0.4623 │ [73556]  │
│ [40649]  │ 0.4622 │ [40649]  │
│ [110877] │ 0.4622 │ [110877] │
│ [85427]  │ 0.4621 │          │
│ [189917] │ 0.4620 │          │
╰──────────┴────────┴──────────╯
```

Steps to Reproduce the Problem

  1. use the example backtest
  2. use the trace provided

Specifications

VeraLyu commented 3 weeks ago

Not sure if we are allowed to use periodic depth snapshots only, or if we need to use snapshots together with deltas?

VeraLyu commented 3 weeks ago

And the log shows 6 levels on both sides, but I actually only have 5 levels of depth on each side:

```
2024-06-05T15:18:48.022000000Z [INFO] BACKTESTER-001.CrossAssetsInstrumentStrategy:
    [Level(price=0.4622, orders=[BookOrder(side=SELL, price=0.4622, size=40649, order_id=462200000)]),
     Level(price=0.4623, orders=[BookOrder(side=SELL, price=0.4623, size=73556, order_id=462300000)]),
     Level(price=0.4624, orders=[BookOrder(side=SELL, price=0.4624, size=160102, order_id=462400000)]),
     Level(price=0.4625, orders=[BookOrder(side=SELL, price=0.4625, size=220399, order_id=462500000)]),
     Level(price=0.4626, orders=[BookOrder(side=SELL, price=0.4626, size=208158, order_id=462600000)]),
     Level(price=0.4627, orders=[BookOrder(side=SELL, price=0.4627, size=145751, order_id=462700000)])]
2024-06-05T15:18:48.022000000Z [INFO] BACKTESTER-001.CrossAssetsInstrumentStrategy:
    [Level(price=0.4622, orders=[BookOrder(side=BUY, price=0.4622, size=110877, order_id=462200000)]),
     Level(price=0.4621, orders=[BookOrder(side=BUY, price=0.4621, size=85427, order_id=462100000)]),
     Level(price=0.4620, orders=[BookOrder(side=BUY, price=0.4620, size=189917, order_id=462000000)]),
     Level(price=0.4619, orders=[BookOrder(side=BUY, price=0.4619, size=231775, order_id=461900000)]),
     Level(price=0.4618, orders=[BookOrder(side=BUY, price=0.4618, size=308664, order_id=461800000)]),
     Level(price=0.4617, orders=[BookOrder(side=BUY, price=0.4617, size=208927, order_id=461700000)])]
```
cjdsellers commented 3 weeks ago

Hey @VeraLyu

I just need some more info to figure out if there is any bug here.

> I feed these snapshots to my strategy

Two questions here:

  • Are you parsing the Binance orderbook data yourself into Nautilus OrderBookDeltas?
  • Are you manually managing the order book in the strategy, outside the normal flow of the DataEngine?

Normally you would just subscribe to an order book with subscribe_order_book_deltas or subscribe_order_book_snapshots and the DataEngine will manage the book for you. Of course for backtesting this still relies on you adding the necessary data.

> Not sure if we are allowed to use periodically depth snapshots only, or we need to use snapshots with deltas?

You need to decide whether you're using an initial snapshot plus deltas, or just snapshots to update. Also, the book `action` on the deltas needs to be set correctly per the adapter implementation, otherwise there could be strange behavior in the book state.

I have a fairly high confidence in the correctness of the order book at this point, typically recent issues stem from bad data input.
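To make the snapshot-only approach concrete, here is a minimal, library-free sketch (plain dicts stand in for Nautilus delta objects, which is an assumption for illustration only): each discrete snapshot becomes a CLEAR action followed by one ADD per level, so the book is wiped and rebuilt at every snapshot boundary rather than accumulating stale levels.

```python
# Hypothetical sketch: represent one L2 snapshot as a delta sequence
# that starts with a CLEAR action (plain dicts, not real Nautilus types).

def snapshot_to_deltas(rows):
    """Convert one snapshot (a list of (side, price, size) tuples sharing
    a timestamp) into deltas beginning with a CLEAR action."""
    deltas = [{"action": "CLEAR"}]  # wipe any previous book state first
    for side, price, size in rows:
        deltas.append({"action": "ADD", "side": side, "price": price, "size": size})
    return deltas

# One 100ms snapshot (abbreviated from the trace above):
snap = [("b", 0.4617, 208927), ("a", 0.4622, 40649)]
deltas = snapshot_to_deltas(snap)
# deltas[0] clears the book; the remaining deltas rebuild it from scratch.
```

Without the leading CLEAR, levels from the previous snapshot that are absent in the new one would linger, which matches the overlapping-levels symptom reported above.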

VeraLyu commented 3 weeks ago

> Hey @VeraLyu
>
> I just need some more info to figure out if there is any bug here.
>
> > I feed these snapshots to my strategy
>
> Two questions here:
>
>   • Are you parsing the Binance orderbook data yourself into Nautilus OrderBookDeltas?
>   • Are you manually managing the order book in the strategy, outside the normal flow of the DataEngine?

Here is how I pass the data. I didn't manage order books outside and didn't use deltas; the data I collected is from the partial book depth 5 stream `sym@depth5@100ms`:


```python
# Imports added for completeness (module paths as in recent
# nautilus_trader versions; they may differ between releases):
import asyncio
from pathlib import Path

from nautilus_trader.adapters.binance.futures.providers import BinanceFuturesInstrumentProvider
from nautilus_trader.backtest.engine import BacktestEngine
from nautilus_trader.config import BacktestEngineConfig, LoggingConfig
from nautilus_trader.model.currencies import USDT
from nautilus_trader.model.enums import AccountType, OmsType
from nautilus_trader.model.identifiers import InstrumentId, Symbol, TraderId, Venue
from nautilus_trader.model.objects import Money
from nautilus_trader.persistence.loaders import BinanceOrderBookDeltaDataLoader
from nautilus_trader.persistence.wranglers import OrderBookDeltaDataWrangler

config = BacktestEngineConfig(
    trader_id=TraderId("BACKTESTER-001"),
    logging=LoggingConfig(log_level="INFO"),
)
BINANCE = Venue("BINANCE")

# Build the backtest engine
engine = BacktestEngine(config=config)

# Use actual Binance instrument for backtesting
# (create_provider() is my own helper, not shown here)
provider: BinanceFuturesInstrumentProvider = asyncio.run(create_provider())

instrument_id = InstrumentId(symbol=Symbol("ADAUSDT-PERP"), venue=BINANCE)
instrument = provider.find(instrument_id)
if instrument is None:
    raise RuntimeError(f"Unable to find instrument {instrument_id}")

engine.add_venue(
    venue=BINANCE,
    oms_type=OmsType.NETTING,
    account_type=AccountType.MARGIN,
    base_currency=None,
    starting_balances=[Money(1_000_000, USDT)],
)
engine.add_instrument(instrument)

data_dir = Path("~/my_nautilus").expanduser() / "tests" / "test_data" / "binance"
path_snap = data_dir / "ADAUSDT-orderbook.csv"
print(f"Loading {path_snap} ...")
df_snap = BinanceOrderBookDeltaDataLoader.load(path_snap)
print(str(df_snap))

print("Wrangling OrderBookDelta objects ...")
wrangler = OrderBookDeltaDataWrangler(instrument=instrument)
deltas = wrangler.process(df_snap)

engine.add_data(deltas)
```

VeraLyu commented 3 weeks ago

And in the strategy I just print out the order book in the on_order_book() callback, with no other manipulation.

```python
class CrossAssetsInstrumentStrategy(Strategy):
    def __init__(self, config: CrossAssetsInstrumentConfig):
        super().__init__(config)
        self.instrument_id_list = config.usdc_instruments
        self.instruments_str_list = [instrument.value for instrument in self.instrument_id_list]
        self.instruments: dict = {}  # populated in on_start (missing from the original snippet)
        self.book_type: nautilus_pyo3.BookType = nautilus_pyo3.BookType("L2_MBP")
        self.books_dict_remaining = {
            instrument_id.value: nautilus_pyo3.OrderBook(
                self.book_type,
                nautilus_pyo3.InstrumentId.from_str(instrument_id.value),
            )
            for instrument_id in self.instrument_id_list
        }

    def on_start(self):
        super().on_start()
        for instrument_id in self.instrument_id_list:
            self.instruments[instrument_id.value] = self.cache.instrument(instrument_id)
            self.subscribe_order_book_snapshots(instrument_id=instrument_id, depth=5, interval_ms=100)

    def on_order_book(self, order_book: OrderBook):
        self.log.info(f"{order_book.asks()}")
        self.log.info(f"{order_book.bids()}")
```
cjdsellers commented 3 weeks ago

Thanks for the code snippet @VeraLyu

So I suspect what's happening here is that we're not clearing the book state before applying a snapshot. I fixed the OrderBookDeltaDataWrangler to now prepend a CLEAR action when the first delta (at least) is a snapshot: d7061835942026688e18391711a5a17ffa0e54d5.

Let me know if this fixes things for you.

cjdsellers commented 3 weeks ago

Something else I've just realized: you're going to have to iterate over each snapshot and parse them separately for this to work.

From memory, Binance provides a snapshot file and then depth updates per day, which will work.

If you're just capturing snapshots yourself into one big file, then you'll need to somehow iterate over the discrete bulks of snapshot messages Binance sends over the websocket.

VeraLyu commented 3 weeks ago

> Something else I've just realized is that you're going to have to iterate over each snapshot and parse separately for this to work.

Yes, I parse each 100ms snapshot and write every 'b' and 'a' item to the CSV file. Is producing a snapshot together with deltas the preferred way? I mean, either way the snapshot and delta CSV files need to be generated by ourselves (not Nautilus); correct me if I'm wrong.

> From memory Binance provide a snapshot file and then depth per day which will work.
>
> If you're just capturing snapshots yourself into one big file then you'll need to somehow iterate over the discrete bulks of snapshot messages Binance are sending over ws.

cjdsellers commented 3 weeks ago

This sounds fine, as long as you can identify each discrete websocket message snapshot. Then you can iterate over these and parse each into separate groups of snapshot messages, each beginning with a clear action.
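A minimal sketch of that grouping step, under the assumption that all records from one websocket message share a timestamp (plain tuples stand in for parsed CSV rows like those in the trace above):

```python
import itertools

# Hypothetical sketch: split one big capture file into discrete snapshots
# by grouping consecutive rows on their shared timestamp field.

def iter_snapshots(rows):
    """Yield one list of rows per discrete 100ms snapshot.
    Each row is a tuple like (symbol, ts, side, price, size)."""
    for _ts, group in itertools.groupby(rows, key=lambda r: r[1]):
        yield list(group)

# Abbreviated rows from the trace (two snapshot timestamps):
rows = [
    ("ADAUSDT", 1717600727800, "b", 0.46170, 208927),
    ("ADAUSDT", 1717600727800, "a", 0.46220, 40649),
    ("ADAUSDT", 1717600727915, "b", 0.46180, 308664),
]
snapshots = list(iter_snapshots(rows))
# Each element of `snapshots` can then be parsed into its own group of
# book messages beginning with a clear action.
```

This relies on rows from one message being contiguous in the file; if they are interleaved, a sort on the timestamp column first would be needed.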

VeraLyu commented 3 weeks ago

I tested with the fix applied, but still got 6 overlapping levels in the order book:

```
2024-06-05T15:18:48.022000000Z [INFO] BACKTESTER-001.CrossAssetsInstrumentStrategy:
    [Level(price=0.46220, orders=[BookOrder(side=SELL, price=0.46220, size=40649, order_id=462200000)]),
     Level(price=0.46230, orders=[BookOrder(side=SELL, price=0.46230, size=73556, order_id=462300000)]),
     Level(price=0.46240, orders=[BookOrder(side=SELL, price=0.46240, size=160102, order_id=462400000)]),
     Level(price=0.46250, orders=[BookOrder(side=SELL, price=0.46250, size=220399, order_id=462500000)]),
     Level(price=0.46260, orders=[BookOrder(side=SELL, price=0.46260, size=208158, order_id=462600000)]),
     Level(price=0.46270, orders=[BookOrder(side=SELL, price=0.46270, size=145751, order_id=462700000)])]
2024-06-05T15:18:48.022000000Z [INFO] BACKTESTER-001.CrossAssetsInstrumentStrategy:
    [Level(price=0.46220, orders=[BookOrder(side=BUY, price=0.46220, size=110877, order_id=462200000)]),
     Level(price=0.46210, orders=[BookOrder(side=BUY, price=0.46210, size=85427, order_id=462100000)]),
     Level(price=0.46200, orders=[BookOrder(side=BUY, price=0.46200, size=189917, order_id=462000000)]),
     Level(price=0.46190, orders=[BookOrder(side=BUY, price=0.46190, size=231775, order_id=461900000)]),
     Level(price=0.46180, orders=[BookOrder(side=BUY, price=0.46180, size=308664, order_id=461800000)]),
     Level(price=0.46170, orders=[BookOrder(side=BUY, price=0.46170, size=208927, order_id=461700000)])]
```

I've attached the CSV I used (ADAUSDT-orderbook.csv) for convenient reproduction.

VeraLyu commented 2 weeks ago

@cjdsellers Hi Chris, I'm currently collecting DELTA + SNAP as an alternative plan. Because I'm worried about packet loss of deltas, I collect SNAPs every hour to make sure order books can be recovered in case of delta packet loss. But this also seems to trigger this bug. Could you let me know what our current backtest solution is when testing with large order book data? Is it one snapshot per day plus deltas?

cjdsellers commented 2 weeks ago

Hey @VeraLyu

This is a brief response because I haven't looked at this further yet. But you'll want to make sure the way you're parsing and providing the records to the engine meets these specs:

https://binance-docs.github.io/apidocs/spot/en/#diff-depth-stream
https://github.com/nautechsystems/nautilus_trader/blob/develop/nautilus_trader/adapters/binance/common/data.py#L418
https://docs.nautilustrader.io/integrations/binance.html#order-books

This will be the next thing I look at for this issue. For now, assume the order book logic is correct and it's all about providing the correct data.

Hope that helps!

VeraLyu commented 2 weeks ago

> Hey @VeraLyu
>
> This is a brief response because I haven't looked at this further yet. But you'll want to make sure the way you're parsing and providing the records to the engine meets these specs:
>
> https://binance-docs.github.io/apidocs/spot/en/#diff-depth-stream
> https://github.com/nautechsystems/nautilus_trader/blob/develop/nautilus_trader/adapters/binance/common/data.py#L418
> https://docs.nautilustrader.io/integrations/binance.html#order-books
>
> This will be the next thing I look at for this, for now assume the order book logic is correct and its all about providing the correct data.

I think correct data does not guarantee there is no packet loss when collecting order book delta updates. According to the Binance docs on managing a local order book, if `pu` is not equal to the last `u`, it means there was packet loss, and a snapshot needs to be used to reset the order book before applying further deltas. So our backtest logic needs to handle the multiple-snapshots scenario.
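That continuity rule can be sketched without any exchange library. The following is a hypothetical, self-contained illustration; the field names `u` (final update id) and `pu` (previous event's final update id) follow the Binance futures diff-depth stream payloads, and the dicts are stand-ins for parsed events:

```python
# Hypothetical sketch of the Binance futures diff-depth continuity check:
# an event is contiguous when its `pu` equals the previous event's `u`.
# A mismatch means deltas were lost, so the local book must be rebuilt
# from a fresh snapshot before applying any further deltas.

def detect_gap(events):
    """Return the index of the first non-contiguous event, or None."""
    last_u = None
    for i, ev in enumerate(events):
        if last_u is not None and ev["pu"] != last_u:
            return i  # packet loss detected here
        last_u = ev["u"]
    return None

events = [
    {"u": 5, "pu": 0},
    {"u": 9, "pu": 5},    # contiguous: pu == previous u
    {"u": 20, "pu": 14},  # gap: pu (14) != previous u (9)
]
# detect_gap(events) → 2, i.e. a snapshot reset is needed at event index 2
```

A backtest data pipeline could run this check while collecting, and splice in a fresh SNAP (preceded by a clear) at each detected gap.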

cjdsellers commented 2 weeks ago

Yes, so the backtest data needs to be exactly as per the Binance client. If correct snapshots are applied to the book, then it should end up in the correct state, I think?