nautechsystems / nautilus_trader

A high-performance algorithmic trading platform and event-driven backtester
https://nautilustrader.io
GNU Lesser General Public License v3.0

Bar based execution dynamics #1537

Closed dkharrat closed 3 months ago

dkharrat commented 3 months ago

I have been developing a strategy and backtesting it extensively. Results looked very promising. But after deploying it live, I was dismayed to see a significant discrepancy in the results. After investigation, it turns out the backtest results were inaccurate and misleading due to orders getting filled at an unrealistic time.

I have a strategy that uses 5-min bars. However, when submitting a limit order, the order is filled at some price within the same bar, which is unrealistic. To reflect real-world results, it should fill in the next 5-min bar (if the limit price happens to be within the bar price range).

More specifically: If I have a 5-min bar that opened at time t, the strategy would have received the bar when it's complete (i.e. at time t+5), so the order should execute from t+6 the earliest. Currently, the order is executed at time t.
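The timing constraint described above can be sketched in plain Python (the interval and timestamps here are illustrative, not Nautilus API calls):

```python
from datetime import datetime, timedelta

BAR_INTERVAL = timedelta(minutes=5)  # illustrative 5-minute bars

def earliest_realistic_fill_time(bar_open: datetime) -> datetime:
    """A strategy only sees a bar once it completes, so no order submitted
    in reaction to that bar can realistically fill before the bar's close."""
    return bar_open + BAR_INTERVAL  # t + 5 minutes

bar_open = datetime(2024, 1, 1, 9, 30)         # bar opens at t = 09:30
print(earliest_realistic_fill_time(bar_open))  # 2024-01-01 09:35:00
```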

I tried to work around this using the LatencyModel, but that introduced a lot of nondeterminism into the backtesting results, so it doesn't work for me.

Expected Behavior

Market orders are executed at the open of the next bar. Limit orders are executed within the next bar's price range.

Actual Behavior

Market orders are executed at the close of the same bar. Limit orders are executed within the same bar price range.

Specifications

cjdsellers commented 3 months ago

Hi @dkharrat

Which bar timestamps are you referring to, ts_event or ts_init, and is your data timestamped at the open or close of the bar?

Nautilus is built first and foremost to run on order book and tick-level data. When you use bars to simulate and process execution there is some information loss: we don't know whether the high traded before the low or vice versa, nor when during the bar's time interval the aggregated OHLC prices were hit.

Even with bar data, Nautilus maintains an order book per instrument per venue. A bar's OHLC prices are converted into ticks and iterated through the matching engine in an orderly way, and time is synchronised so that no events can occur out of order in terms of timestamps.
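A minimal sketch of that conversion (a pure-Python illustration, not the actual Nautilus matching-engine code): each bar is unpacked into four ticks processed in a fixed O, H, L, C order:

```python
from dataclasses import dataclass

@dataclass
class Bar:
    open: float
    high: float
    low: float
    close: float

def bar_to_ticks(bar: Bar) -> list[float]:
    """Expand a bar into the fixed O -> H -> L -> C price sequence.
    Whether the high or the low actually traded first is unknown once
    aggregated, so a fixed (or optionally randomized) ordering is assumed."""
    return [bar.open, bar.high, bar.low, bar.close]

print(bar_to_ticks(Bar(100.0, 102.0, 99.5, 101.0)))  # [100.0, 102.0, 99.5, 101.0]
```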

More specifically: If I have a 5-min bar that opened at time t, the strategy would have received the bar when it's complete (i.e. at time t+5), so the order should execute from t+6 the earliest. Currently, the order is executed at time t.

If your limit order is already working in the market, then if any of the bar's prices trade through the order it is matched and executed - somewhere in that 5-minute interval (at either the open, high, low, or close, and in that order, since the true sequencing has been lost [edit: you can actually randomize this, but that's another topic]). It would be rather unrealistic to treat the bar as a discrete unit of time during which no matching occurs, just because we're stepping through a backtest bar-by-bar with an expectation that an order won't execute until the next bar, regardless of where the market price is?

If you meant that, on submit, you wouldn't expect the order to be executed until at least 5 minutes have passed, that doesn't seem realistic if your latency is milliseconds or better?

Does your bar data have opens which exactly match the last close, or are there gaps?

So I think I need more detail here to explain what might be going on: was your order already working, or are you submitting your order expecting it won't execute until the next bar at t + 5 mins?

It may help to review the code around how:

Look forward to your response and we can continue to discuss bar execution dynamics. This probably deserves some more docs as well, so the specifications and expectations are clearly laid out.

cjdsellers commented 3 months ago

I should have re-read your initial post, and noticed this:

Expected Behavior

Market orders are executed at the open of the next bar. Limit orders are executed within the next bar's price range.

Scenario and question

If the last 5-minute bar closed at 00:05:00 (and this is now the current time), and the price is currently Ask 101, Bid 100, and you then submit a market buy order - would the order not execute at or close to 101? (Without additional granularity of order book data, and simulating fine latencies with LatencyModel etc., none of which really makes sense with bars unless your latency is greater than the bar interval.)

Stated another way (and in the same scenario above): if your bar data gaps and the next open steps up to 102, would you expect the market order to be executed at 102? If so, then I see this as a data issue, because the information loss of aggregating bar data has caused gaps, and in reality you're not going to achieve the price of 102, as clearly some time has passed for the market to move to this new price - and notably, with bars we don't even know exactly when the market was quoting 102.

I think this is the key difference in expectations here: the platform will simulate the market at the current bid/ask prices, and for an aggressive order will process this immediately + any defined latency. Your expectation is that the order won't be executed until the next bar comes in, which should have an open at the close of the last bar anyway?

Limit orders are executed within the next bar's price range.

For this one, if your limit order is marketable then it should execute at the current bid/ask price? If it's not marketable, then I agree - and that is the current behaviour?

Hopefully this helps to clarify the discussion a little more.

dkharrat commented 3 months ago

Thanks for the details @cjdsellers!

For t in my example, I was referring to ts_event for the Bar, which corresponds to the open time of the bar. I wasn't aware of ts_init, so I checked its value and noticed it's the same value as ts_event, which is incorrect. It should be equal to ts_event + bar_timeframe. So, it looks like I was loading the bar data incorrectly. This is what caused my orders to be filled at unrealistic times.

I'm using BarDataWrangler and just noticed it accepts a ts_init_delta parameter. I've now set it to equal the bar's step size in nanoseconds, and that actually helped a lot. Now, orders are executed at the bar's ts_init at the earliest (i.e. at the time the bar became available). It would be a good idea to have the ts_init_delta param in BarDataWrangler default to the bar's step size instead of 0, as that makes more sense (I don't see a valid reason why ts_init_delta should be anything less than the bar's step size, or is there?).
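For open-timestamped bar data, the adjustment amounts to shifting ts_init forward by one bar interval. A minimal sketch of the arithmetic (pure Python with nanosecond timestamps as Nautilus uses them - not the actual BarDataWrangler internals):

```python
NANOS_PER_MINUTE = 60 * 1_000_000_000
BAR_STEP_NS = 5 * NANOS_PER_MINUTE  # step size for 5-minute bars

def bar_ts_init(ts_event_open: int, ts_init_delta: int = BAR_STEP_NS) -> int:
    """ts_init marks when the bar became available to consumers; for data
    timestamped at the open, that is one full bar interval after ts_event."""
    return ts_event_open + ts_init_delta

ts_open = 1_700_000_000_000_000_000  # example open timestamp in ns
print(bar_ts_init(ts_open) - ts_open)  # 300000000000 (5 minutes in ns)
```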

By the way, I provide both 5-min and 1-min bar data, and I noticed Nautilus uses the finest granularity available (i.e. the 1-min bar data) to determine the order fill time, which is pretty nice, as that makes the fills more accurate.

Regarding your question:

Stated another way, (and in the same scenario above) if your bar data gaps and the next open steps up to 102 - would you expect the market order to be executed at 102?

Yes, exactly, that's my expectation. Isn't a gap-up realistically possible even in tick-level data? By the time the bar completed, the ask might not be at the bar's close anymore. I think Nautilus should use the next tick's ask price to be as realistic as possible (in my case, it would be the next 1-min bar's open).

cjdsellers commented 3 months ago

Hi @dkharrat

There is an expectation that ts_event is the bar close time, although this doesn't matter so much, because data sorting and iteration are based on ts_init. With that said, it makes the implementation and use of ts_init_delta clearer.

Yes, exactly, that's my expectation. Isn't a gap-up realistically possible even in tick-level data? By the time the bar completed, the ask might not be at the bar's close anymore. I think Nautilus should use the next tick's ask price to be as realistic as possible (in my case, it would the next 1-min bar's open).

I think we'll have to disagree on this point: to be as realistic as possible, the fill will occur wherever the market is based on the timestamps in the data, plus any latency modelling - i.e. how fills would actually be determined in an order book or matching engine. So the issue for you here is twofold:

I think what you're attempting to do here is simulate latency by using the inherent gaps in your bar data?

To help illustrate this, consider the original scenario, this time with tick-level data and latency modelled. Let's say the next quote tick has a best ask of 102 and a timestamp of 00:05:01.000, and you're simulating your one-way latency as 100 ms. You send your market order, which is going to hit the matching engine at 00:05:00.100, where the best ask is still 101 per the data - and this is what Nautilus will simulate.
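That scenario reduces to a lookup of the quote prevailing when the order reaches the matching engine. A self-contained sketch (illustrative nanosecond timestamps, not Nautilus internals):

```python
NANOS_PER_SEC = 1_000_000_000

# (timestamp_ns, best_ask) pairs, sorted by time
quotes = [
    (5 * 60 * NANOS_PER_SEC, 101.0),                   # 00:05:00.000  ask 101
    (5 * 60 * NANOS_PER_SEC + NANOS_PER_SEC, 102.0),   # 00:05:01.000  ask 102
]

def ask_at(ts_ns: int) -> float:
    """Return the best ask prevailing at ts_ns (last quote at or before it)."""
    best = None
    for ts, ask in quotes:
        if ts <= ts_ns:
            best = ask
    return best

submit_ts = 5 * 60 * NANOS_PER_SEC   # order sent at 00:05:00.000
latency = 100_000_000                # 100 ms one-way latency
arrival = submit_ts + latency        # hits the engine at 00:05:00.100
print(ask_at(arrival))               # 101.0 - the ask has not yet moved to 102
```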

To achieve the functionality you're after would compromise the realism of backtesting on order book and tick level data, and so can't be changed. We're also wanting to improve the fill dynamics and modeling options here, which would be done prior to considering any sort of configuration for bar execution.

If you were to step through the code in the links I provided, I think you would find we're providing maximum realism - and I'm happy to discuss any individual points along that code path.

dkharrat commented 3 months ago

Hi @cjdsellers, thanks for the details. It helps me understand where you're coming from.

There is an expectation that ts_event is the bar close time

That's good to know. This would be good to document, as it's not clear. However, as far as I know, most data sources consider the timestamp to be the bar's open time. Wouldn't it make sense to be consistent with the conventions used by most data sources?

To help illustrate this, consider the original scenario and this time with tick level data and latency modeled.

Your reasoning makes sense to me and I agree with the behavior you described in your example scenario. I think the difference is in how I'm thinking about it: I'm thinking about the bar data as "trade ticks", whereas you're thinking about it as "quote ticks" (correct me if I'm misunderstanding here).

If we're considering the bar data as quote ticks, then I would agree with your reasoning. However, if we treat bar data as trade ticks, then it would make sense to consider the next "trade tick" for the matching engine, not the last one, since the price at the last trade tick will probably not match the quote tick. Do you agree with my thinking here?

I did notice that there are data wranglers for trade ticks. How does Nautilus process trade ticks? Are they converted to quote ticks for the matching engine?

rsmb7z commented 3 months ago

@dkharrat If you load 1-minute bar data into Nautilus, you can then use internally aggregated 5-minute bars for your strategy. This approach guarantees that your strategy operates with 5-minute bars as intended, while the Matching Engine improves the realism by making use of higher-resolution data.
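The idea can be illustrated with a plain-Python roll-up (a hand-rolled sketch of the aggregation, not the Nautilus internal bar-aggregation machinery itself):

```python
def aggregate(bars_1m: list[tuple[float, float, float, float]], step: int = 5):
    """Roll (open, high, low, close) 1-minute bars up into `step`-minute bars."""
    out = []
    for i in range(0, len(bars_1m), step):
        chunk = bars_1m[i : i + step]
        out.append((
            chunk[0][0],               # open of the first bar
            max(b[1] for b in chunk),  # highest high
            min(b[2] for b in chunk),  # lowest low
            chunk[-1][3],              # close of the last bar
        ))
    return out

one_min = [(100, 101, 99, 100.5), (100.5, 102, 100, 101.5),
           (101.5, 101.8, 100.8, 101), (101, 103, 101, 102.5),
           (102.5, 102.6, 101.9, 102)]
print(aggregate(one_min))  # [(100, 103, 99, 102)]
```

Because the strategy consumes the aggregated 5-minute bars while the matching engine still sees the underlying 1-minute data, fills are simulated at the finer resolution.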

cjdsellers commented 3 months ago

Hi @dkharrat

I agree that it's not so clear that the expectation for bars is a timestamp on close. I wouldn't say most data sources timestamp on open, in my experience; in my opinion it seems more logical to timestamp on the close, as that's when the bar is fully formed and represents a data event where an aggregation was completed - both are valid approaches anyway.

For the discussion on bar execution dynamics, I don't think it makes a difference whether we're considering quotes or trades. You could rewrite my scenario above as a trade tick, with the information we have available from bars only - until that trade hits the ask, the market has not moved. Trade messages also occur in order book data, where there is a precise timestamp for when the trade occurred - and we can also see where the market was quoting around these points too.

So I think you're wanting to infer this trade to mean that the ask had to move up to that price at some point not visible in your data?

However, if we treat bar data as trade ticks, then it would make sense to consider the next "trade tick" for the matching engine, not the last one, since the price at the last trade tick will probably not match the quote tick. Do you agree with my thinking here?

I don't think this makes sense - you said it yourself with "will probably not match the quote tick". It's the lack of granularity available with bar data which forces us to make this assumption one way or the other.

Nautilus will process quote ticks and trade ticks in the same way through the matching engine - only that quote ticks carry additional information on both the bid and ask side, whereas trade ticks have a single trade price, and so that is where the market must be at that exact point in time.

Bar data also has only a single price per OHLC point - either bid or ask for quotes, or last for trades. In any of those cases, I think we can both agree that processing bar data for execution should behave like trades?

It's valid to run backtests on all of the following types of data, in combination or isolation, listed in descending order of granularity:

- order book deltas
- quote ticks
- trade ticks
- bars

Where I think we could converge on agreement is the behavior of bar data treated as trade ticks. To simulate the market lacking the granularity of the other data types, you could take the next trade tick to mean either:

a) the bid/ask had already moved to that price at some point not visible in the data, or
b) the market had not moved until that trade actually occurred (the current behavior).

I can see how you might want to configure bar execution for a), which I think is your preference, and it might seem like the most realistic choice to "fill in the gaps" - but if this became the default it would degrade the realism of backtests running on order book + quote + trade data, or any combination of these, which Nautilus is built for first and foremost.

I also think my point still stands that your bar close should match the next open - otherwise, where was the market trading in between? It could have been far outside the next bar's OHLC for all we know, or we could assume it was somewhere between the close and the open - but again, we're not sure with bars.

If the close did match the next open, then the current implementation would also meet your expectations. I'm aware and understand that bar data often has a difference between these prices though.

So having said all of the above, I still think it's not a case of "the Nautilus implementation is wrong and introduces look-ahead bias"; it's more about how one prefers to represent the lack of information in bar data to simulate execution. We could probably introduce additional configuration options for processing bar data to cater for these differences in assumption, but as per one of my other messages:

We're also wanting to improve the fill dynamics and modeling options here, which would be done prior to considering any sort of configuration for bar execution.

Does this make sense so far?

dkharrat commented 3 months ago

Thanks for the detailed thoughts @cjdsellers! It's helpful and I think we're converging on the expected behavior.

I wouldn't say most data sources timestamp on open in my experience, in my opinion it seems more logical to timestamp on the close - as that's when the bar is fully formed and represents a data event where an aggregation is completed

You're actually right. After looking into it more, it does look like different data sources have different conventions, and the recommendation is to use the close time, as you suggested, as it reflects when the bar has been completed and would have prevented bugs that introduce look-ahead bias, like the one I ran into.

So I think you're wanting to infer this trade to mean that the ask had to move up to that price at some point not visible in your data?

Yes, that's right. I'm basically just trying to make the most conservative assumption on where the ask/bid would be at a specific instant of time, absent the quote tick information. My assumption is that a trade tick represents a completed execution at a specific time and it's possible the bid/ask has moved after that trade tick. Whereas with quote ticks for the same instant of time, you have the exact bid/ask to match with.

I can see how you might want to configure bar execution for a) which I think is your preference, and it might seem like the most realistic choice to "fill in the gaps" - but if this became the default it would degrade the realism of backtests running on order book + quote + trade data or any combination of these, which Nautilus is built for first and foremost.

Making it configurable makes sense. Perhaps, you could provide two configurations:

  1. the tick offset to use for the matching engine, where offset=0 means to take the current tick (which corresponds to the existing behavior today) and offset=1 means to take the next tick, etc.
  2. which price to use when converting the bar data to ticks for the matching engine (e.g. either open or close).

I do agree that you want to default to whatever configuration provides the most realism. When you have quote ticks or orderbook deltas, I agree that offset=0; price_type=close (i.e. the existing behavior) makes sense and is the most realistic.
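A sketch of what those two knobs might look like (hypothetical names and signatures - none of this exists in Nautilus today):

```python
def fill_reference_price(bars: list[dict], i: int,
                         offset: int = 0, price_type: str = "close") -> float:
    """Pick the reference price for matching against the bar at index i.

    offset=0 uses the current bar (today's behavior); offset=1 uses the
    next bar. price_type selects which bar price is matched against.
    """
    j = min(i + offset, len(bars) - 1)  # clamp at the end of the series
    return bars[j][price_type]

bars = [
    {"open": 100.0, "close": 101.0},
    {"open": 102.0, "close": 102.5},  # gap up on the next open
]
print(fill_reference_price(bars, 0))                               # 101.0
print(fill_reference_price(bars, 0, offset=1, price_type="open"))  # 102.0
```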

I also think my point still stands that your bar close should match the next open? otherwise where was the market trading in between?

Why is it not possible for there to be gaps? I can see that happening even with the highest-resolution data. This could happen for very illiquid stocks or options. For an extreme example, check BRK.A as one case where you can see many bars with gaps between the close and the next open. Options probably exhibit similar behavior. For highly liquid stocks, I agree it's unlikely there will be gaps.

So having said all of the above, I still think its not a case of "the Nautilus implementation is wrong and introduces look-ahead bias", its more how one prefers to represent the lack of information with bar data to simulate execution.

I agree. After I set ts_init to the close time, my backtest results improved significantly and are now much closer to the live results. So now it's just a matter of what assumption one makes for the matching engine. Making it configurable will help cater to differences in those assumptions.

cjdsellers commented 3 months ago

Hi @dkharrat

I'm glad we were able to get to the bottom of it: it's simply a matter of what assumptions are being made about bar-based execution / trade-tick-only data. I agree that configurations would make sense to cater for different preferences and data-type combinations for backtesting.

Why is it not possible for there to be gaps?

This is probably another case of assumptions made in bar aggregation. If the close timestamp is bar open + bar interval, then wouldn't this be exactly at, or very close to the open of the next bar? (some bars will close just before the next open, in which case - sure - there may be a movement up or down on the next trade tick).

This example you're giving for BRK.A - is this for daily bars by any chance? In which case I wouldn't treat that as a continuous market, as there are several distinct trading sessions. I was assuming high-granularity intraday bars if Nautilus-style execution is being simulated; if backtesting just on daily bars, there are admittedly more suitable platforms out there right now which cater specifically for that time frame and don't need the raw performance of Nautilus.

I'm glad you're getting better results now, closer to what you expect. If you get tired of waiting and wanted to attempt some bar execution configuration then I can help guide any PR you put up.

I'll also move this to discussions now, as it's rather lengthy, and we can open a more specific issue for bar execution configuration with better specifications.