nautechsystems / nautilus_trader

A high-performance algorithmic trading platform and event-driven backtester
https://nautilustrader.io
GNU Lesser General Public License v3.0

Context for Performance Claims & Benchmarking Script #730

Closed: rhotchkiss closed this issue 1 year ago

rhotchkiss commented 1 year ago

There are some claims about the performance of the Nautilus backtesting engine (e.g. "500,000+ events per second", "Backtest engine fast enough to be used to train AI trading agents (RL/ES)"), but I have been unable to find any useful context that gives these claims meaning.

I am proposing adding a simple backtest script to the examples that achieves these numbers, along with docs stating the specs of the hardware that ran it. This could further act as an end-to-end integration test to check that overall performance is not regressing over time.
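Roughly, I have something like the following harness sketch in mind (`run_example()` here is just a hypothetical placeholder for importing and running one of the existing example scripts, not an actual nautilus_trader API):

```python
# Minimal benchmark harness sketch. run_example() is a hypothetical placeholder
# that runs one of the existing example backtests and returns the number of
# events (or data iterations) processed -- it is not part of the nautilus_trader API.
import platform
import time


def benchmark(run_example, label: str) -> None:
    """Time a single backtest run and report throughput plus basic hardware info."""
    start = time.perf_counter()
    events = run_example()
    elapsed = time.perf_counter() - start
    print(f"{label}: {events:,} events in {elapsed:.3f}s "
          f"= {events / elapsed:,.0f} events/sec")
    print(f"CPU: {platform.processor() or platform.machine()}, "
          f"Python {platform.python_version()}")
```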

For some colour, below are the events per second (calculated as "Total events" / "Elapsed time" from the BACKTEST POST-RUN statistics) for some of the example backtests run (casually) on my desktop (Ryzen 9 5900X, 64 GB of DDR4 RAM).

| script | events per second |
| --- | --- |
| crypto_ema_cross_ethusdt_trade_ticks.py | 113.72 |
| fx_ema_cross_audusd_bars_from_ticks.py | 123.57 |
| fx_ema_cross_audusd_ticks.py | 115.32 |

P.S. unable to add a docs label :(

cjdsellers commented 1 year ago

Hi @rhotchkiss

Thanks for your interest in the project!

There are currently no standardized performance benchmarks for backtesting; adding some is actually a great idea and something I would like to put together.

In the post-run backtest performance statistics, Total events is merely a count of how many Event objects were generated by the system (such as order or position events), which is highly dependent on the data and strategies. Indeed, there could be many iterations of the backtest engine with no events produced at all. Because of this, the total data event iterations (Iterations) would be a much better measure (and is what the marketing copy on the landing page is loosely based on).

An iteration is considered a data event, where a Data object has travelled through the entire system: it has been consumed by the simulated exchange, the clocks for all strategies and components have been advanced, and all order, position and strategy computations have run. So there is some non-trivial compute here (it's not simply a case of incrementing an integer counter and calling it an iteration).

Calculating in this way, here are the results of running the above examples (casually) on my MacBook M1 laptop:

| script | iterations | elapsed time | events per second (data) |
| --- | --- | --- | --- |
| crypto_ema_cross_ethusdt_trade_ticks.py | 69,806 | 00:00:00.245595500 | 284,232 |
| fx_ema_cross_audusd_bars_from_ticks.py | 100,000 | 00:00:00.624540791 | 160,177 |
| fx_ema_cross_audusd_ticks.py | 100,000 | 00:00:00.627017542 | 159,485 |
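
For clarity, the right-hand column above is just iterations divided by elapsed time; a small sketch of that calculation in plain Python (nothing engine-specific):

```python
# Sketch: derive iterations per second from the "HH:MM:SS.fffffffff" elapsed
# time strings shown in the table above.
def iterations_per_second(iterations: int, elapsed: str) -> float:
    hours, minutes, seconds = elapsed.split(":")
    total_seconds = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    return iterations / total_seconds


# First row of the table: 69,806 iterations in 00:00:00.245595500
print(f"{iterations_per_second(69_806, '00:00:00.245595500'):,.0f}")  # ~284,232
```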

So as you can see there is a fair bit of variance, and it seems there has been a slight degradation in performance after integrating some clock components written in Rust (it's halfway done, and at the moment there are twice as many function calls to make it work). Basically, the performance of the platform is still in a state of flux while heavy development work continues.

I've personally seen crypto_ema_cross_ethusdt_trade_ticks.py run on this machine in around 0.1s, which is partly what the hand-wavy claim of 500,000+ is based on. I've also seen the order book processing around 300,000 deltas per second, although again there is no standardized benchmark at this stage.
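
As a rough illustration only (this is not an existing benchmark), a delta-throughput figure like that could be measured with a simple timing loop, where `apply_delta` and `deltas` are hypothetical placeholders for whatever book-update function and data source are being exercised:

```python
# Hypothetical micro-benchmark sketch for order book delta throughput.
# apply_delta and deltas are placeholders, not nautilus_trader API.
import time


def deltas_per_second(apply_delta, deltas) -> float:
    """Apply each delta once and return the processed-per-second rate."""
    count = 0
    start = time.perf_counter()
    for delta in deltas:
        apply_delta(delta)
        count += 1
    return count / (time.perf_counter() - start)
```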

Hope that helps to shed some light on the figures!

rhotchkiss commented 1 year ago

Thanks, that does indeed provide a lot of the missing context.

cjdsellers commented 1 year ago

When I do another sweep of the landing page and upgrade it, I'll be sure to add links to some more concrete information backing up the performance claims. Thanks for the reminder on this!