Is there an issue with the runtime speed of version 2.0.0rc0?

nkaz001 / hftbacktest

A high-frequency trading and market-making backtesting tool in Python and Rust, which accounts for limit orders, queue positions, and latencies, utilizing full tick data for trades and order books, with real-world crypto market-making examples for Binance Futures

MIT License

1.78k stars 357 forks

Is there an issue with the runtime speed of version 2.0.0rc0? #117

Open kasperlmc opened 1 month ago

kasperlmc commented 1 month ago

Why does version 2.0.0rc0 seem to run slower than version 1.8.4? Also, in version 2.0.0rc0, changing the elapse parameter to reduce the number of strategy loops has no impact on the program's runtime. It's very strange.

nkaz001 commented 1 month ago

Version 2 can be slower than version 1.8.4, primarily due to the support for L3 feed, which introduces unnecessary data size overhead when backtesting L2. Additionally, Rust's strong safety restrictions may contribute to a minor performance impact. A slowdown of about 15-20% is expected; if you experience a greater decrease in performance, please let me know.

kasperlmc commented 1 month ago

Oh, I see. If processing more data leads to a longer runtime, that makes sense. Is that also why manually adjusting the elapse parameter to control the backtest interval does not reduce the backtest time?

kasperlmc commented 1 month ago

What confuses me about this backtest time overhead is that whether I use 100ms or 1000ms for the loop, the time consumption ends up being the same. In the previous version 1.8.4, the finer the loop interval, the slower the backtest was, because it resulted in more loops.

nkaz001 commented 1 month ago

It depends on the operations performed within the elapse loop. If it's very simple, changing the interval won't have much effect. I will also test this in the example strategy.
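The point above can be illustrated with a rough cost model. This is only a sketch, not hftbacktest code: it assumes total runtime is a fixed feed-processing cost plus a per-iteration strategy cost, and the numbers used (`feed_cost_s`, `per_loop_cost_us`) are hypothetical values chosen for illustration, not measurements.

```python
NS_PER_MS = 1_000_000
DAY_NS = 24 * 60 * 60 * 1_000_000_000  # one day of simulated time, in nanoseconds


def loop_count(interval_ms: int, span_ns: int = DAY_NS) -> int:
    """Number of strategy-loop iterations needed to cover span_ns."""
    return span_ns // (interval_ms * NS_PER_MS)


def estimated_runtime_s(interval_ms: int, feed_cost_s: float, per_loop_cost_us: float) -> float:
    """Assumed model: fixed feed-processing cost + iterations * per-iteration cost."""
    return feed_cost_s + loop_count(interval_ms) * per_loop_cost_us / 1e6


# Hypothetical costs: 40 s to replay the feed; cheap (1 us) vs. heavy (100 us) loop body.
for per_loop_cost_us in (1.0, 100.0):
    for interval_ms in (100, 2000):
        runtime = estimated_runtime_s(interval_ms, feed_cost_s=40.0, per_loop_cost_us=per_loop_cost_us)
        print(f"loop body {per_loop_cost_us:>5} us, interval {interval_ms:>4} ms: "
              f"{loop_count(interval_ms):>7} loops, ~{runtime:.1f} s")
```

Under these assumed numbers, a cheap loop body makes the 100 ms and 2000 ms runs nearly indistinguishable, because replaying the feed dominates. A heavier loop body makes the interval matter a great deal.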

kasperlmc commented 1 month ago

the example strategy.

OK, thank you 😂

nkaz001 commented 1 month ago

I wasn't able to test it in v1, but in v2, as I mentioned earlier, the computation in the loop of the elapse method has a significant impact. In the high-frequency grid trading example, where calculations are relatively simple, the interval doesn't have much effect. However, in the GLFT example, the interval has a greater impact, and in the Order Book Imbalance example, the effect is even more pronounced. I'm curious—what are your observations when comparing v1 to v2?

kasperlmc commented 1 month ago

My observation is as follows. I tested a simple market-making strategy that I wrote myself, which places orders up and down. With V1, setting the loop parameter to 100 milliseconds made the program run much longer than setting it to 2000 milliseconds. With the 2000-millisecond parameter, it took about 80 seconds to run one day's worth of data. However, when I used the same logic with V2, it took about 40 seconds to run one day's worth of data, regardless of whether the loop parameter was 100 milliseconds or 2000 milliseconds.

nkaz001 commented 1 month ago

Did you have a chance to reduce the elapse interval below 100ms? I notice a significant increase in backtesting time when I decrease the interval to 10ms or less. It seems that above 100ms, the computation in the elapse loop is not intensive enough in the v2 implementation to affect the total time. Essentially, v1 and v2 share the same logic, but the implementation details differ. v2 is more optimized in some areas.
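One way to check this on a given strategy is to time the same backtest at several intervals. The sketch below is a generic harness, not part of hftbacktest: `run_backtest` is a hypothetical callable that runs a full backtest for a given interval in milliseconds, and `dummy_backtest` is only a stand-in so the example runs on its own.

```python
import time
from typing import Callable, Dict, Iterable


def time_intervals(run_backtest: Callable[[int], None],
                   intervals_ms: Iterable[int]) -> Dict[int, float]:
    """Return wall-clock seconds taken by run_backtest for each interval."""
    timings: Dict[int, float] = {}
    for interval_ms in intervals_ms:
        start = time.perf_counter()
        run_backtest(interval_ms)  # hypothetical: one full backtest at this interval
        timings[interval_ms] = time.perf_counter() - start
    return timings


def dummy_backtest(interval_ms: int) -> None:
    """Stand-in for a real backtest: loops over one simulated minute."""
    steps = 60_000 // interval_ms
    acc = 0
    for i in range(steps):
        acc += i  # placeholder for the strategy's per-loop computation


results = time_intervals(dummy_backtest, [1, 10, 100, 1000])
for interval_ms, seconds in results.items():
    print(f"{interval_ms:>5} ms interval: {seconds:.6f} s")
```

Replacing `dummy_backtest` with a function that builds and runs the real strategy shows where the interval starts to dominate the runtime.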

kasperlmc commented 1 month ago

I haven't tried loop parameters below 100 milliseconds because my data is recorded every 100 milliseconds. Are you saying that in V2, if the loop parameter is set to 100 milliseconds or more, changing the loop parameter doesn't significantly affect the run time?

nkaz001 commented 1 month ago

It depends on the computation within the elapse loop. By the way, what I mentioned is based on the incremental depth feed and full tick-by-tick trades feed. Generally, using sampled data is not recommended.