nautechsystems / nautilus_trader

A high-performance algorithmic trading platform and event-driven backtester
https://nautilustrader.io
GNU Lesser General Public License v3.0

Backtest data improvements #337

Closed limx0 closed 3 years ago

limx0 commented 3 years ago

We should take a look at improving the way data is loaded into backtests. There are quite a few ways people might want to load data into Nautilus, but I think we can improve the user experience by adding a module and a couple of helper classes for loading data.

I can think of the following ways people might have gathered data that they want to feed into Nautilus (please add any I have missed!):

A couple of the things I think could be improved:

Proposal:

1) Improve discovery & access to backtest data.

I'd be keen to hear other thoughts on the above, as well as how everyone else is storing and accessing their data for backtests.
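As a starting point for the discovery/access proposal, here is a minimal sketch of what a discovery helper in such a module could look like. The function name, file layout, and CSV default are my assumptions for illustration, not anything from this issue:

```python
from pathlib import Path


def discover_data_files(root: str, pattern: str = "*.csv") -> list[Path]:
    """Recursively collect candidate backtest data files under `root`.

    A hypothetical helper: the real module would likely also support
    remote filesystems and richer metadata, per the discussion below.
    """
    return sorted(Path(root).rglob(pattern))
```

A real implementation would probably layer a catalog or metadata index on top of this, but even a simple glob-based helper removes the per-user boilerplate of locating files.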

cjdsellers commented 3 years ago

All great points.

Adding my own thoughts, we could separate the planning and implementation stages on the data pipeline side into:

Data storage/warehousing (many sources and formats, both local and remote)
-> Data streaming/downloading into the local environment (DataStreamer? DataLoader? fsspec)
-> Data transformation into Nautilus objects (DataTransformer?)
-> BacktestDataContainer rises from the grave
-> BacktestEngine data ingest and running (example/user scripts)

and/or:

-> BacktestBatchRunner? utilizing the above machinery

Separating things this way gives users the flexibility to conduct research/exploration or other testing from an intermediate format (parquet etc.), or from pandas DataFrames, as required to interface with other Python libraries.
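A minimal sketch of how those stages could compose, using the hypothetical DataLoader/DataTransformer names floated above (the signatures are assumptions for illustration, not an existing API):

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable


@dataclass
class DataLoader:
    """Hypothetical stage 1: stream raw records in from some source."""

    source: Iterable[dict]

    def load(self) -> Iterable[dict]:
        yield from self.source


@dataclass
class DataTransformer:
    """Hypothetical stage 2: convert raw records into built objects."""

    parse: Callable[[dict], Any]

    def transform(self, records: Iterable[dict]) -> list:
        return [self.parse(r) for r in records]


def run_pipeline(loader: DataLoader, transformer: DataTransformer) -> list:
    # Stream raw data, then hand the built objects to a container/engine.
    return transformer.transform(loader.load())
```

Keeping the loader and transformer as separate objects is what lets a user stop the pipeline at the intermediate stage and do research from DataFrames instead.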

Then it may make sense to revive something like a BacktestDataContainer to hold data which has been converted into Nautilus objects, for any purpose including backtest runs. Following this approach would standardize the API of BacktestEngine to deal exclusively with built Nautilus objects; without some kind of batch runner, holding everything built in memory would otherwise be a space complexity issue.
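A sketch of what a revived container could look like; the `ts_init` ordering key is an assumption about the built objects, and the class here is illustrative rather than the historical implementation:

```python
from operator import attrgetter


class BacktestDataContainer:
    """Holds only *built* Nautilus objects, so the engine API stays uniform."""

    def __init__(self) -> None:
        self._data: list = []

    def add(self, objects: list) -> None:
        self._data.extend(objects)

    def build(self) -> list:
        # A backtest engine consumes a single time-ordered event stream,
        # so merge-sort everything by the assumed `ts_init` timestamp.
        return sorted(self._data, key=attrgetter("ts_init"))
```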

The batch runner's main task would be to orchestrate data ingest -> transformation -> backtest run (without a full reset) jobs.
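That orchestration loop could be sketched roughly as follows; both callables are placeholders standing in for real engine and transformer components:

```python
class BacktestBatchRunner:
    """Hypothetical orchestrator: for each batch, ingest -> transform ->
    run, without a full engine reset between batches."""

    def __init__(self, engine_run, transform):
        self.engine_run = engine_run  # callable consuming built objects
        self.transform = transform    # raw records -> built objects
        self.results = []

    def run(self, batches):
        for raw in batches:                              # data ingest
            built = self.transform(raw)                  # transformation
            self.results.append(self.engine_run(built))  # backtest run
        return self.results
```

Because each batch is transformed just before its run, only one batch of built objects needs to live in memory at a time, which is the space-complexity point above.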

It would probably also make sense to provide some standard parser base classes, which should be general enough to be used for either live or backtest use cases. Or some kind of multi-adapter fsspec -> Nautilus objects.
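One possible shape for such a parser base class is sketched below; the `ByteParser`/`CSVLineParser` names and the tuple record format are illustrative assumptions only:

```python
from abc import ABC, abstractmethod
from typing import Any, Iterator


class ByteParser(ABC):
    """Hypothetical base class: raw bytes in, built objects out.

    The same parser could be fed by a live adapter's socket stream
    or by a backtest loader reading files, covering both use cases.
    """

    @abstractmethod
    def parse(self, chunk: bytes) -> Iterator[Any]:
        ...


class CSVLineParser(ByteParser):
    """Illustrative concrete parser for `timestamp,price` CSV lines."""

    def parse(self, chunk: bytes) -> Iterator[Any]:
        for line in chunk.decode().splitlines():
            ts, price = line.split(",")
            yield (int(ts), float(price))
```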

limx0 commented 3 years ago

Just an update - the first pass of this is done in #343. Still on the todo list are:

I will likely address the above in the coming weeks.

limx0 commented 3 years ago

Closing this in favour of newer issues.