jameschch / LeanOptimization

Genetic optimization using LEAN
Apache License 2.0

Walk Forward Parallelism Optimization Idea #16

Closed worthy7 closed 4 years ago

worthy7 commented 7 years ago

Hello,

I was recently doing some profiling on QC and noticed that reading the data takes quite some time compared to the rest of the algorithm (hard disks are slow).

Since walk forward reuses the same data over and over, I was wondering whether you have developed something that could optimize this.

Idea 1: For example, read all the data from disk first, then have each batch use the in-RAM data instead of rereading it from disk from then on. Or, have each batch work in parallel as follows:

Idea 2: (Extreme optimization, probably not worth it) 5 instances, each training on 12 months and testing on the following 1 month, with a 1 month walk forward from one instance to the next: 1) starts in Jan 2016, 2) starts in Feb 2016, etc.

This would reread the data for May 5 times from disk. What if instance 2 is started only when instance 1 reaches February, so that both are fed that data simultaneously? It's an extreme optimization.
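To make the saving in Idea 2 concrete, here is a rough Python sketch (not LEAN code) that counts how often each month is read when every instance reads its own window, and then feeds each month to all interested instances from a single read. The names `windows`, `naive_reads`, `shared_feed`, and the `load_month` callback are hypothetical, and the sketch assumes data can be loaded one calendar month at a time.

```python
from collections import Counter
from datetime import date


def add_months(d: date, n: int) -> date:
    """Return the first day of the month n months after d."""
    idx = d.year * 12 + (d.month - 1) + n
    return date(idx // 12, idx % 12 + 1, 1)


def windows(first_start: date, instances: int, span_months: int = 13):
    """One window per instance: 12 training months + 1 test month,
    each instance starting one month after the previous one."""
    return [
        [add_months(first_start, i + m) for m in range(span_months)]
        for i in range(instances)
    ]


def naive_reads(wins) -> Counter:
    """Disk reads per month when every instance reads its own window."""
    return Counter(month for win in wins for month in win)


def shared_feed(wins, load_month):
    """Read each month once and hand it to every instance that needs it.

    load_month is a hypothetical callback standing in for the disk read.
    """
    received = [[] for _ in wins]
    for month in sorted({m for win in wins for m in win}):
        data = load_month(month)  # the only read for this month
        for i, win in enumerate(wins):
            if month in win:
                received[i].append(data)
    return received
```

With 5 instances starting Jan through May 2016, `naive_reads` reports 65 month-reads in total (May 2016 is read 5 times), while `shared_feed` calls `load_month` only 17 times, once for each month from Jan 2016 to May 2017.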

jameschch commented 7 years ago

This is a great idea. I do have an issue open to perform these kinds of tests. One idea was to store backtest data on a RAM disk. I have a feeling LEAN anticipates the data and loads it ahead of time. Reading from disk multiple times is only a problem if you have to wait. The only way to be sure is to test it.
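A test along these lines could start from something like the following Python sketch, which times repeated reads of the same files straight from disk against reads served from an in-memory cache. It does not touch LEAN at all; `CachedReader`, `read_from_disk`, and `time_reads` are hypothetical names, and the 12 files of random bytes only stand in for real backtest data.

```python
import os
import tempfile
import time


class CachedReader:
    """Reads each file from disk at most once, then serves it from RAM."""

    def __init__(self):
        self._cache = {}
        self.disk_reads = 0

    def read(self, path: str) -> bytes:
        if path not in self._cache:
            with open(path, "rb") as fh:
                self._cache[path] = fh.read()
            self.disk_reads += 1
        return self._cache[path]


def read_from_disk(path: str) -> bytes:
    """Baseline: open and read the file on every call."""
    with open(path, "rb") as fh:
        return fh.read()


def time_reads(read, paths, passes: int) -> float:
    """Return the seconds spent reading every path `passes` times."""
    start = time.perf_counter()
    for _ in range(passes):
        for p in paths:
            read(p)
    return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(12):  # stand-in for 12 months of data files
            p = os.path.join(tmp, f"month_{i:02d}.bin")
            with open(p, "wb") as fh:
                fh.write(os.urandom(1_000_000))
            paths.append(p)
        reader = CachedReader()
        print("disk  :", time_reads(read_from_disk, paths, passes=5))
        print("cached:", time_reads(reader.read, paths, passes=5))
```

Note that the operating system's own file cache may already make the repeated disk reads fast, so the gap measured this way can be smaller than expected. A meaningful result would need real data sizes and a full backtest run.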