Walk Forward Parallelism Optimization Idea

Hello,

I was recently doing some profiling on QC and noticed that reading the data takes quite a lot of time compared to the rest of the algorithm (hard disks are slow).

Since walk forward works by reusing the same data over and over, I was wondering whether you have developed anything that could optimize this.

Idea 1: Read all the data from disk once, up front, then have each batch use the in-RAM copy instead of rereading it from disk. Or have the batches work in parallel as follows:

Idea 2 (extreme optimization, probably not worth it): Run 5 instances, each training on 12 months and testing on the following 1 month, with each instance walked forward by 1 month: instance 1 starts in January 2016, instance 2 starts in February 2016, and so on.

As it stands, this rereads the data for May from disk 5 times. What if instance 2 were started only when instance 1 reaches February, so that both are fed the same data simultaneously? It's an extreme optimization.
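
To put rough numbers on Idea 1, here is a minimal Python sketch that counts simulated disk reads with and without an in-RAM cache. Everything in it is an assumption for illustration: `read_month_from_disk` is a hypothetical stand-in for the real data reader, the data is read in monthly chunks, and none of the names come from LeanOptimization or Lean.

```python
from functools import lru_cache

TRAIN_MONTHS = 12  # training window, as in the example above
TEST_MONTHS = 1    # testing window
INSTANCES = 5      # instance k starts k months after January 2016

disk_reads = 0  # counts simulated disk reads


def month_label(index):
    """Map 0 -> '2016-01', 12 -> '2017-01', and so on."""
    year, month = divmod(index, 12)
    return f"{2016 + year}-{month + 1:02d}"


def read_month_from_disk(index):
    """Hypothetical stand-in for the slow disk read of one month of data."""
    global disk_reads
    disk_reads += 1
    return [f"{month_label(index)} bar {i}" for i in range(3)]


@lru_cache(maxsize=None)
def read_month_cached(index):
    """Idea 1: each month is read from disk once, then served from RAM."""
    return read_month_from_disk(index)


def run_walk_forward(loader):
    """Run every instance over its 13-month window; return the disk reads used."""
    global disk_reads
    disk_reads = 0
    for start in range(INSTANCES):
        for index in range(start, start + TRAIN_MONTHS + TEST_MONTHS):
            loader(index)  # a real instance would train or test on this data
    return disk_reads


naive_reads = run_walk_forward(read_month_from_disk)
cached_reads = run_walk_forward(read_month_cached)
print(naive_reads, cached_reads)  # prints "65 17"
```

Under these assumptions the 5 instances touch 65 month-chunks in total but only 17 distinct months, so the cache removes roughly three quarters of the reads. The trade-off is that all 17 months have to fit in RAM, and instances running in separate processes would need some form of shared memory rather than a per-process cache. Idea 2 should reach the same 17 reads without holding everything in memory, since each month would be read once and handed to every instance whose window covers it.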

jameschch / LeanOptimization
