🚀 Feature Request
We should optimize the TSDataset.describe method, because it can consume up to 30% of total computation time during a backtest with NaiveModel on 10k segments.
Proposal
In the current implementation the bottleneck is TSDataset._gather_segments_data, and that is what should be optimized. The problem lies in per-segment iteration.
Possible solutions:
Vectorization
Optimization of a single iteration
Rewriting the cycle using numba
As an alternative, we could optimize the places where TSDataset.describe is used:
BasePipeline._make_predict_timestamps
FoldMask.validate_on_dataset
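To make the vectorization option concrete, here is a standalone sketch contrasting the per-segment loop pattern with a whole-frame computation of the same statistics. The data layout, function names, and the exact set of statistics are simplified assumptions for illustration, not etna's actual internals:

```python
import numpy as np
import pandas as pd

# Synthetic wide frame: rows are timestamps, one column per segment.
# (A simplified stand-in for TSDataset.df, whose real columns are a
# MultiIndex of (segment, feature).)
rng = np.random.default_rng(0)
timestamps = pd.date_range("2021-01-01", periods=100, freq="D")
segments = [f"segment_{i}" for i in range(1000)]
df = pd.DataFrame(
    rng.normal(size=(100, 1000)), index=timestamps, columns=segments
)
df.iloc[:5, :10] = np.nan  # leading missing values in the first 10 segments


def gather_loop(df: pd.DataFrame) -> pd.DataFrame:
    """Per-segment iteration: the pattern this issue identifies as the bottleneck."""
    rows = {}
    for segment in df.columns:
        series = df[segment]
        rows[segment] = {
            "start_timestamp": series.first_valid_index(),
            "end_timestamp": series.last_valid_index(),
            "num_missing": int(series.isna().sum()),
        }
    return pd.DataFrame.from_dict(rows, orient="index")


def gather_vectorized(df: pd.DataFrame) -> pd.DataFrame:
    """The same statistics via whole-frame operations, no Python-level loop.

    Note: assumes every segment has at least one non-NaN value; an all-NaN
    column would need special handling, since idxmax on all-False returns
    the first label.
    """
    notna = df.notna()
    start = notna.idxmax()          # first True (valid value) per column
    end = notna[::-1].idxmax()      # first True in reversed order = last valid
    return pd.DataFrame(
        {
            "start_timestamp": start,
            "end_timestamp": end,
            "num_missing": notna.shape[0] - notna.sum(),
        }
    )
```

On frames with many segments the vectorized version avoids constructing a Python-level Series per segment; the numba option from the list above would instead compile the explicit loop over the underlying NumPy array.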
Test cases
Make sure current tests pass.
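Alongside re-running the existing suite, a quick micro-benchmark can confirm that a rewrite actually removes the per-segment overhead. A minimal timeit sketch on synthetic data (the sizes and the single statistic measured are arbitrary choices for illustration, not a reproduction of the reported 30% figure):

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic wide frame: one column per segment, a rough stand-in
# for the data TSDataset._gather_segments_data iterates over.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.normal(size=(200, 2000)),
    columns=[f"segment_{i}" for i in range(2000)],
)
df.iloc[:3, 0] = np.nan  # a few missing values in the first segment


def per_segment():
    """Current pattern: a Python-level loop touching each segment separately."""
    return {col: int(df[col].isna().sum()) for col in df.columns}


def vectorized():
    """Candidate rewrite: one whole-frame call."""
    return df.isna().sum()


t_loop = timeit.timeit(per_segment, number=5)
t_vect = timeit.timeit(vectorized, number=5)
print(f"per-segment: {t_loop:.3f}s, vectorized: {t_vect:.3f}s")
```

Both callables return the same per-segment missing counts, so the same harness doubles as an equivalence check for whichever optimization is chosen.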
Additional context
Connected issues: #1336.