tinkoff-ai / etna

ETNA – Time-Series Library
https://etna.tinkoff.ru
Apache License 2.0

Optimize `TSDataset.describe` #1341

Closed Mr-Geekman closed 1 year ago

Mr-Geekman commented 1 year ago

🚀 Feature Request

We should optimize the `TSDataset.describe` method, because it can consume up to 30% of the total computation time during a backtest of `NaiveModel` with 10k segments.

Proposal

In the current implementation the bottleneck is `TSDataset._gather_segments_data`, and that is what should be optimized. The problem is that it iterates over the segments one at a time.
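To illustrate why per-segment iteration is costly, here is a minimal sketch that is not ETNA's actual code. It assumes a wide `pandas` frame with `(segment, feature)` column levels and a `target` feature. The function names `describe_per_segment` and `describe_vectorized` and the chosen statistics are hypothetical. The first function loops over segments; the second computes the same values for all segments at once with NumPy operations.

```python
import numpy as np
import pandas as pd


def describe_per_segment(df: pd.DataFrame) -> pd.DataFrame:
    """Baseline (hypothetical): visit every segment separately in a Python loop."""
    rows = {}
    for segment in df.columns.get_level_values("segment").unique():
        series = df[(segment, "target")]
        start = series.first_valid_index()
        end = series.last_valid_index()
        if start is None:  # segment has no observed values at all
            rows[segment] = {"start_timestamp": pd.NaT, "end_timestamp": pd.NaT,
                             "length": 0, "num_missing": 0}
            continue
        window = series.loc[start:end]  # label-based slice, both ends inclusive
        rows[segment] = {"start_timestamp": start, "end_timestamp": end,
                         "length": len(window),
                         "num_missing": int(window.isna().sum())}
    return pd.DataFrame.from_dict(rows, orient="index")


def describe_vectorized(df: pd.DataFrame) -> pd.DataFrame:
    """Same statistics (hypothetical), computed for all segments at once."""
    target = df.xs("target", axis=1, level="feature")  # columns become segments
    mask = target.notna().to_numpy()                   # shape: (timestamps, segments)
    n_rows = mask.shape[0]
    has_data = mask.any(axis=0)
    first_pos = mask.argmax(axis=0)                     # first observed row per segment
    last_pos = n_rows - 1 - mask[::-1].argmax(axis=0)   # last observed row per segment
    length = np.where(has_data, last_pos - first_pos + 1, 0)
    num_missing = length - mask.sum(axis=0)             # gaps between first and last
    timestamps = target.index.to_numpy()
    start = pd.Series(timestamps[first_pos], index=target.columns).where(has_data)
    end = pd.Series(timestamps[last_pos], index=target.columns).where(has_data)
    return pd.DataFrame({"start_timestamp": start, "end_timestamp": end,
                         "length": length, "num_missing": num_missing},
                        index=target.columns)


# Tiny example: two segments with leading, inner, and trailing gaps.
index = pd.date_range("2021-01-01", periods=5, freq="D")
columns = pd.MultiIndex.from_product([["a", "b"], ["target"]],
                                     names=["segment", "feature"])
df = pd.DataFrame([[1.0, np.nan], [2.0, 1.0], [np.nan, np.nan],
                   [4.0, 3.0], [np.nan, 5.0]], index=index, columns=columns)
slow = describe_per_segment(df)
fast = describe_vectorized(df)
```

Both functions should return the same values. The difference is that the loop version pays Python-level overhead once per segment, which adds up with 10k segments, while the vectorized version does a fixed number of array operations.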

Possible solution:

As an alternative we could optimize the places where TSDataset.describe is used:

Test cases

Make sure current tests pass.

Additional context

Connected issues: #1336.