liaopeiyuan / large-scale-hts-reconciliation

Large-Scale Hierarchical Time-Series Reconciliation
Creative Commons Attribution 4.0 International

[Question] Benchmarks and comparison #6

Open AzulGarza opened 1 year ago

AzulGarza commented 1 year ago

Hi! Thank you very much for using hierarchicalforecast in your benchmarks and for making your work open source. :)

While developing the library, we realized that SGy multiplication could be a bottleneck in many applications. To optimize the code, we are taking a sparse matrix approach as you propose in your study.
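To make the sparse idea concrete, here is a minimal sketch of SGy as a sparse product. It is not the code of either library: the two-level hierarchy, the bottom-up choice of G, and the helper names `build_summing_matrix`, `build_bottom_up_g`, and `reconcile_sparse` are all assumptions made for illustration.

```python
import numpy as np
from scipy import sparse


def build_summing_matrix(n_groups, leaves_per_group):
    """Hypothetical two-level hierarchy: 1 total, n_groups groups, then the leaves.

    Returns S in CSR format with shape (1 + n_groups + n_bottom, n_bottom).
    """
    n_bottom = n_groups * leaves_per_group
    total_row = sparse.csr_matrix(np.ones((1, n_bottom)))
    # Each group row has ones over its own block of leaves.
    group_rows = sparse.kron(
        sparse.identity(n_groups),
        sparse.csr_matrix(np.ones((1, leaves_per_group))),
    )
    bottom_rows = sparse.identity(n_bottom)
    return sparse.vstack([total_row, group_rows, bottom_rows], format="csr")


def build_bottom_up_g(n_groups, leaves_per_group):
    """Bottom-up G = [0 | I]: keeps only the bottom-level base forecasts."""
    n_bottom = n_groups * leaves_per_group
    n_agg = 1 + n_groups
    zeros = sparse.csr_matrix((n_bottom, n_agg))
    return sparse.hstack([zeros, sparse.identity(n_bottom)], format="csr")


def reconcile_sparse(S, G, y_hat):
    """Compute S @ (G @ y_hat); multiplying right to left avoids forming S @ G."""
    return S @ (G @ y_hat)


# Small example: 3 groups of 4 leaves, 5 forecast steps of random base forecasts.
rng = np.random.default_rng(0)
S = build_summing_matrix(3, 4)
G = build_bottom_up_g(3, 4)
y_hat = rng.normal(size=(S.shape[0], 5))

reconciled = reconcile_sparse(S, G, y_hat)
# Dense reference, i.e. the plain numpy SGy product.
reconciled_dense = S.toarray() @ G.toarray() @ y_hat
```

The sparse and dense products should agree up to floating-point error. The sparse version stores only the nonzero entries of S and G, which is where the savings would come from on large hierarchies.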

However, we have some questions regarding the public benchmarks. In particular, we note that the execution time of hierarchicalforecast (measured on this notebook) includes preprocessing time, while the evaluation of lhts seems to consider only the multiplication of SGy (as seen here and here). That is, the evaluation of lhts does not include the preprocessing time that is counted in the evaluation of hierarchicalforecast.

For example, the function distrib.reconcile_dp_matrix receives S and P in vector form, so they must have been computed and converted to that format beforehand, and that conversion is not included in the measured time.
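The difference between the two ways of measuring can be sketched as follows. This is an illustrative timing harness under stated assumptions, not the benchmark code from either project: the function `timed_bottom_up`, the synthetic two-level hierarchy, and the bottom-up G are hypothetical. It reports the preprocessing time (building and converting S and G) separately from the multiplication time.

```python
import time

import numpy as np
from scipy import sparse


def timed_bottom_up(n_groups=50, leaves_per_group=20, horizon=12, seed=0):
    """Time preprocessing and the SGy product separately (hypothetical setup)."""
    n_bottom = n_groups * leaves_per_group
    n_agg = 1 + n_groups
    rng = np.random.default_rng(seed)
    y_hat = rng.normal(size=(n_agg + n_bottom, horizon))

    t0 = time.perf_counter()
    # Preprocessing: build S and G and convert them to CSR.
    S = sparse.vstack(
        [
            sparse.csr_matrix(np.ones((1, n_bottom))),
            sparse.kron(
                sparse.identity(n_groups),
                sparse.csr_matrix(np.ones((1, leaves_per_group))),
            ),
            sparse.identity(n_bottom),
        ],
        format="csr",
    )
    G = sparse.hstack(
        [sparse.csr_matrix((n_bottom, n_agg)), sparse.identity(n_bottom)],
        format="csr",
    )
    t1 = time.perf_counter()

    # The multiplication on its own.
    reconciled = S @ (G @ y_hat)
    t2 = time.perf_counter()

    timings = {
        "prep_seconds": t1 - t0,
        "multiply_seconds": t2 - t1,
        "end_to_end_seconds": t2 - t0,
    }
    return reconciled, timings


reconciled, timings = timed_bottom_up()
```

A comparison is like for like only if both sides report the same entry: either `multiply_seconds` for both, or `end_to_end_seconds` for both.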

https://github.com/liaopeiyuan/large-scale-hts-reconciliation/blob/536bcca52cca33d68fc8be6be6c61daecb653df7/demo/reconcile_mpi.py#L63-L67

Is this correct, or are we missing something?

If so, we believe the comparison is not with hierarchicalforecast but with numpy SGy multiplication, and its optimization relies on clever heuristics and engineering that come from the hierarchical reconciliation problem. Would you agree? If that's the case, maybe we can collaborate to include your code/library in hierarchicalforecast to optimize SGy. 🙌

Please let us know what you think.

Congrats on your work!

liaopeiyuan commented 1 year ago

Yes, I definitely concur that the comparison is with numpy SGy multiplication. hierarchicalforecast is one of the easiest ways to properly set up a reconciliation problem in Python, so we went with that, and the report we wrote focused on the clever heuristics that exist in a multi-process setting. Sorry for any miscommunication this may have introduced: the comparison was meant to illustrate the speedup over a naive set-up (which includes all overheads), and is not necessarily representative of hierarchicalforecast's performance. Would love to collaborate on further optimization efforts!