Closed Alsac closed 1 week ago
Hello, thanks for your message.
The parameter count of SAMformer includes the linear weight matrices for the query, key, and value in the attention block, as well as the final linear layer that forecasts the horizon H. Hence, it depends on the seq_length, hid_dim, and horizon values. The parameter count reported in the paper corresponds to the default values of these parameters in the original implementation. This shows that SAMformer is particularly efficient compared to the baselines, and in particular compared to TSMixer, which is its most efficient competitor. However, you are right that this count grows with these values and can become very large, for instance for very long horizons. In our experiments, up to horizon=720, SAMformer remains more efficient.
I hope this answers your question. Don't hesitate to open an issue or send a mail if you have additional questions.
Ambroise
Hello Alsac,
Thank you very much for your very relevant remark. Indeed, there was a miscalculation in the parameter counts for SAMformer and TSMixer: the count for SAMformer was slightly underestimated, while the count for TSMixer was significantly underestimated. The number of parameters of SAMformer is equal to $L \times (4 \cdot d_m + H) = 512 \times (64 + H)$, with sequence length $L = 512$ and hidden dimension $d_m = 16$.
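As a sanity check, the formula can be evaluated directly. Below is a minimal sketch (the function name is illustrative), assuming the defaults seq_len=512 and hid_dim=16 from the original implementation:

```python
def samformer_params(seq_len=512, hid_dim=16, horizon=96):
    # Q, K, V projections (seq_len x hid_dim each) plus the value
    # up-projection back to seq_len contribute 4 * seq_len * hid_dim
    # weights; the final linear forecaster adds seq_len * horizon.
    return seq_len * (4 * hid_dim + horizon)

for h in (96, 192, 336, 720):
    print(h, samformer_params(horizon=h))
```

With these defaults, the formula reproduces the SAMformer column of the table below (81,920 / 131,072 / 204,800 / 401,408 parameters for H = 96 / 192 / 336 / 720).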
I am sharing here the updated table to reflect these changes. This will also be updated in the paper.
In conclusion, SAMformer is not 3.73 times but 10.67 times smaller than TSMixer on average in terms of the number of parameters.
Thank you again for pointing this out!
Romain
Dataset | H=96 (SAMformer) | H=96 (TSMixer) | H=192 (SAMformer) | H=192 (TSMixer) | H=336 (SAMformer) | H=336 (TSMixer) | H=720 (SAMformer) | H=720 (TSMixer) |
---|---|---|---|---|---|---|---|---|
ETT | 81,920 | 576,604 | 131,072 | 625,948 | 204,800 | 699,628 | 401,408 | 896,620 |
Exchange | 81,920 | 1,219,084 | 131,072 | 1,398,344 | 204,800 | 1,732,396 | 401,408 | 3,696,904 |
Weather | 81,920 | 1,105,598 | 131,072 | 1,154,942 | 204,800 | 1,228,622 | 401,408 | 1,425,614 |
Electricity | 81,920 | 1,266,502 | 131,072 | 1,315,846 | 204,800 | 1,389,526 | 401,408 | 1,586,518 |
Traffic | 81,920 | 3,042,412 | 131,072 | 3,091,756 | 204,800 | 3,165,436 | 401,408 | 3,362,428 |
Horizon (H) | SAMformer (avg params) | TSMixer (avg params) | Avg Ratio (TSMixer/SAMformer) |
---|---|---|---|
H=96 | 81,920 | 1,442,040 | 17.60 |
H=192 | 131,072 | 1,517,367 | 11.58 |
H=336 | 204,800 | 1,643,121 | 8.02 |
H=720 | 401,408 | 2,193,616 | 5.46 |
AVG | | | 10.67 |
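For reference, the averages and ratios in the summary table can be reproduced from the per-dataset counts above (dataset order in each list: ETT, Exchange, Weather, Electricity, Traffic):

```python
# Parameter counts copied from the per-dataset table above.
samformer = {96: 81920, 192: 131072, 336: 204800, 720: 401408}
tsmixer = {
    96: [576604, 1219084, 1105598, 1266502, 3042412],
    192: [625948, 1398344, 1154942, 1315846, 3091756],
    336: [699628, 1732396, 1228622, 1389526, 3165436],
    720: [896620, 3696904, 1425614, 1586518, 3362428],
}

ratios = []
for h, counts in tsmixer.items():
    avg_tsmixer = sum(counts) / len(counts)     # average over the 5 datasets
    ratio = avg_tsmixer / samformer[h]          # TSMixer / SAMformer
    ratios.append(ratio)
    print(h, avg_tsmixer, round(ratio, 2))

print("AVG ratio:", round(sum(ratios) / len(ratios), 2))
```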
Hello! Your work has given me a lot of inspiration, thank you.
Currently, I'm facing a point of confusion regarding the parameter count of SAMformer mentioned in the paper. I found that in the PyTorch version of the code, there are linear layers for Q, K, and V, as well as a linear_forecaster. If seq_len=512, hid_dim=16, and pred_horizon=96 (based on my understanding), then the model's parameter count could become very large. Could you help resolve my confusion? Once again, thank you for your great work!