gdevos010 closed this 10 months ago
I see
can you report whether you see an improvement with more than one token per variate? could just remove it
regardless, let's save this for after the holidays
happy new years!
Happy New Years! There is no rush, I was taking a break from all the gatherings.
Yes, I found increasing num_tokens_per_variate to improve results in the base iTransformer.
@gdevos010 ah nice! that's great to hear. please share your experiments publicly, in the spirit of open source (the number of tokens per variate was something i threw in on a hunch, but not explored in the paper)
i'll get it fixed late next week
I will share when I can make a nice table out of them. Should be tomorrow or this week.
@gdevos010 nice! excited to see your results
issue should be addressed in the latest version (0.5.2)
@lucidrains Ray Tune was giving me some trouble, but I have the results. Unfortunately, because of this, not all models were tuned the same amount, and I know better performance could be achieved. All models had at least 10 trials on each dataset.
MSE scores:

| Model | ETTh1 | ETTh2 | ExchangeRate* | Hydro Energy | Sunspots |
|---|---|---|---|---|---|
| TiDE** | 0.00475 | 0.089 | 0.322 | 0.673 | 1.010 |
| TCN | 0.058 | 0.097 | 0.478 | 0.682 | 1.311 |
| iTransformerModel | 0.0476 | 0.095 | 0.378 | 0.683 | 1.080 |
| iTransformerFFTModel | 0.050 | 0.093 | 0.503 | 0.629 | 1.329 |
| iTransformerNormCondModel | 0.814 | 0.131 | 0.655 | 1.912 | 2.468 |
| iTransformerFlowModel | 0.0482 | 0.094 | 0.368 | 0.663 | 1.450 |

\* modified ExchangeRate dataset
\*\* TiDE was tuned more than any of the others
iTransformerFlowModel is iTransformer with FlowAttention
All iTransformer variants benefited from an increase in num_tokens_per_variate to 2 or 3.
I did not train the 2d version because of how slow it is to train.
@gdevos010 this is great! thank you, and i'll look into flow attention, first time hearing about it
i'll remove the norm conditioned model at the next release
seems like it performs really badly
@gdevos010 do you have a table for ablation of the tokens per variate? just curious how big the improvement is
@lucidrains I'll get you that as soon as I can
@lucidrains It's a pretty meaningful improvement
| num_tokens_per_variate | ETTh2 | Exchange Rate |
|---|---|---|
| 1 | 0.187 | 0.710 |
| 2 | 0.099 | 0.578 |
| 3 | 0.092 | 0.710 |
| 4 | 0.095 | 0.582 |
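For anyone following along, here is a rough numpy sketch of what I understand the knob being ablated to do: each variate's lookback window gets projected into `num_tokens_per_variate` tokens instead of one, so attention mixes `variates * num_tokens_per_variate` tokens. The shapes and the shared projection below are my assumptions for illustration, not the library's internals.

```python
import numpy as np

# Toy shapes (assumptions, not taken from the repo)
batch, variates, lookback = 2, 7, 96
dim, num_tokens_per_variate = 32, 2

x = np.random.randn(batch, variates, lookback)

# One shared linear projection: lookback -> num_tokens_per_variate * dim
w = np.random.randn(lookback, num_tokens_per_variate * dim)

tokens = x @ w  # (batch, variates, num_tokens_per_variate * dim)

# Split each variate's projection into multiple tokens along the token axis
tokens = tokens.reshape(batch, variates * num_tokens_per_variate, dim)

print(tokens.shape)  # (2, 14, 32)
```

So going from 1 to 2 tokens per variate doubles the sequence length attention operates over, which lines up with it behaving like a cheap version of the 2d model.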
thank you!
@gdevos010 you should def try the 2d version, as number of tokens per variate is basically serving the same purpose
start with a low number of time tokens and titrate up
I have been doing a bunch of experiments this past week. I believe the denormalize step fails in iTransformerNormConditioned when num_tokens_per_variate != 1: https://github.com/lucidrains/iTransformer/blob/af64123b085a3b3e047476cabb36d7053685c3de/iTransformer/iTransformerNormConditioned.py#L231
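Here is a rough numpy sketch of the kind of shape mismatch I suspect: the normalization statistics are computed per variate, but once each variate maps to more than one token, the token axis no longer lines up with them. Names and shapes below are my assumptions for illustration, not the actual code.

```python
import numpy as np

batch, variates, lookback, n_tokens = 2, 4, 16, 2

x = np.random.randn(batch, variates, lookback)

# Reversible instance-norm style statistics, one per variate
mean = x.mean(axis=-1, keepdims=True)        # (batch, variates, 1)
std = x.std(axis=-1, keepdims=True) + 1e-5   # (batch, variates, 1)

normed = (x - mean) / std

# With num_tokens_per_variate > 1, the token axis grows to variates * n_tokens
tokens = np.repeat(normed, n_tokens, axis=1)  # (batch, variates * n_tokens, lookback)

# Denormalizing the token-axis tensor with per-variate stats no longer broadcasts
try:
    tokens * std + mean
    broadcast_ok = True
except ValueError:
    broadcast_ok = False

print(broadcast_ok)  # False
```

If that is what is happening, the stats would need to be repeated (or the tokens folded back to one per variate) before denormalizing.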