lucidrains / iTransformer

Unofficial implementation of iTransformer - SOTA Time Series Forecasting using Attention networks, out of Tsinghua / Ant group
MIT License

denormalize fails in iTransformerNormConditioned #20

Closed gdevos010 closed 10 months ago

gdevos010 commented 11 months ago

I have been doing a bunch of experiments this past week. I believe the denormalize step fails in iTransformerNormConditioned when num_tokens_per_variate != 1.

https://github.com/lucidrains/iTransformer/blob/af64123b085a3b3e047476cabb36d7053685c3de/iTransformer/iTransformerNormConditioned.py#L231
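A minimal sketch of the suspected failure mode (illustrative shapes only, not the library's actual code): the normalization stats are computed once per variate, but tokenizing each variate into more than one token multiplies the token dimension, so a token-wise denormalize no longer broadcasts.

```python
import numpy as np

# Illustrative sketch, assuming per-variate instance normalization:
# stats carry one entry per variate, but tokenization multiplies the
# token axis by num_tokens_per_variate, breaking the broadcast.

batch, variates, lookback = 2, 4, 8
num_tokens_per_variate = 2

x = np.random.randn(batch, variates, lookback)

# normalize per variate -> one mean/std per variate
mean = x.mean(axis=-1, keepdims=True)        # (batch, variates, 1)
std = x.std(axis=-1, keepdims=True) + 1e-5   # (batch, variates, 1)
normed = (x - mean) / std

# tokenizing splits each variate into num_tokens_per_variate tokens
tokens = normed.reshape(batch, variates * num_tokens_per_variate, -1)

# denormalizing token-wise now mismatches: 8 token rows vs 4 stat rows
try:
    denormed = tokens * std + mean
except ValueError:
    print("broadcast fails:", tokens.shape, "vs", std.shape)
```

With num_tokens_per_variate = 1 the reshape is a no-op and the same denormalize broadcasts cleanly, which would explain why the bug only surfaces for values above 1.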

lucidrains commented 11 months ago

I see

can you report whether you see an improvement with more than one token per variate? could just remove it

lucidrains commented 11 months ago

regardless, let's save this for after the holidays

happy new year!

gdevos010 commented 11 months ago

Happy New Year! There is no rush; I was taking a break from all the gatherings.

Yes, I found increasing num_tokens_per_variate to improve results in the base iTransformer.

lucidrains commented 11 months ago

@gdevos010 ah nice! that's great to hear. please share your experiments publicly, in the spirit of open source (the number of tokens per variate was something i threw in on a hunch, but not explored in the paper)

i'll get it fixed late next week

gdevos010 commented 11 months ago

I will share them once I can make a nice table out of them. Should be done tomorrow or later this week.

lucidrains commented 11 months ago

@gdevos010 nice! excited to see your results

issue should be addressed in the latest version (0.5.2)

gdevos010 commented 10 months ago

@lucidrains Ray Tune was giving me some trouble, but I have the results. Unfortunately, because of this, not all models were tuned the same amount, and I know better performance could be achieved. All models had at least 10 trials on each dataset.

MSE score:

| Model | ETTh1 | ETTh1 | ExchangeRate* | Hydro Energy | Sunspots |
|---|---|---|---|---|---|
| TiDE** | 0.00475 | 0.089 | 0.322 | 0.673 | 1.010 |
| TCN | 0.058 | 0.097 | 0.478 | 0.682 | 1.311 |
| iTransformerModel | 0.0476 | 0.095 | 0.378 | 0.683 | 1.080 |
| iTransformerFFTModel | 0.050 | 0.093 | 0.503 | 0.629 | 1.329 |
| iTransformerNormCondModel | 0.814 | 0.131 | 0.655 | 1.912 | 2.468 |
| iTransformerFlowModel | 0.0482 | 0.094 | 0.368 | 0.663 | 1.450 |

\* modified ExchangeRate dataset
\*\* TiDE was tuned more than any of the others

iTransformerFlowModel is iTransformer with FlowAttention

All iTransformer variants benefited from increasing num_tokens_per_variate to 2 or 3.

I did not train the 2d version because of how slow it is to train.

lucidrains commented 10 months ago

@gdevos010 this is great! thank you, and i'll look into flow attention, first time hearing about it

lucidrains commented 10 months ago

i'll remove the norm conditioned model at the next release

seems like it performs really badly

lucidrains commented 10 months ago

@gdevos010 do you have a table for ablation of the tokens per variate? just curious how big the improvement is

gdevos010 commented 10 months ago

@lucidrains I'll get you that as soon as I can

gdevos010 commented 10 months ago

@lucidrains It's a pretty meaningful improvement

| num_tokens_per_variate | ETTh2 | Exchange Rate |
|---|---|---|
| 1 | 0.187 | 0.710 |
| 2 | 0.099 | 0.578 |
| 3 | 0.092 | 0.710 |
| 4 | 0.095 | 0.582 |

lucidrains commented 10 months ago

thank you!

lucidrains commented 10 months ago

@gdevos010 you should definitely try the 2d version, as the number of tokens per variate is basically serving the same purpose

start with a low number of time tokens and titrate up
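The "titrate up" advice above can be pictured as partitioning the lookback window into coarser or finer patches (a hedged sketch with illustrative names, not the library's internals): each time token covers lookback_len // num_time_tokens steps, so raising num_time_tokens shrinks the patch each token summarizes.

```python
import numpy as np

# Illustrative sketch: splitting a lookback window into time tokens.
# Assumed relationship: patch_size = lookback_len // num_time_tokens,
# so more time tokens -> smaller patches per token.

lookback_len = 96
series = np.random.randn(lookback_len)

for num_time_tokens in (2, 4, 8):  # start low, titrate up
    patch_size = lookback_len // num_time_tokens
    patches = series.reshape(num_time_tokens, patch_size)
    print(num_time_tokens, "time tokens -> patches of shape", patches.shape)
```

Each pass through the loop trades fewer, coarser tokens for more, finer ones, which is the knob being titrated.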