raoyongming / DynamicViT

[NeurIPS 2021] [T-PAMI] DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification
https://dynamicvit.ivg-research.xyz/
MIT License

Temperature in Gumbel Softmax #26

Closed kaikai23 closed 1 year ago

kaikai23 commented 1 year ago

Hi, thanks for your inspiring work!

I noticed that you use the default temperature (tau=1) in all your F.gumbel_softmax calls and never anneal it toward 0. Is there a reason to keep the temperature fixed? I would have expected it to be decreased during training, so that the samples get closer and closer to a real categorical distribution, as suggested in the Gumbel-Softmax paper.

liuzuyan commented 1 year ago

Hi, thanks for your interest in our work. In our implementation, we left the temperature in F.gumbel_softmax at its default value just for simplicity. Decreasing the temperature during training might improve performance, and I think it's worth trying.
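
For anyone who wants to try this, below is a minimal sketch of temperature annealing. It is not part of the DynamicViT code. The function names (`annealed_tau`, `gumbel_softmax_sample`), the exponential-decay-with-floor schedule, and all default values (`tau_start`, `tau_min`, `rate`) are assumptions chosen for illustration. The sampler is written in plain Python only to show how the temperature sharpens the relaxed sample.

```python
import math
import random


def annealed_tau(step, tau_start=1.0, tau_min=0.5, rate=1e-4):
    """Hypothetical schedule: exponential decay from tau_start, clamped at tau_min."""
    return max(tau_min, tau_start * math.exp(-rate * step))


def gumbel_softmax_sample(logits, tau, rng):
    """Draw one relaxed sample: softmax((logits + Gumbel noise) / tau)."""
    noisy = []
    for logit in logits:
        # Clamp the uniform draw away from 0 and 1 so both logs stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / tau)
    # Numerically stable softmax: subtract the maximum before exponentiating.
    top = max(noisy)
    exps = [math.exp(v - top) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    logits = [1.0, 0.0]  # e.g. keep/drop scores for a single token
    for step in (0, 5_000, 20_000):
        tau = annealed_tau(step)
        sample = gumbel_softmax_sample(logits, tau, random.Random(0))
        print(f"step={step:6d} tau={tau:.3f} sample={sample}")
```

In a PyTorch training loop, the scheduled value would instead be passed as the `tau` argument, e.g. `F.gumbel_softmax(logits, tau=annealed_tau(step))`. Whether annealing actually helps here is untested, and the decay rate and floor would need tuning.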