vita-epfl / ttt-plus-plus

[NeurIPS21] TTT++: When Does Self-supervised Test-time Training Fail or Thrive?
MIT License

Multi-epoch training was performed on the test set. #3

Open Dyb3438 opened 2 years ago

Dyb3438 commented 2 years ago

One aspect of test-time adaptation in this code differs from prior work, and it has me wondering.

When adapting on the test set, TTT and Tent run only a single epoch rather than hundreds. As I understand it, this code adapts the network on the test set for multiple epochs, which in my opinion often does not make sense in practice.

YuejiangLIU commented 2 years ago

Thanks for the question!

To my knowledge, both the single-epoch and multi-epoch settings are commonly used in prior literature. The code of TTT and Tent uses a single epoch, whereas SHOT, another baseline method we compared with, uses multiple epochs.

I personally lean towards the multi-epoch setting (with an oracle for model selection) for evaluation and comparison. The reason is that, in the single-epoch setting, the adaptation performance is often quite sensitive to the choice of learning rate, which can lead to noisy comparisons. In contrast, in our multi-epoch evaluation, we chose relatively small learning rates and ran the adaptation long enough to thoroughly estimate the effectiveness of an algorithm.
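To make the two protocols concrete, here is a minimal toy sketch in plain Python. It is not the code from this repository: the scalar "model" (a single bias), the mean-matching objective, and every function name (`make_shifted_test_set`, `adapt_one_epoch`, `run_adaptation`) are assumptions made up for illustration. It only shows the difference between reporting accuracy after one pass and reporting the best epoch chosen by an oracle that sees the test labels.

```python
import random


def make_shifted_test_set(n=200, shift=2.0, seed=0):
    """Toy test set: clean feature z ~ N(0, 1), observed feature x = z + shift."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        data.append((z + shift, int(z > 0)))  # (shifted feature, true label)
    return data


def accuracy(bias, data):
    """Accuracy of the toy classifier that predicts 1 when x - bias > 0."""
    correct = sum(int(x - bias > 0) == y for x, y in data)
    return correct / len(data)


def adapt_one_epoch(bias, data, lr, batch_size=20):
    """One unsupervised pass over the test set; labels are never read here.

    Hypothetical objective: 0.5 * (batch_mean - bias)^2, i.e. move the bias
    so that the centred test features match a source mean of 0.
    """
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        batch_mean = sum(x for x, _ in batch) / len(batch)
        grad = -(batch_mean - bias)  # derivative of the objective w.r.t. bias
        bias -= lr * grad
    return bias


def run_adaptation(data, lr, num_epochs):
    """Adapt for num_epochs passes and record test accuracy after each one."""
    bias = 0.0
    history = []
    for _ in range(num_epochs):
        bias = adapt_one_epoch(bias, data, lr)
        history.append(accuracy(bias, data))
    return history


test_set = make_shifted_test_set()

# Single-epoch protocol: one pass, report whatever accuracy results.
single_epoch_acc = run_adaptation(test_set, lr=0.01, num_epochs=1)[-1]

# Multi-epoch protocol: small learning rate, many passes, and an oracle
# (which uses the test labels) picks the best epoch for reporting.
multi_history = run_adaptation(test_set, lr=0.01, num_epochs=50)
multi_epoch_best = max(multi_history)

print("no adaptation:      ", accuracy(0.0, test_set))
print("single epoch:       ", single_epoch_acc)
print("multi-epoch (oracle):", multi_epoch_best)
```

Note that the oracle selection step relies on test labels, so the multi-epoch number is best read as an upper estimate of what the method can reach, not as a deployable procedure.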

Besides, even in practice, I believe that using the test examples at hand for multiple epochs is still a better choice, if computational time allows. This is probably a subjective opinion though.

P.S. Why do you think the multi-epoch setting does not make sense in practice?