michaelsdr / momentumnet

Drop-in replacement for any ResNet with a significantly reduced memory footprint and better representation capabilities
https://michaelsdr.github.io/momentumnet/
MIT License

Missing baseline? #28

Open alexm-gc opened 2 years ago

alexm-gc commented 2 years ago

Thanks for your interesting work!

The Reformer uses RevNet in a clever way: it doubles the dimension of x so that for x1, x2 = split(x), both x1 and x2 have the same dimension as the original x. This gives the invertible architecture the "same parameters" as the initial architecture. Let's call this ReformerRevNet.
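To make the coupling concrete, here is a minimal PyTorch sketch of what I mean (ReversibleBlock is an illustrative name, and f and g are placeholders for the two sub-layers, e.g. Attention and FF in the Reformer):

```python
import torch
import torch.nn as nn

class ReversibleBlock(nn.Module):
    """Sketch of the RevNet/Reformer-style coupling on a doubled-width input,
    so each half x1, x2 has the original model dimension."""
    def __init__(self, f: nn.Module, g: nn.Module):
        super().__init__()
        self.f, self.g = f, g

    def forward(self, x):
        x1, x2 = torch.chunk(x, 2, dim=-1)  # each half keeps the original dim
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return torch.cat([y1, y2], dim=-1)

    def inverse(self, y):
        # Activations are recoverable from the output, so none need caching.
        y1, y2 = torch.chunk(y, 2, dim=-1)
        x2 = y2 - self.g(y1)
        x1 = y1 - self.f(x2)
        return torch.cat([x1, x2], dim=-1)
```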

Question 0. In Table 2, RevNet differs from MomentumNet only in the row "same parameters". I don't see why ReformerRevNet and MomentumNet would differ in Table 2.

Question 1. Is there any reason this ReformerRevNet baseline was not included?

Apologies for any misunderstanding.

michaelsdr commented 2 years ago

Hi @alexm-gc

Thank you for your questions.

For Question 0. As opposed to the architectures presented in Table 2, the Reformer is dedicated to Transformers. In addition, note that the Reformer does Y_1 = X_1 + Attention(X_2) and then Y_2 = X_2 + FF(Y_1). Thus, it uses two successive layers (Attention and FF) in its forward rule, which is not the case for the architectures considered in Table 2.

For Question 1. Actually, in our paper we do not conduct experiments on Transformers, though one can define the momentum counterpart of any Transformer.
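For illustration, a minimal sketch of what such a momentum counterpart could look like, using the momentum forward rule from the paper (this is not the library's API; the layer f and gamma are illustrative):

```python
import torch
import torch.nn as nn

class MomentumLayer(nn.Module):
    """Momentum residual step: v_{n+1} = gamma * v_n + (1 - gamma) * f(x_n),
    then x_{n+1} = x_n + v_{n+1}. Here f could be any Transformer sub-layer
    (attention or feed-forward)."""
    def __init__(self, f: nn.Module, gamma: float = 0.9):
        super().__init__()
        self.f = f
        self.gamma = gamma

    def forward(self, x, v):
        v = self.gamma * v + (1 - self.gamma) * self.f(x)
        # Invertible: x_n = x_{n+1} - v_{n+1}, then
        # v_n = (v_{n+1} - (1 - gamma) * f(x_n)) / gamma.
        x = x + v
        return x, v

# Usage: initialize the velocity to zero, e.g. v = torch.zeros_like(x).
```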