zihangJiang / TokenLabeling

PyTorch implementation of "All Tokens Matter: Token Labeling for Training Better Vision Transformers"
Apache License 2.0
426 stars 36 forks

mix_token implementation #4

Closed AmberzzZZ closed 3 years ago

AmberzzZZ commented 3 years ago

In your released lvvit.py code, mix_token is implemented by cutting and mixing the original token grid with its flipped counterpart, with no corresponding change to the labels, which is not what the paper describes. Is this what you actually did during training?

zihangJiang commented 3 years ago

Yes. In our implementation (which we use in our training process), we cut & mix the tokens with the flipped ones:
https://github.com/zihangJiang/TokenLabeling/blob/2e221d24fef15e14f467ba02fd800f81ed9ef5df/models/lvvit.py#L192-L199

We then paste the tokens back after they go through the transformer layers, which is equivalent to cutting & mixing the corresponding dense label maps as described in our paper:
https://github.com/zihangJiang/TokenLabeling/blob/2e221d24fef15e14f467ba02fd800f81ed9ef5df/models/lvvit.py#L213-L219
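The mechanism described above can be sketched roughly as follows. This is a minimal illustration, not the repo's exact code: the `rand_bbox` helper, tensor shapes, and function names here are assumptions, chosen to mirror a CutMix-style box swap on a `(B, H, W, C)` token grid. The key property is that applying the same swap twice with the same box restores the original tensor, which is why pasting the tokens back after the transformer layers is equivalent to mixing the dense label maps instead.

```python
import torch

def rand_bbox(H, W, lam):
    # Hypothetical CutMix-style box sampler (illustrative, not the repo's API):
    # sample a box whose area is roughly (1 - lam) of the token grid.
    cut_rat = (1.0 - lam) ** 0.5
    cut_h, cut_w = int(H * cut_rat), int(W * cut_rat)
    cy = torch.randint(H, (1,)).item()
    cx = torch.randint(W, (1,)).item()
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, H)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, W)
    return y1, y2, x1, x2

def mix_token(x, box):
    # x: (B, H, W, C) grid of patch tokens.
    # Inside the box, swap each sample's tokens with those of the
    # batch-flipped sample (x.flip(0)); outside the box, keep them.
    y1, y2, x1, x2 = box
    mask = torch.zeros(1, x.shape[1], x.shape[2], 1)
    mask[:, y1:y2, x1:x2, :] = 1.0
    return x * (1 - mask) + x.flip(0) * mask
```

Because the mask is binary and independent of the batch dimension, `mix_token(mix_token(x, box), box)` recovers `x` exactly: the swap is its own inverse, so the "paste back" step after the transformer undoes the token mix, and the supervision can equivalently be expressed by mixing the label maps with the same box.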

Hope this answers your question.