keyu-tian / SparK

[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; Pytorch impl. of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"
https://arxiv.org/abs/2301.03580
MIT License

There is no activation after the 2nd Conv in each decoder block #63

Closed JiamingJiao closed 9 months ago

JiamingJiao commented 9 months ago

https://github.com/keyu-tian/SparK/blob/00883b885dce68413ea048c6703a82d8a67a83b8/pretrain/decoder.py#L26C10-L26C10 I would like to know if I missed something. Is it a typo, or is it designed this way for some reason? Thanks :D

keyu-tian commented 9 months ago

@JiamingJiao Yes, it is by design. I referred to the MobileNetV2 architecture, which recommends using fewer activations. But I just found that some UNet implementations do use a second activation after the second Conv. I haven't tried this, but it might work better than the current decoder.
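For readers skimming this thread, here is a minimal sketch of the block structure being discussed. It is not the exact SparK decoder code (names, norm, and activation choices are assumptions for illustration); it only shows the point in question: one activation after the first Conv, and none after the second.

```python
import torch.nn as nn

class DecoderBlockSketch(nn.Module):
    """Illustrative two-conv decoder block (hypothetical, not the SparK source)."""

    def __init__(self, cin: int, cout: int):
        super().__init__()
        self.conv1 = nn.Conv2d(cin, cout, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(cout)
        self.act1 = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(cout, cout, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(cout)
        # No activation here, per the design choice explained above.
        # A typical UNet-style block would add a second activation, e.g.:
        # self.act2 = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.act1(self.bn1(self.conv1(x)))
        x = self.bn2(self.conv2(x))  # second conv ends without an activation
        return x
```

If you want to test the alternative, uncommenting the second activation and applying it after `bn2` in `forward` would give the more common UNet-style block.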