Closed · ryanccarelli closed this issue 2 years ago
The projection head is a 3-layer MLP (2048 hidden dim) with GELU activations, where the last layer has no GELU, followed by l2 normalization and a weight-normalized fully connected layer. There is no batchnorm.
https://github.com/facebookresearch/dino/blob/cb711401860da580817918b9167ed73e3eef3dcf/vision_transformer.py#L257
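A minimal PyTorch sketch of that head, following the structure in the linked `vision_transformer.py` (the `bottleneck_dim=256` default and the frozen `weight_g` magnitude are taken from the DINO repo; `in_dim`/`out_dim` values below are just illustrative):

```python
import torch
import torch.nn as nn


class DINOHead(nn.Module):
    """Sketch of the DINO projection head described above."""

    def __init__(self, in_dim, out_dim, hidden_dim=2048, bottleneck_dim=256):
        super().__init__()
        # 3-layer MLP with GELU between layers; no GELU after the last
        # MLP layer, and no batchnorm anywhere.
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, bottleneck_dim),
        )
        # Weight-normalized fully connected output layer; the magnitude
        # (weight_g) is fixed at 1 so only the direction is learned.
        self.last_layer = nn.utils.weight_norm(
            nn.Linear(bottleneck_dim, out_dim, bias=False)
        )
        self.last_layer.weight_g.data.fill_(1)
        self.last_layer.weight_g.requires_grad = False

    def forward(self, x):
        x = self.mlp(x)
        # l2-normalize the bottleneck features before the last layer.
        x = nn.functional.normalize(x, dim=-1, p=2)
        return self.last_layer(x)


head = DINOHead(in_dim=384, out_dim=4096)
out = head(torch.randn(4, 384))
```

Freezing `weight_g` means the output logits are effectively cosine similarities (scaled by the temperature used elsewhere in the loss), which is why the l2 norm sits between the MLP and the final layer.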