locuslab / convmixer

Implementation of ConvMixer for "Patches Are All You Need? 🤷"
MIT License

Request for more experimental results to compare with other architectures. #8

Open Luciennnnnnn opened 2 years ago

Luciennnnnnn commented 2 years ago

Hi! This work is pretty interesting, but I think there should be more results, like those in "Demystifying Local Vision Transformer: Sparse Connectivity, Weight Sharing, and Dynamic Weight", where the authors replace local self-attention with depth-wise convolution in Swin Transformer. Since you go a step further with a simpler architecture than Swin Transformer, I wonder whether ConvMixer can achieve similar performance on object detection and semantic segmentation.

BradKML commented 2 years ago

This sounds like a good idea, but it requires standard benchmarks and model zoos.