ViT Hybrid Pretrained models

huggingface / pytorch-image-models

The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more

https://huggingface.co/docs/timm

Apache License 2.0

ViT Hybrid Pretrained models #271

Closed skrish13 closed 3 years ago

skrish13 commented 4 years ago

Thanks for making the ViT models available! Do you plan to open-source the hybrid models as well? If so, any idea when they will be available?

rwightman commented 4 years ago

I don't have any active training going on for one of these models. Perhaps in the future, depending on the GPU resources available. If anyone with some free GPUs is interested, help is always appreciated. It'll take roughly 4-6 days for the smaller ones on 2 Titan RTX class GPUs.

skrish13 commented 4 years ago

Sure, I can help. It will take more time since I don't have 2 GPUs. Should I get started using your train.py?

skrish13 commented 4 years ago

On the other hand -- https://github.com/google-research/vision_transformer/issues/22

rwightman commented 4 years ago

Heh, by the time you train one from scratch they'll have a much better one ... maybe good to wait and see how theirs turns out. Although it may be difficult to support in PyTorch if they use a ResNet backbone that's different from the 'v1b' style common in PyTorch. I haven't supported the ResNet v2 variation that's common in TensorFlow.

skrish13 commented 4 years ago

Yes, I think they use ResNet v2, which they also used in their previous BiT paper. They provided PyTorch models for it, which may be useful -- https://github.com/google-research/big_transfer/blob/master/bit_pytorch/models.py

Lin-Zhipeng commented 3 years ago

Hi, they have added the ViT Hybrid pretrained models in "Adds R50+ViT-B/16 model". Can you convert them to the format used by this project's model (vision_transformer.py#L75)?

rwightman commented 3 years ago

@Lin-Zhipeng I've seen it, but I don't currently have the ResNet blocks it's built on implemented here... so I'd likely implement the BiT version of ResNet first... which has GroupNorm and some other minor differences from the ResNets here (it's closer to a TF 'v2' ResNet).
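For readers unfamiliar with the distinction, here is a minimal sketch of a pre-activation ('v2' style) bottleneck block with GroupNorm. It is an illustration only, not the code that was later merged: the class name `PreActBottleneck`, the default of 32 groups, and the channel sizes are assumptions, and BiT's other differences are not reproduced.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PreActBottleneck(nn.Module):
    """Hypothetical 'v2' style bottleneck: norm -> ReLU -> conv, with GroupNorm."""

    def __init__(self, in_ch, mid_ch, out_ch, stride=1, groups=32):
        super().__init__()
        # Each norm precedes its conv (pre-activation), unlike 'v1' style blocks
        # where BatchNorm and ReLU follow the conv.
        self.norm1 = nn.GroupNorm(groups, in_ch)
        self.conv1 = nn.Conv2d(in_ch, mid_ch, kernel_size=1, bias=False)
        self.norm2 = nn.GroupNorm(groups, mid_ch)
        self.conv2 = nn.Conv2d(mid_ch, mid_ch, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.norm3 = nn.GroupNorm(groups, mid_ch)
        self.conv3 = nn.Conv2d(mid_ch, out_ch, kernel_size=1, bias=False)
        # A projection shortcut is needed only when the shape changes.
        self.shortcut = None
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Conv2d(in_ch, out_ch, kernel_size=1,
                                      stride=stride, bias=False)

    def forward(self, x):
        out = F.relu(self.norm1(x))
        # The projection branches off the pre-activated input; the identity
        # path uses the raw input.
        residual = self.shortcut(out) if self.shortcut is not None else x
        out = self.conv1(out)
        out = self.conv2(F.relu(self.norm2(out)))
        out = self.conv3(F.relu(self.norm3(out)))
        # No activation after the addition in a pre-activation block.
        return out + residual


block = PreActBottleneck(64, 64, 256, stride=2)
y = block(torch.randn(2, 64, 16, 16))
print(y.shape)
```

The practical consequence is that checkpoint parameters for such a block cannot simply be renamed into a 'v1b' style BatchNorm block: the normalization layers differ in both type and position.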

Lin-Zhipeng commented 3 years ago

Thanks for your reply. Looking forward to the update. 😄

junyongyou commented 3 years ago

Any update on this issue? Thanks a lot.

rwightman commented 3 years ago

@junyongyou @Lin-Zhipeng I've started working on it now, along with fixing the imagenet21k models (wrong resolution, pre_logits support) and adding some support for running imagenet21k validation.

rwightman commented 3 years ago

@junyongyou @Lin-Zhipeng FYI, these models are working on my branch right now, but I have some more work/testing to do before I can merge to master

https://github.com/rwightman/pytorch-image-models/tree/imagenet21k_datasets_more

In addition to getting the official Hybrid R50 model working, I updated the imagenet21k weights to include the representation layer and made proper model defs for them. Unfortunately, the official JAX models have zeroed-out 21k head weights, so they don't actually work as-is (but they're fine for fine-tuning). I also included the Big Transfer R50 v2 models (BiT), since there was overlap with the ViT R50 backbone. Those do have working 21k heads.
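To see why a zeroed classifier head cannot be used for prediction yet does no harm when fine-tuning, consider this toy sketch in plain Python. The sizes and feature values are made up for illustration, and it assumes the head's bias is zero along with its weights, which the comment above does not state explicitly.

```python
import math


def linear(x, weight, bias):
    """Compute logits = weight @ x + bias using plain Python lists."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy sizes (hypothetical); a real imagenet21k head is far larger.
num_classes, feat_dim = 5, 4
zero_weight = [[0.0] * feat_dim for _ in range(num_classes)]
zero_bias = [0.0] * num_classes  # assumption: bias is zeroed as well

# Two very different feature vectors, standing in for backbone outputs.
features_a = [0.3, -1.2, 0.8, 2.0]
features_b = [5.0, 0.1, -3.3, 0.7]

probs_a = softmax(linear(features_a, zero_weight, zero_bias))
probs_b = softmax(linear(features_b, zero_weight, zero_bias))

# Every logit is zero whatever the input, so both distributions are uniform
# and carry no class information.
print(probs_a)
print(probs_b)
```

Fine-tuning normally replaces the head with a freshly initialized layer sized for the new label set, so only the backbone weights (and here the representation layer) need to be meaningful.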

Lin-Zhipeng commented 3 years ago

Thanks a lot. I will try it.

rwightman commented 3 years ago

merged