huggingface / pytorch-image-models

The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more
https://huggingface.co/docs/timm
Apache License 2.0

[FEATURE] Add model weights from Bamboo #1944

Open NightMachinery opened 1 year ago

NightMachinery commented 1 year ago

https://github.com/zhangyuanhan-ai/bamboo has released some interesting models, e.g., Bamboo-CLS ResNet-50 and Bamboo-CLS ViT B/16.

The ViT model beats most base-size ViTs I have seen on challenging datasets such as ObjectNet.

rwightman commented 1 year ago

@NightMachinery thanks, looks like it'd just be adding two sets of pretrained weights. resnet50 should be compatible already since it's torchvision compatible, but the torchvision ViT needs a slight remapping to timm... I'll take a closer look soon

Laurent2916 commented 10 months ago

It looks like the ViT's state_dict simply has a "module." prefix on every key. Here is a simple notebook to convert the weights: https://colab.research.google.com/drive/1s9tohf-CncnYo_y7Wdga5Tvd3bylEHOX
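
For reference, the prefix removal itself can be sketched in a few lines. This is a minimal sketch, not the notebook's actual code: `strip_prefix` is a hypothetical helper, and the key names in the example dictionary are illustrative placeholders rather than keys read from the Bamboo checkpoint. In practice the input would be the state_dict obtained from `torch.load(...)`.

```python
from collections import OrderedDict


def strip_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with `prefix` removed from keys that start with it.

    Hypothetical helper for illustration; keys without the prefix are kept unchanged.
    """
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


# Stand-in for a loaded checkpoint (illustrative keys, integers instead of tensors).
example = OrderedDict([
    ("module.conv_proj.weight", 1),
    ("module.heads.head.bias", 2),
])
converted = strip_prefix(example)
print(list(converted))
```

The "module." prefix usually comes from saving a model wrapped in `torch.nn.DataParallel` or `DistributedDataParallel`. Note this only handles the prefix; any remapping of torchvision ViT keys to timm's naming would be a separate step.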