[FEATURE] Add ViT weights: RADIO

huggingface / pytorch-image-models

The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more

Apache License 2.0

https://github.com/NVlabs/RADIO

NVIDIA has released the code and model weights for the CVPR 2024 paper "AM-RADIO: Agglomerative Vision Foundation Model - Reduce All Domains Into One".

RADIO is a new vision foundation model (in practice, a new set of pretrained ViT weights) that performs strongly across visual domains and can serve as a replacement for existing vision backbones. It is trained by distilling several teacher models (CLIP variants, DINOv2, and SAM) into a single student, which preserves each teacher's distinctive capabilities, such as text grounding (from CLIP) and segmentation correspondence.
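The multi-teacher distillation idea described above can be illustrated with a minimal, framework-free sketch. It assumes one bias-free linear adaptor head per teacher and a plain cosine-distance loss on a single feature vector. These are simplifying assumptions: AM-RADIO's actual adaptor architecture, loss terms, and weighting may differ. The names `linear`, `cosine_distance`, and `multi_teacher_loss` are hypothetical and are not part of RADIO or timm.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]
Matrix = List[List[float]]  # shape: [out_dim][in_dim]


def linear(weight: Matrix, x: Vector) -> Vector:
    """Apply a bias-free linear layer: y = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def cosine_distance(a: Vector, b: Vector, eps: float = 1e-12) -> float:
    """Return 1 - cos(a, b); 0 when both vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / max(norm, eps)  # eps guards against zero-length vectors


def multi_teacher_loss(
    student_feat: Vector,
    heads: Dict[str, Matrix],
    teacher_feats: Dict[str, Vector],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted sum of per-teacher cosine distances.

    Each teacher gets its own (hypothetical) adaptor head that maps the shared
    student feature into that teacher's feature space before comparison.
    """
    total = 0.0
    for name, target in teacher_feats.items():
        pred = linear(heads[name], student_feat)  # teacher-specific projection
        w = 1.0 if weights is None else weights.get(name, 1.0)
        total += w * cosine_distance(pred, target)
    return total


if __name__ == "__main__":
    student = [1.0, 0.0]
    heads = {
        "clip": [[1.0, 0.0], [0.0, 1.0]],    # identity head
        "dinov2": [[0.0, 1.0], [1.0, 0.0]],  # swaps the two feature dimensions
    }
    targets = {"clip": [2.0, 0.0], "dinov2": [0.0, 3.0]}
    print(multi_teacher_loss(student, heads, targets))  # 0.0: both heads match
```

The key design point is that a single student backbone is shared, while each teacher only adds a small head; in a real setting the heads would map into each teacher's own feature dimension, and both the backbone and heads would be trained jointly by gradient descent on a loss like this one.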

[FEATURE] Add ViT weights: RADIO #2177