openai / CLIP

CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
MIT License
26.19k stars, 3.35k forks

Make a smaller ModifiedResnet model #378

Open: justlike-prog opened this issue 1 year ago

justlike-prog commented 1 year ago

Hi,

The R50 image encoder is around 250MB in size. Has anyone here managed to reduce it drastically, to around 20MB? I was thinking about knowledge distillation into a student model that also uses the ModifiedResNet architecture, but with fewer layers, fewer filters, and so on.
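Before training a student, it can help to count parameters for candidate configurations and see which ones could plausibly land near 20MB. The sketch below is pure Python and assumes the layout of `ModifiedResNet` in CLIP's `clip/model.py`: a three-conv stem, four stages of `Bottleneck` blocks with expansion 4, and an `AttentionPool2d` head with `embed_dim = width * 32`. Please check it against the source. The helper names (`modified_resnet_params`, `size_mb`, etc.) are hypothetical and not part of the CLIP package, and the `(1, 1, 1, 1)` / `width=32` student is only an example configuration.

```python
"""Rough parameter/size estimate for a CLIP-style ModifiedResNet image encoder.

Assumption: layout follows ModifiedResNet in CLIP's clip/model.py.
Only trainable parameters are counted; BatchNorm running stats are ignored.
"""


def bn(c):
    # BatchNorm2d trainable params: weight + bias
    return 2 * c


def conv(cin, cout, k):
    # Conv2d with a k x k kernel and no bias (as used throughout the backbone)
    return cin * cout * k * k


def bottleneck(inplanes, planes, stride):
    expansion = 4
    n = conv(inplanes, planes, 1) + bn(planes)
    n += conv(planes, planes, 3) + bn(planes)
    n += conv(planes, planes * expansion, 1) + bn(planes * expansion)
    if stride > 1 or inplanes != planes * expansion:
        # Downsample branch: AvgPool (no params) + 1x1 conv + BatchNorm
        n += conv(inplanes, planes * expansion, 1) + bn(planes * expansion)
    return n


def modified_resnet_params(layers, width=64, output_dim=1024, input_resolution=224):
    # Stem: three 3x3 convs (3 -> width/2 -> width/2 -> width), each with BN
    half = width // 2
    n = conv(3, half, 3) + bn(half)
    n += conv(half, half, 3) + bn(half)
    n += conv(half, width, 3) + bn(width)

    # Four residual stages; stages 2-4 start with a stride-2 block
    inplanes = width
    for i, blocks in enumerate(layers):
        planes = width * 2 ** i
        stride = 1 if i == 0 else 2
        n += bottleneck(inplanes, planes, stride)
        inplanes = planes * 4
        for _ in range(blocks - 1):
            n += bottleneck(inplanes, planes, 1)

    # Attention-pooling head: positional embedding + q/k/v/c projections (with bias).
    # The number of heads does not change the parameter count.
    embed_dim = width * 32
    spatial = input_resolution // 32
    n += (spatial ** 2 + 1) * embed_dim
    n += 3 * (embed_dim * embed_dim + embed_dim)
    n += embed_dim * output_dim + output_dim
    return n


def size_mb(n_params, bytes_per_param=4):
    # 4 bytes for fp32, 2 for fp16; ignores serialization overhead
    return n_params * bytes_per_param / 1e6


teacher = modified_resnet_params((3, 4, 6, 3))            # RN50 image encoder layout
student = modified_resnet_params((1, 1, 1, 1), width=32)  # hypothetical student
for name, n in [("teacher", teacher), ("student", student)]:
    print(f"{name}: {n:,} params, {size_mb(n):.1f} MB fp32, {size_mb(n, 2):.1f} MB fp16")
```

Under these assumptions the RN50 layout comes out at about 38.3M trainable parameters, and roughly 14.8M of those sit in the attention-pooling head. Because `embed_dim = width * 32`, the head's q/k/v projections grow with the square of `width`, so shrinking `width` (and `output_dim`, if the student does not have to match the teacher's 1024-d embedding) shrinks the head too, while removing blocks alone leaves it untouched.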

Any ideas and experiences are welcome :)