mlfoundations / open_clip

An open source implementation of CLIP.

How do you profile the CLIP models #902

Closed: X-funbean closed this issue 1 week ago

X-funbean commented 1 week ago

Hi, I want to know how you profile the CLIP models in https://github.com/mlfoundations/open_clip/blob/main/docs/model_profile.csv, because I can't match the profile results with the tools I have tried (e.g. torchsummaryX, thop, and torchinfo). In fact, I got very different results from each. Among them, I think the closest to the FLOPs plotted in the CLIP paper, Learning Transferable Visual Models From Natural Language Supervision (figure below), is the torchinfo result, which is 14.04 G mult-adds. I also tried the code provided by @jongwook (https://github.com/openai/CLIP/issues/143#issuecomment-926327141); however, it gave a result of over 161 GFLOPs. According to the model profile provided by this repo, the compute for CLIP with ViT-B/16 should be 41.09 GFLOPs.

What profiling tool or library do you use to obtain these numbers? Kindly help me solve this problem.

[Figure: FLOPs plot from the CLIP paper]
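For reference, a minimal sketch of the kind of torchinfo measurement described above (untested; the `ViT-B-16` model name string, the 224x224 image size, and the 77-token text length are assumptions, and torchinfo reports mult-adds rather than FLOPs):

```python
import torch
import open_clip
from torchinfo import summary

# Build the CLIP model (no pretrained weights needed just to count ops).
model, _, _ = open_clip.create_model_and_transforms("ViT-B-16")
model.eval()

image = torch.randn(1, 3, 224, 224)        # single 224x224 RGB image
text = torch.randint(0, 49408, (1, 77))    # dummy token ids, context length 77

# torchinfo counts multiply-adds (MACs); 1 MAC is roughly 2 FLOPs.
stats = summary(
    model,
    input_data=[image, text],
    col_names=("num_params", "mult_adds"),
    verbose=0,
)
print(f"total mult-adds: {stats.total_mult_adds / 1e9:.2f} G")
```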

rwightman commented 1 week ago

This was measured with https://github.com/mlfoundations/open_clip/blob/main/src/training/profiler.py ... BUT it's not plug and play: with torch's MultiheadAttention module and/or F.scaled_dot_product_attention in use, you have to hack/disable things or modify fvcore (which isn't really maintained any more) so that the correct values are counted for the attention. Also, not all papers mean FLOPs when they say FLOPs; sometimes it's actually GMACs. The GFLOPs values here are GFLOPs, though.
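One possible workaround along those lines is to register a custom fvcore handle for the SDPA op so the attention matmuls are counted instead of skipped. A hedged sketch, not the repo's actual profiler: the op string, handler, and assumed (B, H, L, E) tensor layout are my assumptions, and fvcore counts one multiply-add as one "flop", so its totals are effectively MACs.

```python
import torch
import open_clip
from fvcore.nn import FlopCountAnalysis
from fvcore.nn.jit_handles import get_shape


def sdpa_flop_jit(inputs, outputs):
    # Hypothetical handler for aten::scaled_dot_product_attention.
    # Assumes q: (B, H, L, E), k: (B, H, S, E), v: (B, H, S, Ev).
    q_shape, k_shape, v_shape = (get_shape(x) for x in inputs[:3])
    b, h, l, e = q_shape
    s = k_shape[2]
    ev = v_shape[3]
    # q @ k^T plus attn @ v, counted as MACs per fvcore's convention.
    return b * h * l * s * e + b * h * l * s * ev


model, _, _ = open_clip.create_model_and_transforms("ViT-B-16")
model.eval()

image = torch.randn(1, 3, 224, 224)
text = torch.randint(0, 49408, (1, 77))

flops = FlopCountAnalysis(model, (image, text))
flops.set_op_handle("aten::scaled_dot_product_attention", sdpa_flop_jit)
print(f"~{flops.total() / 1e9:.2f} GMACs counted (x2 for FLOPs)")
print(flops.unsupported_ops())  # check whether anything important is still skipped
```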

I'm inclined to think the numbers here are good... the rule of thumb is 2 * 12 * num_layers * dim^2 FLOPs per token (i.e. ~2 FLOPs per parameter), and summed over the image and text towers that works out to ~40 for the B/16.
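As a quick sanity check on that rule of thumb, a back-of-envelope sketch (assuming ~12 * dim^2 weights per transformer block, 197 image tokens for a 224x224 input at patch size 16, and 77 text tokens):

```python
def tower_gflops(num_layers: int, dim: int, seq_len: int) -> float:
    # ~12*dim^2 weights per block, ~2 FLOPs per weight per token.
    return 2 * 12 * num_layers * dim ** 2 * seq_len / 1e9


image = tower_gflops(num_layers=12, dim=768, seq_len=197)  # 14x14 patches + CLS
text = tower_gflops(num_layers=12, dim=512, seq_len=77)
print(f"image ~{image:.1f} G, text ~{text:.1f} G, total ~{image + text:.1f} GFLOPs")
# roughly 33.5 + 5.8 = ~39 GFLOPs, in the same ballpark as the 41.09 in model_profile.csv
```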