mlfoundations / open_clip

An open source implementation of CLIP.

[bug] loading saved weights of an open_clip model does not give back the same results #915

Closed: yxchng closed this issue 1 month ago

yxchng commented 1 month ago
import torch
from PIL import Image
import open_clip
model, _, preprocess = open_clip.create_model_and_transforms('ViT-L-14-336', pretrained='openai')
model.eval()  # model in train mode by default, impacts some models with BatchNorm or stochastic depth active
tokenizer = open_clip.get_tokenizer('ViT-L-14-336')

image = preprocess(Image.open("CLIP.png")).unsqueeze(0)
text = tokenizer(["a diagram", "a dog", "a cat"])

with torch.no_grad(), torch.cuda.amp.autocast():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)  # prints: [[1., 0., 0.]]

torch.save(model.state_dict(), 'tmp.pt')  # save the current weights to disk

model, _, preprocess = open_clip.create_model_and_transforms('ViT-L-14-336', pretrained='tmp.pt')  # reload from the saved checkpoint
model.eval()  # model in train mode by default, impacts some models with BatchNorm or stochastic depth active
tokenizer = open_clip.get_tokenizer('ViT-L-14-336')

image = preprocess(Image.open("CLIP.png")).unsqueeze(0)
text = tokenizer(["a diagram", "a dog", "a cat"])

with torch.no_grad(), torch.cuda.amp.autocast():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)  # prints: [[1., 0., 0.]]

gives

Label probs: tensor([[0.9326, 0.0627, 0.0047]])
Label probs: tensor([[0.8960, 0.0976, 0.0064]])
rwightman commented 1 month ago

It's #771 ... the OpenAI pretrained weights force-override the activation to QuickGELU (which is less efficient and uses more memory than nn.GELU). When you reload the saved weights afterwards you need to manually force that via an argument to create (or use a model config with QuickGELU).

yxchng commented 1 month ago

@rwightman how to manually force via argument?
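A minimal sketch of what the reload could look like, assuming the force_quick_gelu flag exposed by create_model_and_transforms in recent open_clip releases (the saved checkpoint is unchanged; only the activation used when the model is built differs):

import torch
import open_clip

# Build the model from the locally saved checkpoint, but force QuickGELU so the
# activation matches what the original 'openai' pretrained weights were trained with.
# force_quick_gelu is assumed from recent open_clip versions; check your installed version.
model, _, preprocess = open_clip.create_model_and_transforms(
    'ViT-L-14-336',
    pretrained='tmp.pt',
    force_quick_gelu=True,
)
model.eval()
tokenizer = open_clip.get_tokenizer('ViT-L-14-336')

With the activation matched, the reloaded model should reproduce the original probabilities; the alternative mentioned above is to use a model config that already specifies QuickGELU.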