hammoudhasan / SynthCLIP

Codebase of SynthCLIP: CLIP training with purely synthetic text-image pairs generated by LLMs and text-to-image (TTI) models.

clip-benchmark model init #6


escorciav commented 2 weeks ago

Hi guys!

Thanks for making your research accessible to the public & congrats on your CVPRW-2024 paper :tada:

Is this the boilerplate required to plug SynthCLIP into clip-benchmark, as mentioned in #5 and #2?

cp Training/models.py <clip-benchmark-dir>/clip_benchmark/models/synthclip.py

Then append the following to that module (the imports at the top are only needed if models.py does not already pull them in):

# Needed if the copied models.py does not already import these.
import open_clip
import torch
from torchvision import transforms


def load_synthclip(pretrained: str = "./checkpoints/synthclip-30m/checkpoint_best.pt",
                   device="cpu", **kwargs):
    model = CLIP_VITB16()
    # Load the SynthCLIP weights. Assumption: the training checkpoint stores them
    # under "state_dict", possibly with a "module." prefix left over from DDP.
    checkpoint = torch.load(pretrained, map_location="cpu")
    state_dict = checkpoint.get("state_dict", checkpoint)
    state_dict = {k.removeprefix("module."): v for k, v in state_dict.items()}
    model.load_state_dict(state_dict)
    # Preprocessing taken from
    # https://github.com/hammoudhasan/SynthCLIP/blob/02ef69764d8dc921650bcac4a98bd0f477790787/Training/main.py#L240
    normalize = transforms.Normalize(
        mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
    )
    transform = transforms.Compose(
        [
            transforms.Resize(224),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            # Some eval datasets contain grayscale images; repeat the single
            # channel so every tensor is 3-channel before normalization.
            lambda x: x.repeat(3, 1, 1) if x.shape[0] == 1 else x,
            normalize,
        ]
    )
    model = model.to(device).eval()
    tokenizer = open_clip.get_tokenizer("ViT-B-16")
    return model, transform, tokenizer
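
As a quick smoke test before wiring it into clip-benchmark (a minimal sketch; it assumes the default checkpoint path exists and that the model exposes encode_image/encode_text, as SLIP-style CLIP implementations do):

import torch
from PIL import Image

model, transform, tokenizer = load_synthclip(device="cpu")
image = transform(Image.new("RGB", (256, 256))).unsqueeze(0)  # dummy image
text = tokenizer(["a photo of a dog"])
with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
print(image_features.shape, text_features.shape)  # expect (1, embed_dim) each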

Then register it as mentioned here.
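
For reference, the registration in my fork roughly looks like the snippet below. It is a sketch; the exact dispatch-table and loader names in clip_benchmark/models/__init__.py may differ in your checkout:

# clip_benchmark/models/__init__.py
from .open_clip import load_open_clip
from .japanese_clip import load_japanese_clip
from .synthclip import load_synthclip

# map --model_type values to their loader functions
TYPE2FUNC = {
    "open_clip": load_open_clip,
    "ja_clip": load_japanese_clip,
    "synthclip": load_synthclip,
}
MODEL_TYPES = list(TYPE2FUNC.keys())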

Thanks in advance!

escorciav commented 2 weeks ago

It worked for me (I believe). If needed, check out my fork of clip-benchmark :wink:. You're welcome!

clip_benchmark eval --model "ViT-B-16" --model_type synthclip --pretrained $pretrained --dataset=$dataset --output=$output --dataset_root $dataset_root
# Debugging
# python -m ipdb clip-benchmark/clip_benchmark/cli.py eval --model_type synthclip --pretrained $pretrained --dataset=$dataset --output=$output --dataset_root $dataset_root --num_workers 0
escorciav commented 3 days ago

In case anyone is interested