rom1504 / clip-retrieval

Easily compute clip embeddings and build a clip retrieval system with them
https://rom1504.github.io/clip-retrieval/
MIT License
2.42k stars · 213 forks

How can I efficiently extract the image and corresponding caption feature tensors on my own dataset and apply the project? #320

Closed ShuxunoO closed 11 months ago

ShuxunoO commented 12 months ago

Hello, the project is a really great work!

I built a custom image-text pair dataset, including 7.45 million pairs, and have fine-tuned the CLIP-B-32 and CLIP-L-14 models on it.

Now, I want to use the framework to search for images by query caption. My questions are:

1. How can I extract the image and corresponding caption feature tensors efficiently?
2. How can I build the CLIP index?
3. How can I adapt the project to my own data?

Is there any tutorial or reference link?

Thank you for your help!
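(For the search step being asked about, the core of a CLIP retrieval system is nearest-neighbour search over L2-normalised embeddings. A minimal numpy sketch of that idea follows; the function names `build_index` and `search` are illustrative, and the real project builds a proper faiss index via autofaiss rather than brute-force dot products.)

```python
import numpy as np

def build_index(embeddings):
    # L2-normalise so that a dot product equals cosine similarity,
    # which is how CLIP image/text similarity is scored
    emb = np.asarray(embeddings, dtype=np.float32)
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)

def search(index, query, k=5):
    # query: one text embedding; returns the top-k row ids and their scores
    q = np.asarray(query, dtype=np.float32)
    q = q / np.linalg.norm(q)
    scores = index @ q
    top = np.argsort(-scores)[:k]
    return top, scores[top]
```

At scale, brute force is replaced by an approximate index (faiss/autofaiss), but the normalise-then-dot-product logic is the same.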

rom1504 commented 12 months ago

One way you can do this with no code change is to serve your images over HTTP, for example with nginx, and build a dataset of URLs pointing to your nginx server. Then you can use the whole pipeline: img2dataset, clip-retrieval inference, indexing, then the back end.
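(The steps above can be sketched as a shell session. Paths, shard ranges, and the URL list are placeholders; flag names follow the img2dataset and clip-retrieval command-line interfaces, but check the READMEs for your installed versions.)

```shell
# 1. serve the local images over HTTP (e.g. nginx or any static file server)
#    so each image has a stable URL, then list those URLs in my_urls.txt

# 2. download and pack them into webdataset shards with img2dataset
img2dataset --url_list my_urls.txt --output_folder imgs_webdataset \
    --output_format webdataset

# 3. compute CLIP embeddings for the shards
clip-retrieval inference --input_dataset "imgs_webdataset/{00000..00010}.tar" \
    --output_folder embeddings --input_format webdataset

# 4. build a knn index from the embeddings
clip-retrieval index --embeddings_folder embeddings --index_folder index

# 5. serve the index with the back end
clip-retrieval back --port 1234 --indices-paths indices_paths.json
```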

rom1504 commented 12 months ago

For supporting a custom CLIP checkpoint, I think we'd need a new option. Maybe you can add it?

ShuxunoO commented 12 months ago

Do you mean a new issue to discuss "how to load a custom CLIP checkpoint", or a new function I need to write myself?

rom1504 commented 12 months ago

A new feature you could contribute, yes. I don't think it's a lot of work. OpenCLIP already supports it, so it's really only a matter of passing the path through from a clip-retrieval option to OpenCLIP.
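(The "pass it through" pattern can be sketched with stdlib argparse. The option names `--clip-model` and `--clip-checkpoint` below are hypothetical, not the actual clip-retrieval flags; the point is only that the CLI value is forwarded unchanged to `open_clip.create_model_and_transforms`, whose `pretrained` argument accepts a local checkpoint path as well as a pretrained tag.)

```python
import argparse

def build_loader_args(argv):
    # hypothetical flag names, sketching how a checkpoint path could be
    # threaded from the CLI through to open_clip
    parser = argparse.ArgumentParser()
    parser.add_argument("--clip-model", default="ViT-B-32")
    parser.add_argument("--clip-checkpoint", default=None)
    args = parser.parse_args(argv)
    # `pretrained` accepts either a tag like "openai" or a local .pth path
    pretrained = args.clip_checkpoint or "openai"
    return args.clip_model, pretrained

# at load time the pair is forwarded unchanged:
#   model, _, preprocess = open_clip.create_model_and_transforms(
#       name, pretrained=pretrained)
```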

ShuxunoO commented 12 months ago

Yes, you just need to replace these lines:

    import torch
    import open_clip

    # pick a GPU if available, otherwise fall back to CPU
    device = torch.device("cuda:3" if torch.cuda.is_available() else "cpu")
    # open_clip accepts a local checkpoint path for `pretrained`
    pretrained_model_path = "path/to/your/finetuned_model.pth"
    model, _, preprocess = open_clip.create_model_and_transforms(
        'ViT-B-32', pretrained=pretrained_model_path, device=device)
    tokenizer = open_clip.get_tokenizer('ViT-B-32')
    model.to(device)

I think I will, once I finish my thesis before the deadline.

rom1504 commented 11 months ago

It is added now.

IF-chan commented 8 months ago

> Yes, just need to replace these sentences
>
>     device = torch.device("cuda:3" if torch.cuda.is_available() else "cpu")
>     pretrained_model_path = "path/to/your/finetuned_model.pth"
>     model, _, preprocess = open_clip.create_model_and_transforms('ViT-B-32', pretrained=pretrained_model_path, device=device)
>     tokenizer = open_clip.get_tokenizer('ViT-B-32')
>     model.to(device)

Where should these lines be replaced?