rom1504 / clip-retrieval

Easily compute clip embeddings and build a clip retrieval system with them
https://rom1504.github.io/clip-retrieval/
MIT License

Rate Limit on API #211

Closed: varadgunjal closed this issue 1 year ago

varadgunjal commented 1 year ago

I'm curious what the rate limit on API calls to clip-retrieval is. I've noticed that a single query takes about 30 s to return image results, and I'm trying to speed up downloads for a list of queries I've collected by making multiple parallel calls, but I don't want to overwhelm the service or get flagged.

rom1504 commented 1 year ago

Hi, the only rate limit is "if too many people call it then it doesn't work" :)

If you want to do a lot of queries, I advise you to download the index and run things locally so you have the whole thing to yourself. It's available on Hugging Face.
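Since the shared service simply stops working when too many people call it, anyone who still sends a batch of queries to it (rather than running locally as suggested) may want to keep the load small. A minimal sketch, assuming a generic per-query callable, is to cap the number of requests in flight and pause between calls. `run_queries`, `query_fn`, `max_workers` and `delay_s` are hypothetical names; `query_fn` stands in for whatever performs one request (assumed to be something like a wrapper around clip-retrieval's client class), and the default values are illustrative, not limits published by the project.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_queries(
    queries: List[str],
    query_fn: Callable[[str], list],
    max_workers: int = 2,
    delay_s: float = 1.0,
) -> Dict[str, list]:
    """Run text queries with at most `max_workers` requests in flight.

    `query_fn` is a hypothetical stand-in for one API call; it is not part
    of the original thread. Duplicate queries collapse into one dict key.
    """

    def worker(text: str):
        result = query_fn(text)
        # Sleep inside the task so this worker thread stays busy (and cannot
        # start another request) until the pause is over.
        time.sleep(delay_s)
        return text, result

    # The pool size is the hard cap on concurrent requests.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(worker, queries))
```

Capping the pool size rather than firing one thread per query keeps the peak load on the service predictable no matter how long the query list is.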

varadgunjal commented 1 year ago

Thank you for the prompt reply! Just to be sure: you are referring to the index here, https://huggingface.co/datasets/laion/laion5B-index, right?

Also, are there any instructions or blog posts that can help me set things up locally for fast queries? Can I feed this index to something like Elasticsearch and query it from there?

rom1504 commented 1 year ago

Yes, you can use it by running clip-retrieval back as explained in the README.
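As a sketch of what the README-style setup involves, the script below writes an `indices_paths.json` with a single index entry and prints the launch command that appears later in this thread. The keys `indice_folder`, `enable_faiss_memory_mapping`, `use_arrow`, `enable_hdf5` and `clip_model` appear in this thread; the remaining keys are assumed from the README's LAION-5B example, and the entry name `laion5B` and the helper `write_indices_paths` are hypothetical. Check the README for the current option list before relying on it.

```python
import json
from pathlib import Path


def write_indices_paths(index_folder: str, out_path: str = "indices_paths.json") -> dict:
    """Write a one-index config file for `clip-retrieval back`.

    Option names: some come from this thread, the rest are assumed from the
    README's LAION-5B example. Verify them against the README.
    """
    config = {
        # "laion5B" is an arbitrary (hypothetical) name for this index.
        "laion5B": {
            "indice_folder": index_folder,
            "provide_safety_model": True,
            "enable_faiss_memory_mapping": True,
            "use_arrow": True,
            "enable_hdf5": False,
            "reorder_metadata_by_ivf_index": False,
            "columns_to_return": ["url", "caption"],
            "clip_model": "ViT-L/14",
            "enable_mclip_option": False,
        }
    }
    Path(out_path).write_text(json.dumps(config, indent=2))
    return config


if __name__ == "__main__":
    # Example path taken from later in this thread; point it at your own download.
    write_indices_paths("/data2/laion5B/laion5B-index")
    print("clip-retrieval back --port 1234 --indices-paths indices_paths.json")
```

Keeping the per-index options in the JSON file means every index served by one back end can carry its own settings.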

varadgunjal commented 1 year ago

Sorry to press on with further questions, but I wanted to clarify something about running clip-retrieval back to do the same thing as the API (provide a text query and get back a set of captions and URLs).

I read in another issue that the output_folder at https://github.com/rom1504/clip-retrieval#clip-back is the folder I download from HF. Is that just the image.index folder listed on HF? Does it figure out what to do with the .index and .ivfdata files by itself?

rom1504 commented 1 year ago

It should be the full folder from HF: all the files, with the same directory structure, including both the index and metadata subdirectories.
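Because the whole repository has to be mirrored with its layout intact, not just `image.index`, one way to fetch it is `snapshot_download` from the `huggingface_hub` library, which downloads every file in a repo and keeps the directory structure. A minimal sketch, assuming `huggingface_hub` is installed and the target disk has room for a very large download; the repo id comes from the URL quoted above, and the helper name is hypothetical.

```python
from pathlib import Path

# Dataset repo id taken from the Hugging Face URL quoted earlier in the thread.
REPO_ID = "laion/laion5B-index"


def download_laion5b_index(local_dir: str) -> Path:
    """Download every file in the dataset repo into `local_dir`, keeping its layout."""
    # Imported lazily so this module can be loaded without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",  # the index lives under /datasets/, not /models/
        local_dir=local_dir,
    )
    return Path(path)
```

The folder passed as `local_dir` is then the one to use as the index folder, rather than its `image.index` subdirectory.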

rom1504 commented 1 year ago

If you want to document the exact steps you took and submit them as a .md file in a pull request, that would be great. It will be useful for the next people asking the same thing.

varadgunjal commented 1 year ago

For sure, I'll do that and send a PR today. Thank you for all the work you've put in, and please let me know if there are any beginner-friendly open issues or features I can help with. I'm quite familiar with the theory and papers in this space and am trying to get more familiar with the code that is widely used by the community.

varadgunjal commented 1 year ago

#214

ptsividis-csm commented 1 year ago

Hi, when trying the instructions here and in #214, I get a failure at load_index(clip_options.indice_folder + "/image.index", clip_options.enable_faiss_memory_mapping) in clip_back.py:

*** RuntimeError: Error in faiss::Index* faiss::read_index(faiss::IOReader*, int) at /project/faiss/faiss/impl/index_read.cpp:527: Error: 'ret == (1)' failed: read error in /data2/laion5B/laion5B-index/image.index: 0 != 1 (Is a directory),

The implication seems to be that image.index should not be a directory, even though on HF it is. I have the data from HF with the correct directory structure at /data2/laion5B/laion5B-index/.

Any advice?

rom1504 commented 1 year ago

What command line did you run exactly?

ptsividis-csm commented 1 year ago

Thanks for the quick reply. I ran clip-retrieval back --port 1234 --indices-paths indices_paths.json on its own first, and then started including the various flags that have been suggested in related issues here:

--enable_hdf5 False --use_arrow True --clip_model "ViT-L/14"

rom1504 commented 1 year ago

I advise you to use the config mentioned in https://github.com/rom1504/clip-retrieval#clip-back
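One reading of this exchange is that the options passed as command-line flags in the earlier attempt (use_arrow, enable_hdf5, clip_model) belong in each index's entry in indices_paths.json, as in the README config. For anyone hitting the same "Is a directory" error, a small hypothetical checker can catch the obvious mistakes before launching: it reports missing keys and whether indice_folder/image.index exists (the path clip_back.py loads, per the traceback above). The set of keys it treats as required is an assumption for this sketch; clip-retrieval may fall back to defaults for keys that are absent.

```python
import json
from pathlib import Path
from typing import List

# Keys this sketch treats as required (assumption; names taken from this thread).
EXPECTED_KEYS = ("indice_folder", "clip_model", "use_arrow", "enable_hdf5")


def check_indices_paths(config_path: str) -> List[str]:
    """Return human-readable problems found in an indices_paths.json file."""
    problems: List[str] = []
    config = json.loads(Path(config_path).read_text())
    for name, options in config.items():
        for key in EXPECTED_KEYS:
            if key not in options:
                problems.append(f"{name}: missing '{key}'")
        # clip_back.py loads <indice_folder>/image.index, so that path must exist.
        index_path = Path(options.get("indice_folder", "")) / "image.index"
        if not index_path.exists():
            problems.append(f"{name}: {index_path} not found")
    return problems
```

An empty list means only that these basic checks passed; it does not guarantee the back end will start.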

ptsividis-csm commented 1 year ago

Thank you -- confirming that this worked.

rom1504 commented 1 year ago

great

out of curiosity, what are you doing with it @ptsividis-csm ?

ptsividis-csm commented 1 year ago

I'm just looking into how easily I can generate a more curated subset of LAION5B, no concrete plan of what to do with that yet.