Closed varadgunjal closed 1 year ago
Hi, the only rate limit is "if too many people call it then it doesn't work" :)
If you want to do a lot of queries, I advise you to download the index and run things locally so you have the whole thing to yourself. It's available on Hugging Face.
Thank you for the prompt reply! Just to be sure : you are referring to the index here - https://huggingface.co/datasets/laion/laion5B-index - right?
Also are there any instructions / blogs that can help me set things up locally for fast queries? Can I feed this index to something like ElasticSearch and query from there?
Yes, you can use this by running clip-retrieval back as explained in the readme
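For readers following along, a minimal sketch of the download step, assuming the `huggingface_hub` Python package is installed and that the repository is the `laion/laion5B-index` dataset linked above. The destination path and both helper names are hypothetical, and the index is large, so check free disk space first.

```python
from typing import Any, Dict


def build_download_args(local_dir: str) -> Dict[str, Any]:
    """Keyword arguments for huggingface_hub.snapshot_download."""
    return {
        "repo_id": "laion/laion5B-index",  # taken from the link in this thread
        "repo_type": "dataset",            # the index is hosted as a dataset repo
        "local_dir": local_dir,            # hypothetical destination folder
    }


def download_index(local_dir: str) -> str:
    """Download the whole repository, keeping its directory structure."""
    # Imported lazily so build_download_args works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**build_download_args(local_dir))


# Example (not run here because it downloads a very large amount of data):
# download_index("/data/laion5B-index")
```

Downloading the full snapshot rather than individual files keeps the layout identical to the one on Hugging Face, which is what the later replies in this thread ask for.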
Sorry to press on with further questions, but I wanted to clarify something about running clip-retrieval back to do the same thing as the API (provide text query and get a bunch of caption + URLs).
I read on another issue that the output_folder at https://github.com/rom1504/clip-retrieval#clip-back is the folder I download from HF. Is that just the image.index folder listed on HF? Does it figure out what to do with .index or .ivfdata files by itself?
It should be the full folder from hf: all the files, with the same directory structure, including both the index and metadata subdirectories.
If you want to document the exact steps you took and open a pull request with them as a .md file, that would be great. It will be useful for the next people asking the same thing.
For sure, I'll do that and send a PR today. Thank you for all the work you've put in, and please let me know if there are any beginner-friendly open issues / features that I can help with. I'm quite familiar with the theory & papers in this space and am trying to get more familiar with the code that is widely used by the community.
Hi, when trying the instructions here and in #214, I get a failure at
load_index(clip_options.indice_folder + "/image.index", clip_options.enable_faiss_memory_mapping)
in clip_back.py:
*** RuntimeError: Error in faiss::Index* faiss::read_index(faiss::IOReader*, int) at /project/faiss/faiss/impl/index_read.cpp:527: Error: 'ret == (1)' failed: read error in /data2/laion5B/laion5B-index/image.index: 0 != 1 (Is a directory)
The implication seems to be that image.index should not have been a directory, even though on hf, it is. I have the data from hf with the correct directory structure at /data2/laion5B/laion5B-index/.
Any advice?
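The "(Is a directory)" suffix in the error above indicates that faiss was asked to read a path that is a directory rather than a single index file. A small sketch of a pre-flight check that reports what sits at that path before the server is started; the function name is hypothetical and the default file name simply mirrors the `load_index` call quoted above.

```python
import os


def describe_index_path(indice_folder: str, name: str = "image.index") -> str:
    """Report whether <indice_folder>/<name> is a file, a directory, or missing."""
    path = os.path.join(indice_folder, name)
    if os.path.isfile(path):
        return "file"       # readable as a single index file
    if os.path.isdir(path):
        return "directory"  # the situation that produces 'Is a directory'
    return "missing"


# Example: describe_index_path("/data2/laion5B/laion5B-index")
```

If this returns "directory", the default options do not match the downloaded layout, and the per-index configuration from the readme is the thing to look at.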
What command line did you run exactly?
Thanks for the quick reply. I ran clip-retrieval back --port 1234 --indices-paths indices_paths.json on its own first, and then started including the various flags that have been suggested in related issues here: --enable_hdf5 False --use_arrow True --clip_model "ViT-L/14"
I advise you use the config mentioned in https://github.com/rom1504/clip-retrieval#clip-back
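As a rough illustration of what a per-index config could look like, here is a sketch that writes an indices_paths.json. The key names are assumptions inferred from the identifiers and flags quoted in this thread (`indice_folder`, `enable_faiss_memory_mapping`, `enable_hdf5`, `use_arrow`, `clip_model`), and the values are guesses; the readme linked above is the authoritative source for the schema.

```python
import json
from typing import Any, Dict


def write_indices_config(path: str, index_name: str, indice_folder: str) -> Dict[str, Any]:
    """Write a per-index options file; keys and values are assumptions, see the readme."""
    config = {
        index_name: {
            "indice_folder": indice_folder,       # full folder downloaded from HF
            "enable_faiss_memory_mapping": True,  # assumed value
            "enable_hdf5": False,                 # mirrors --enable_hdf5 False
            "use_arrow": True,                    # mirrors --use_arrow True
            "clip_model": "ViT-L/14",             # mirrors --clip_model "ViT-L/14"
        }
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=4)
    return config


# Example: write_indices_config("indices_paths.json", "laion5B", "/data2/laion5B/laion5B-index")
```

The resulting file would then be passed with --indices-paths, as in the command quoted above.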
Thank you -- confirming that this worked.
great
out of curiosity, what are you doing with it @ptsividis-csm ?
I'm just looking into how easily I can generate a more curated subset of LAION5B, no concrete plan of what to do with that yet.
I'm curious to know what the rate limit on API calls to clip-retrieval is. I've noticed that a single query takes ~30s to return image results, and I'm trying to speed up downloads for a list of queries I've collected by making multiple parallel calls, but I don't want to overwhelm the service or get flagged.
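One way to parallelize without flooding a shared service is to cap the number of in-flight requests. A minimal sketch using only the standard library: `query_fn` is a placeholder for whatever function performs a single query (for example, a call against a locally hosted backend), and the default of two workers is an arbitrary conservative choice, not a documented limit.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Iterable


def run_queries(
    queries: Iterable[str],
    query_fn: Callable[[str], Any],
    max_workers: int = 2,
) -> Dict[str, Any]:
    """Run query_fn over queries with at most max_workers calls in flight."""
    query_list = list(queries)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with query_list.
        results = list(pool.map(query_fn, query_list))
    # Duplicate query strings collapse into a single dictionary entry.
    return dict(zip(query_list, results))


# Example with a stand-in query function:
# run_queries(["a cat", "a dog"], lambda q: q.upper())
```

Running against a local copy of the index removes the shared-service concern entirely, at which point `max_workers` can be raised to whatever the local machine handles.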