Closed (varadgunjal closed this 1 year ago)
I followed the instructions at https://github.com/rom1504/clip-retrieval#clip-back after manually downloading the indices (the load_dataset or git lfs options on HF Datasets did not work for me, so I resorted to creating the dirs + subdirs myself and using wget for individual files).
My indices_paths.json looks like so - {"laion5B-index": "/mnt/data/laion-5b-index"}
I'm running back using the command - clip-retrieval back --port 1234 --indices-paths indices_paths.json --enable_mclip_option False --enable_faiss_memory_mapping True --enable_hdf5 True
Ah yes, you need to set enable_hdf5 to false and use_arrow to true.
And you need to tell it to use the right model, which is L/14:
--clip_model "ViT-L/14"
--enable_hdf5 False --use_arrow True
You can also directly put this in the JSON file
{
    "laion5B": {
        "indice_folder": "/mnt/laion5B/prepared_data",
        "provide_safety_model": true,
        "enable_faiss_memory_mapping": true,
        "use_arrow": true,
        "enable_hdf5": false,
        "reorder_metadata_by_ivf_index": false,
        "columns_to_return": ["url", "caption"],
        "clip_model": "ViT-L/14",
        "enable_mclip_option": false
    }
}
Adapt the path.
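For anyone who prefers to generate that file from a script (so only the folder changes per machine), here is a minimal sketch. The keys and values are copied from the JSON above; the helper names `build_indices_config` and `write_indices_config` are hypothetical and not part of clip-retrieval.

```python
import json
from pathlib import Path


def build_indices_config(indice_folder, name="laion5B"):
    """Build the per-index options dict shown above (hypothetical helper)."""
    return {
        name: {
            "indice_folder": str(indice_folder),  # adapt this to your machine
            "provide_safety_model": True,
            "enable_faiss_memory_mapping": True,
            "use_arrow": True,      # the metadata files on HF are arrow
            "enable_hdf5": False,
            "reorder_metadata_by_ivf_index": False,
            "columns_to_return": ["url", "caption"],
            "clip_model": "ViT-L/14",
            "enable_mclip_option": False,
        }
    }


def write_indices_config(indice_folder, out_path="indices_paths.json"):
    """Write the config as JSON and return the dict that was written."""
    config = build_indices_config(indice_folder)
    Path(out_path).write_text(json.dumps(config, indent=4))
    return config


# Example (not executed here):
# write_indices_config("/mnt/laion5B/prepared_data")
```

The resulting file can then be passed to `--indices-paths` as in the command earlier in the thread.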
Thanks! I should've mentioned that I did add the --clip_model "ViT-L/14" option; I didn't paste my command correctly earlier. I also tried using --use_arrow True, but it was taking a long time to load and get the Flask app running, so I switched back to hdf5. Any idea why the loading times differ?
(It does work and returns responses now)
Also, is it possible to get only the English results (perhaps by limiting the search to laion2B-en only)?
Since the files on HF are arrow, if you turned on hdf5 and it "worked", it probably means that the metadata was disabled.
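A quick way to see which metadata format a local index folder actually holds is to look at the file extensions in it. This is only a sketch: it assumes metadata files end in `.arrow` or `.hdf5`, and `metadata_formats` is a hypothetical helper, not a clip-retrieval function.

```python
from pathlib import Path


def metadata_formats(indice_folder):
    """Return the set of metadata-like extensions found under the folder.

    Assumption: arrow metadata ends in ".arrow" and hdf5 metadata in ".hdf5".
    """
    wanted = {".arrow", ".hdf5"}
    return {
        p.suffix
        for p in Path(indice_folder).rglob("*")
        if p.is_file() and p.suffix in wanted
    }


# Example (not executed here):
# metadata_formats("/mnt/data/laion-5b-index")
```

If the result contains only `.arrow`, running with hdf5 enabled would have no metadata to read, which matches the explanation above.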
> Also, is it possible to get only the English results (perhaps by limiting the search to laion2B-en only)?
It could be possible to rebuild an index for laion2b-en only
Another way is to keep only the results with an id between 1B and 3B (the actual bounds are slightly different; open the arrow file to get the exact numbers), since the items are ordered as 1B-nolang, then 2B-en, then 2B-multi.
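The id-range approach could be applied as a post-filter on whatever the backend returns. A minimal sketch follows, assuming each result is a dict with an integer `id` field; `EN_START` and `EN_END` are round placeholder values, and the exact boundaries have to be read from the arrow files as noted above. `keep_english` is a hypothetical helper name.

```python
# Placeholder bounds: the real boundaries differ slightly and should be
# taken from the arrow metadata files.
EN_START = 1_000_000_000  # approximate first id of the 2B-en block
EN_END = 3_000_000_000    # approximate first id of the 2B-multi block


def keep_english(results, start=EN_START, end=EN_END):
    """Keep only results whose id falls in the [start, end) range.

    Assumes each result is a dict carrying an integer "id" key.
    """
    return [r for r in results if start <= r["id"] < end]


sample = [
    {"id": 12_345, "caption": "nolang item"},
    {"id": 1_500_000_000, "caption": "english item"},
    {"id": 4_200_000_000, "caption": "multilingual item"},
]
english_only = keep_english(sample)
```

Because filtering happens after retrieval, fewer results come back than were requested; asking the backend for more results than needed is one way to compensate.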
This was very helpful. Thank you!
When I try to query the locally running Flask app with a text query, I get the following stack trace -
I queried the service like so -
Any idea what I might be doing wrong?