rom1504 / clip-retrieval

Easily compute clip embeddings and build a clip retrieval system with them
https://rom1504.github.io/clip-retrieval/
MIT License

Errors when trying to query `clip_retrieval back` #212

Closed by varadgunjal 1 year ago

varadgunjal commented 1 year ago

When I try to query the locally running Flask app with a text query, I get the following stack trace -

127.0.0.1 - - [08/Dec/2022 16:13:43] "POST /knn-service HTTP/1.1" 500 -
[2022-12-08 16:13:46,550] ERROR in app: Exception on /knn-service [POST]
Traceback (most recent call last):
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/flask/app.py", line 1820, in full_dispatch_request
    rv = self.dispatch_request()
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/flask/app.py", line 1796, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/flask_restful/__init__.py", line 467, in wrapper
    resp = resource(*args, **kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/flask/views.py", line 107, in view
    return current_app.ensure_sync(self.dispatch_request)(**kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/flask_restful/__init__.py", line 582, in dispatch_request
    resp = meth(*args, **kwargs)
  File "<decorator-gen-2>", line 2, in post
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/prometheus_client/context_managers.py", line 81, in wrapped
    return func(*args, **kwargs)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/clip_retrieval/clip_back.py", line 488, in post
    return self.query(
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/clip_retrieval/clip_back.py", line 451, in query
    distances, indices = self.knn_search(
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/clip_retrieval/clip_back.py", line 358, in knn_search
    distances, indices, embeddings = index.search_and_reconstruct(query, num_result_ids)
  File "/opt/conda/envs/pytorch/lib/python3.9/site-packages/faiss/class_wrappers.py", line 378, in replacement_search_and_reconstruct
    assert d == self.d
AssertionError

I queried the service like so -

import json

import requests

# Query payload sent to the /knn-service endpoint
payload = {
    "text":"red car",
    "modality":"image",
    "num_images":20,
    "indice_name":"laion5B-index",
    "use_mclip":False,
    "deduplicate":True,
    "use_safety_model":True,
    "use_violence_detector":True,
    "aesthetic_score":"",
    "aesthetic_weight":0.5
}

response = requests.post(
    "http://127.0.0.1:1234/knn-service",
    data=json.dumps(payload)
)

Any idea what I might be doing wrong?

varadgunjal commented 1 year ago

I followed the instructions at https://github.com/rom1504/clip-retrieval#clip-back after manually downloading the indices (The load_dataset or git lfs options on HF Datasets did not work for me, so I resorted to creating the dirs + subdirs myself and using wget for individual files).

My indices_paths.json looks like so - {"laion5B-index": "/mnt/data/laion-5b-index"}

I'm running back using the command - clip-retrieval back --port 1234 --indices-paths indices_paths.json --enable_mclip_option False --enable_faiss_memory_mapping True --enable_hdf5 True

rom1504 commented 1 year ago

Ah yes, you need to set enable_hdf5 to false and use_arrow to true

rom1504 commented 1 year ago

And you need to tell it to use the right model. That's L/14.

rom1504 commented 1 year ago

--clip_model "ViT-L/14"
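
For context, the `assert d == self.d` failure in the trace above is what faiss raises when the query vector's width differs from the width of the vectors stored in the index. A minimal sketch of that check follows, assuming the server fell back to ViT-B/32 (512-d embeddings) while the laion5B index holds ViT-L/14 embeddings (768-d). The helper `check_query_dim` and the constant `LAION5B_INDEX_DIM` are hypothetical names used only for illustration; they are not part of clip-retrieval or faiss.

```python
# Embedding widths of the two CLIP variants relevant to this thread.
CLIP_EMBEDDING_DIMS = {"ViT-B/32": 512, "ViT-L/14": 768}


def check_query_dim(clip_model: str, index_dim: int) -> int:
    """Return the query width, or raise if it does not match the index."""
    query_dim = CLIP_EMBEDDING_DIMS[clip_model]
    if query_dim != index_dim:
        raise ValueError(
            f"{clip_model} produces {query_dim}-d embeddings, "
            f"but the index expects {index_dim}-d vectors"
        )
    return query_dim


# Assumption: the laion5B index stores 768-d ViT-L/14 embeddings.
LAION5B_INDEX_DIM = 768

# Matching model: the check passes and returns the shared width.
print(check_query_dim("ViT-L/14", LAION5B_INDEX_DIM))
```

Calling `check_query_dim("ViT-B/32", LAION5B_INDEX_DIM)` raises instead, which mirrors the bare `AssertionError` seen in the traceback.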

rom1504 commented 1 year ago

--enable_hdf5 False --use_arrow True

rom1504 commented 1 year ago

You can also directly put this in the JSON file

{
        "laion5B": {
                "indice_folder": "/mnt/laion5B/prepared_data",
                "provide_safety_model": true,
                "enable_faiss_memory_mapping": true,
                "use_arrow": true,
                "enable_hdf5": false,
                "reorder_metadata_by_ivf_index": false,
                "columns_to_return": ["url", "caption"],
                "clip_model": "ViT-L/14",
                "enable_mclip_option": false
        }
}

Adapt the path
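
If the folder location changes often, the same JSON can be generated from a short script rather than edited by hand. This is only a sketch: the option keys are copied from the JSON above, while the function name `build_indices_config` and the example index name and folder (taken from earlier in this thread) are illustrative assumptions.

```python
import json


def build_indices_config(index_name: str, indice_folder: str) -> dict:
    """Build the per-index options shown above for a given folder."""
    return {
        index_name: {
            "indice_folder": indice_folder,
            "provide_safety_model": True,
            "enable_faiss_memory_mapping": True,
            "use_arrow": True,
            "enable_hdf5": False,
            "reorder_metadata_by_ivf_index": False,
            "columns_to_return": ["url", "caption"],
            "clip_model": "ViT-L/14",
            "enable_mclip_option": False,
        }
    }


# Example values from this thread; replace them with your own.
config = build_indices_config("laion5B-index", "/mnt/data/laion-5b-index")

# Print the JSON text; save it as indices_paths.json to use it.
print(json.dumps(config, indent=4))
```

The index name used as the top-level key should match the `indice_name` sent in the query payload.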

varadgunjal commented 1 year ago

Thanks! I should've mentioned I did add the --clip_model "ViT-L/14" option - I didn't paste my command correctly earlier. I also tried using --use_arrow True, but it was taking a long time to load and get the Flask app running, so I switched back to hdf5. Any idea why the loading times differ?

(It does work and returns responses now)

varadgunjal commented 1 year ago

Also, is it possible to get only the English results (perhaps by limiting the search to laion2B-en only)?

rom1504 commented 1 year ago

Since the files on HF are Arrow files, if you turned on hdf5 and it "worked", it probably means that the metadata was disabled

rom1504 commented 1 year ago

> Also, is it possible to get only the English results (perhaps by limiting the search to laion2B-en only)?

It could be possible to rebuild an index for laion2b-en only

Another way is to keep only the results with an ID between roughly 1B and 3B (the exact boundaries are slightly different; open the Arrow file to get the exact numbers), since the items are ordered 1B-nolang, 2B-en, 2B-multi
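
A rough sketch of that ID filter, applied client-side to the list returned by the service, could look like the following. It assumes each result is a dict carrying an integer `id` field; the boundary constants are the approximate values mentioned above and are placeholders, not the exact numbers, which have to be read from the Arrow file. The function name `keep_english` is hypothetical.

```python
from typing import Any

# Placeholder boundaries: approximate, NOT the exact values.
EN_START_ID = 1_000_000_000  # assumed first laion2B-en item
EN_END_ID = 3_000_000_000    # assumed first laion2B-multi item


def keep_english(
    results: list[dict[str, Any]],
    start: int = EN_START_ID,
    end: int = EN_END_ID,
) -> list[dict[str, Any]]:
    """Keep only results whose id falls in the laion2B-en range."""
    return [r for r in results if start <= r["id"] < end]


# Made-up results, one from each of the three ID ranges.
sample = [
    {"id": 5, "caption": "nolang item"},
    {"id": 1_500_000_000, "caption": "english item"},
    {"id": 4_000_000_000, "caption": "multilingual item"},
]
print(keep_english(sample))
```

Because the filter runs after the search, fewer than `num_images` results may remain, so it can help to request more results than needed.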

varadgunjal commented 1 year ago

This was very helpful. Thank you!