rom1504 / clip-retrieval

Easily compute clip embeddings and build a clip retrieval system with them
https://rom1504.github.io/clip-retrieval/
MIT License

Investigate what format to use to store embeddings+id #128

Open rom1504 opened 2 years ago

rom1504 commented 2 years ago

Current format:

Numpy+parquet:

Benefit:

Drawback:

Parquet with embeddings:

Benefit:

Drawback:

What alternatives exist to store embeddings+id?

rom1504 commented 2 years ago

I may give parquet with embeddings one more try to check whether it's really that slow.

Alternative formats I can think of:

Directions to look into: