jzhoubu / vsearch

An Extensible Framework for Retrieval-Augmented LLM Applications: Learning Relevance Beyond Simple Similarity.
MIT License

[Feature]: Use torch tensor to save sparse index #7

Closed jzhoubu closed 4 months ago

jzhoubu commented 5 months ago

Goals

To streamline index management by unifying the index type (torch CSR tensor) and the save and load functions.
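As a rough illustration of what unifying on a torch CSR tensor could look like, here is a minimal sketch that converts a scipy CSR matrix into a torch sparse CSR tensor. The helper name `scipy_csr_to_torch` is hypothetical and is not part of vsearch; only `torch.sparse_csr_tensor` and the scipy CSR attributes are existing APIs.

```python
import numpy as np
import scipy.sparse as sp
import torch


def scipy_csr_to_torch(mat: sp.csr_matrix) -> torch.Tensor:
    """Hypothetical helper: build a torch sparse CSR tensor from a scipy CSR matrix."""
    mat = mat.tocsr()  # make sure the input really is in CSR layout
    return torch.sparse_csr_tensor(
        torch.tensor(mat.indptr, dtype=torch.int64),   # row pointers
        torch.tensor(mat.indices, dtype=torch.int64),  # column indices
        torch.tensor(mat.data),                        # non-zero values
        size=tuple(mat.shape),
    )


if __name__ == "__main__":
    dense = np.array([[0.0, 1.0, 0.0], [2.0, 0.0, 3.0]])
    index = scipy_csr_to_torch(sp.csr_matrix(dense))
    print(index.layout, tuple(index.shape))
```

The values are copied rather than shared with the scipy matrix, which costs memory but avoids aliasing surprises.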

Trade-off

The current scipy .npz format is compressed and keeps disk usage low. Switching from .npz to the PyTorch .pt format would use more disk space in exchange for faster loading.

We will provide an option to choose between the .npz and .pt formats.
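A minimal sketch of how such a format option might look, assuming a `fmt` argument that selects between the two. The function names `save_index` and `load_index`, the `fmt` parameter, and the dictionary keys in the .pt file are all hypothetical and do not reflect the actual vsearch API.

```python
import scipy.sparse as sp
import torch


def save_index(mat: sp.csr_matrix, path: str, fmt: str = "npz") -> None:
    """Hypothetical helper: save a sparse index as compressed .npz or as .pt."""
    mat = mat.tocsr()
    if fmt == "npz":
        # Note: scipy appends ".npz" if the path does not already end with it.
        sp.save_npz(path, mat, compressed=True)
    elif fmt == "pt":
        # Store the CSR components as plain dense tensors (uncompressed).
        torch.save(
            {
                "crow_indices": torch.tensor(mat.indptr, dtype=torch.int64),
                "col_indices": torch.tensor(mat.indices, dtype=torch.int64),
                "values": torch.tensor(mat.data),
                "shape": [int(s) for s in mat.shape],
            },
            path,
        )
    else:
        raise ValueError(f"unknown format: {fmt!r}")


def load_index(path: str, fmt: str = "npz") -> torch.Tensor:
    """Hypothetical helper: load either format into a torch sparse CSR tensor."""
    if fmt == "npz":
        mat = sp.load_npz(path).tocsr()
        crow = torch.tensor(mat.indptr, dtype=torch.int64)
        col = torch.tensor(mat.indices, dtype=torch.int64)
        values = torch.tensor(mat.data)
        shape = tuple(mat.shape)
    elif fmt == "pt":
        state = torch.load(path)
        crow, col, values = state["crow_indices"], state["col_indices"], state["values"]
        shape = tuple(state["shape"])
    else:
        raise ValueError(f"unknown format: {fmt!r}")
    return torch.sparse_csr_tensor(crow, col, values, size=shape)
```

Both branches return the same tensor type, so downstream code would not need to know which format was on disk.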

jzhoubu commented 4 months ago

We have decided to continue using the scipy .npz format for storing sparse indices. While we use torch sparse tensors for inner product operations, storing sparse indices as torch tensors (.pt) presents two significant issues: