naver / bergen

Benchmarking library for RAG

Retrieval speedup #11

Closed: davidmrau closed this 5 months ago

davidmrau commented 5 months ago

The idea is to first load all embeddings into CPU memory and then move each chunk to the GPU only when it is multiplied with the query chunk.
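A minimal sketch of the idea, assuming PyTorch tensors, dot-product scoring, and a running top-k merge across chunks. The function `chunked_retrieval` and its `chunk_size` and `top_k` parameters are hypothetical and are not taken from the bergen code base. The full document-embedding matrix stays in CPU memory, and only one chunk at a time is copied to the device for the multiplication.

```python
import torch


def chunked_retrieval(query_emb, doc_emb, top_k=10, chunk_size=4096, device=None):
    """Hypothetical helper: score queries against CPU-resident document
    embeddings, copying one chunk at a time to `device`.

    Returns (scores, indices) of the top_k documents per query, on CPU.
    """
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    q = query_emb.to(device)  # the query chunk is small, so it stays on the device
    best_scores, best_idx = None, None
    for start in range(0, doc_emb.size(0), chunk_size):
        # Only the current document chunk is copied from CPU to the device.
        chunk = doc_emb[start:start + chunk_size].to(device)
        scores = q @ chunk.T  # (num_queries, chunk_len) dot-product scores
        # Global document ids for the columns of this chunk.
        idx = torch.arange(start, start + chunk.size(0), device=device).expand_as(scores)
        if best_scores is not None:
            # Merge with the running top-k kept from earlier chunks.
            scores = torch.cat([best_scores, scores], dim=1)
            idx = torch.cat([best_idx, idx], dim=1)
        k = min(top_k, scores.size(1))
        best_scores, pos = scores.topk(k, dim=1)
        best_idx = idx.gather(1, pos)
    return best_scores.cpu(), best_idx.cpu()
```

Keeping only the running top-k per query means GPU memory is bounded by one chunk plus `top_k` columns, regardless of collection size. Storing `doc_emb` in pinned memory (`doc_emb.pin_memory()`) could further speed up the host-to-device copies, though that is an optional assumption here.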