Retrieval speedup

naver / bergen

Benchmarking library for RAG

Closed davidmrau closed 5 months ago

davidmrau commented 5 months ago

The idea is to first load all embeddings into CPU memory, then move each embedding chunk to the GPU only when it is multiplied with the query chunk.
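
A minimal sketch of this idea, assuming PyTorch and dot-product scoring. This is not BERGEN's actual implementation: the function name `chunked_topk`, its parameters, and the running top-k merge are hypothetical choices made for illustration. The document embeddings stay in CPU memory, and only one chunk at a time is moved to the device and multiplied with the current query chunk.

```python
import torch


def chunked_topk(query_emb, doc_emb, k=3, doc_chunk_size=2,
                 query_chunk_size=2, device=None):
    """Top-k dot-product retrieval with doc embeddings kept in CPU memory.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    if device is None:
        # Fall back to CPU so the sketch also runs without a GPU.
        device = "cuda" if torch.cuda.is_available() else "cpu"

    all_scores, all_ids = [], []
    for q_start in range(0, query_emb.size(0), query_chunk_size):
        # Move one chunk of queries to the device.
        q = query_emb[q_start:q_start + query_chunk_size].to(device)
        n_q = q.size(0)
        # Running top-k for this query chunk, initially empty.
        best_scores = torch.empty(n_q, 0, dtype=q.dtype, device=device)
        best_ids = torch.empty(n_q, 0, dtype=torch.long, device=device)

        for d_start in range(0, doc_emb.size(0), doc_chunk_size):
            # Only this chunk of document embeddings lives on the device.
            d = doc_emb[d_start:d_start + doc_chunk_size].to(device)
            scores = q @ d.T  # shape: (n_q, chunk_len)
            ids = torch.arange(d_start, d_start + d.size(0), device=device)
            ids = ids.expand(n_q, -1)

            # Merge the chunk scores with the running top-k, keep the best k.
            merged_scores = torch.cat([best_scores, scores], dim=1)
            merged_ids = torch.cat([best_ids, ids], dim=1)
            top = merged_scores.topk(min(k, merged_scores.size(1)), dim=1)
            best_scores = top.values
            best_ids = merged_ids.gather(1, top.indices)

        all_scores.append(best_scores.cpu())
        all_ids.append(best_ids.cpu())

    return torch.cat(all_scores), torch.cat(all_ids)
```

With this layout, device memory use is bounded by one query chunk, one document chunk, and the running top-k rather than by the full embedding matrix. The chunk sizes trade transfer overhead against memory: larger chunks mean fewer host-to-device copies but a larger peak allocation.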