Closed davidmrau closed 5 months ago
The idea is to first load all embeddings into CPU memory, then move each embedding chunk to the GPU only when it is multiplied with the query chunk.
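A minimal sketch of this idea in PyTorch, not the code from this change: the function name `chunked_scores`, the chunk sizes, and the tensor shapes are all hypothetical and chosen only for illustration. It assumes the score is a plain dot product between query and document embeddings, and it falls back to the CPU when no GPU is available.

```python
import torch


def chunked_scores(query_emb, doc_emb, query_chunk=2, doc_chunk=4, device=None):
    """Score every query against every document, one chunk pair at a time.

    Both embedding tensors stay in CPU memory; only the current query chunk
    and the current document chunk are moved to the compute device.
    """
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"

    # The full score matrix is kept in CPU memory as well.
    scores = torch.empty(query_emb.shape[0], doc_emb.shape[0], dtype=query_emb.dtype)

    for qs in range(0, query_emb.shape[0], query_chunk):
        # Move one query chunk to the device.
        q = query_emb[qs:qs + query_chunk].to(device)
        for ds in range(0, doc_emb.shape[0], doc_chunk):
            # Move one embedding chunk to the device just before the matmul.
            d = doc_emb[ds:ds + doc_chunk].to(device)
            # Multiply on the device, then copy the partial result back to CPU.
            scores[qs:qs + q.shape[0], ds:ds + d.shape[0]] = (q @ d.T).cpu()
    return scores


# Toy data (hypothetical sizes): 5 queries, 10 documents, dimension 8.
torch.manual_seed(0)
queries = torch.randn(5, 8)
docs = torch.randn(10, 8)
result = chunked_scores(queries, docs)
```

With this layout, peak GPU memory depends on the two chunk sizes rather than on the total number of embeddings, at the cost of one host-to-device copy per chunk pair.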