This PR adds a dataset of roughly 2.3 million 1536-dimensional OpenAI ada-002 embeddings of arXiv paper abstracts. The original arXiv dataset was released by Cornell University on Kaggle under a CC0 license. We also provide a set of 20,000 queries, likewise embedded from arXiv abstracts, along with groundtruth for the first 100,000 vectors and for the full set of 2,321,096 vectors.
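For reviewers who want to see what the two groundtruth sets mean in practice, here is a minimal sketch of exact nearest-neighbor groundtruth computed over a prefix of the base vectors and over the full set, plus a recall check against it. The description above does not state the distance metric, the number of neighbors stored per query, or the file format, so squared Euclidean distance, `k=10`, and the small random toy vectors are all assumptions. The function names `brute_force_groundtruth` and `recall_at_k` are hypothetical and are not part of this PR.

```python
import heapq
import random


def squared_l2(a, b):
    """Squared Euclidean distance (assumed metric; the PR does not name one)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_groundtruth(base, queries, k):
    """For each query, return the ids of its k exact nearest base vectors."""
    groundtruth = []
    for q in queries:
        nearest = heapq.nsmallest(
            k, range(len(base)), key=lambda i: squared_l2(base[i], q)
        )
        groundtruth.append(nearest)
    return groundtruth


def recall_at_k(groundtruth, results, k):
    """Fraction of true top-k neighbors that appear in the returned top-k."""
    hits = sum(
        len(set(gt[:k]) & set(res[:k])) for gt, res in zip(groundtruth, results)
    )
    return hits / (k * len(groundtruth))


# Toy stand-in data: 8-dimensional random vectors, not the real 1536-dimensional embeddings.
rng = random.Random(0)
base = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(200)]
queries = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(5)]

# Groundtruth over a prefix of the base set (analogous to the first 100,000 vectors).
gt_prefix = brute_force_groundtruth(base[:50], queries, k=10)
# Groundtruth over the whole base set (analogous to all 2,321,096 vectors).
gt_full = brute_force_groundtruth(base, queries, k=10)
```

An approximate index built on a subset of the data would be scored with `recall_at_k` against the groundtruth computed over that same subset, which is presumably why groundtruth is supplied at both sizes.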