zilliztech / VectorDBBench

A Benchmark Tool for VectorDB
MIT License

Unable to download the datasets #411

Open: svim-ig opened this issue 1 day ago

svim-ig commented 1 day ago

[Screenshot] Unable to download the datasets mentioned in the repo. The same issue occurs with the GIST and Cohere datasets. Could you share the download links for the datasets used in the repo for benchmark testing?

alwayslove2013 commented 14 hours ago

@svim-ig After selecting the test case, VectorDBBench will automatically download the required dataset, so there is no need to download it in advance.

svim-ig commented 12 hours ago

@alwayslove2013 For now, I am not going to use the datasets through the VectorDBBench tool. I would like to download them and use them to evaluate the performance of my LLM models and vector database, so I need the datasets separately.

alwayslove2013 commented 10 hours ago

@svim-ig The dataset used by VectorDBBench is derived from publicly available datasets (excluding OpenAI). I recommend using the original datasets for testing. You can easily find download links online, such as on Hugging Face.

The download links for the VectorDBBench datasets are somewhat complex.

The basic format is: [data_source]/benchmark/[dataset_dir]/[file_type]

For example: assets.zilliz.com/benchmark/openai_medium_500k/shuffle_train.parquet
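As a rough illustration of the format above, here is a minimal sketch that assembles the three parts into a URL. The function name `build_dataset_url` is hypothetical, and the `https` scheme is an assumption: the thread only shows the host and path, not the protocol.

```python
def build_dataset_url(data_source: str, dataset_dir: str, file_type: str,
                      scheme: str = "https") -> str:
    """Assemble [data_source]/benchmark/[dataset_dir]/[file_type] into a URL.

    The default "https" scheme is an assumption; the thread shows only host and path.
    """
    # Strip stray leading/trailing slashes so the parts join with exactly one "/".
    parts = [data_source.strip("/"), "benchmark",
             dataset_dir.strip("/"), file_type.strip("/")]
    return f"{scheme}://" + "/".join(parts)


if __name__ == "__main__":
    print(build_dataset_url("assets.zilliz.com", "openai_medium_500k",
                            "shuffle_train.parquet"))
    # → https://assets.zilliz.com/benchmark/openai_medium_500k/shuffle_train.parquet
```

Only the example values come from this thread; other `dataset_dir` and `file_type` values would need to be confirmed against the VectorDBBench repo.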

The data_source differs by region; choose the one that works best with your network conditions.
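If you want to fetch a file outside of VectorDBBench once you have a URL in the format described above, a minimal standard-library sketch could look like the following. The helper name `download_file` and the local destination path are hypothetical, and it assumes the file is publicly readable over HTTP(S) without authentication. It streams the response to disk in chunks so large parquet files are not held in memory.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url: str, dest: str, chunk_size: int = 1 << 20) -> Path:
    """Stream a remote file to `dest` (hypothetical helper, not part of VectorDBBench)."""
    dest_path = Path(dest)
    # Create the target directory if it does not exist yet.
    dest_path.parent.mkdir(parents=True, exist_ok=True)
    # urlopen returns a file-like response; copyfileobj copies it in chunk_size pieces.
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    return dest_path
```

Usage example (the `https://` prefix is an assumption):

```python
download_file(
    "https://assets.zilliz.com/benchmark/openai_medium_500k/shuffle_train.parquet",
    "data/openai_medium_500k/shuffle_train.parquet",
)
```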

Common dataset_dir categories include:

The file_type is divided into three categories: