zilliztech / VectorDBBench

A Benchmark Tool for VectorDB
MIT License
455 stars, 109 forks

Is it possible to run this benchmark offline? #304

Open GreateFang opened 2 months ago

GreateFang commented 2 months ago

Hi team, I would like to know if it is possible to run this benchmark offline. I saw that the dataset is pulled from Aliyun, but my DB can only run offline.
Also, if I want to add a small dataset (just to check that the bench works normally), which files should I focus on? I'm very new to Python, so a detailed answer would be much appreciated. Thanks for your reply!

alwayslove2013 commented 2 months ago

@GreateFang It's a bit complex. In general, you need to do the following two things:

First, you need to download the corresponding dataset based on the case you want to run and copy it to your intranet machine.

Then, disable the online data file validation feature. https://github.com/zilliztech/VectorDBBench/blob/main/vectordb_bench/backend/data_source.py

For online testing, these steps are automated. However, for offline testing, many configurations need to be set up.
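One way to make the "download, then copy to the intranet machine" step safer is to record a checksum manifest on the connected machine and re-check it after the copy. The sketch below is not part of VectorDBBench and does not touch `data_source.py`. The function names and the `manifest.json` file name are hypothetical, chosen only for illustration.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large dataset files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(dataset_dir: Path, manifest_name: str = "manifest.json") -> Path:
    """Run where the files were downloaded: record size and hash of each file."""
    entries = {
        p.relative_to(dataset_dir).as_posix(): {
            "size": p.stat().st_size,
            "sha256": sha256_of(p),
        }
        for p in sorted(dataset_dir.rglob("*"))
        if p.is_file() and p.name != manifest_name  # skip the manifest itself
    }
    manifest = dataset_dir / manifest_name
    manifest.write_text(json.dumps(entries, indent=2))
    return manifest


def verify_manifest(dataset_dir: Path, manifest_name: str = "manifest.json") -> list[str]:
    """Run on the offline machine: return relative paths that are missing or altered."""
    entries = json.loads((dataset_dir / manifest_name).read_text())
    bad = []
    for rel, meta in entries.items():
        p = dataset_dir / rel
        if (
            not p.is_file()
            or p.stat().st_size != meta["size"]
            or sha256_of(p) != meta["sha256"]
        ):
            bad.append(rel)
    return bad
```

Calling `write_manifest(Path("/path/to/dataset"))` before the copy and `verify_manifest(...)` after it gives an empty list when every file arrived intact. This only confirms the transfer. The benchmark's own online validation still has to be disabled separately, as described above.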

alwayslove2013 commented 2 months ago

@GreateFang Tell us what case you wanna run. I will try to find the exact download URL.

GreateFang commented 2 months ago

> @GreateFang Tell us what case you wanna run. I will try to find the exact download URL.

Just a small dataset with few vectors (like 1K or 10K) and low dimensionality will be fine. It doesn't have to be real-world data, since it is only to test the bench.

GreateFang commented 2 months ago

> @GreateFang It's a bit complex. In general, you need to do the following two things:
>
> First, you need to download the corresponding dataset based on the case you want to run and copy it to your intranet machine.
>
> Then, disable the online data file validation feature. https://github.com/zilliztech/VectorDBBench/blob/main/vectordb_bench/backend/data_source.py
>
> For online testing, these steps are automated. However, for offline testing, many configurations need to be set up.

Ah... it seems this may take a lot of work. Thanks for your quick reply.

alwayslove2013 commented 2 months ago

> Just a small dataset with few vectors (like 1K or 10K)

@GreateFang The smallest dataset is Cohere (768 dim × 1M vectors).

> ... since it is only to test the bench.

Could you tell us what you're planning? We will soon support testing with customized local datasets, and we will provide the specification and guidance. If it's useful to you, we will prioritize the feature.

GreateFang commented 2 months ago

> @GreateFang The smallest dataset is Cohere (768 dim × 1M vectors).
>
> Could you tell us what you're planning? We will soon support testing with customized local datasets, and we will provide the specification and guidance. If it's useful to you, we will prioritize the feature.

We have developed a new vector DB and are trying to rank it. We may need to edit the bench to adapt it to our DB, so I wonder if there is a small test dataset that could quickly show whether everything is working.

alwayslove2013 commented 2 months ago

> We have developed a new vector DB and are trying to rank it. We may need to edit the bench to adapt it to our DB, so I wonder if there is a small test dataset that could quickly show whether everything is working.

@GreateFang Got it. We will prioritize local dataset support and small dataset examples.
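Until local dataset support lands, a tiny synthetic dataset with exact ground truth can serve as a smoke test for a new database client outside the benchmark. The sketch below is a minimal, standard-library-only illustration. It is not the format VectorDBBench expects (the thread does not specify one), and all function names here are hypothetical.

```python
import heapq
import math
import random


def make_vectors(count: int, dim: int, rng: random.Random) -> list[list[float]]:
    """Draw `count` random vectors of length `dim` with Gaussian components."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_neighbors(train, queries, k):
    """Brute-force ground truth: ids of the k nearest train vectors per query."""
    truth = []
    for q in queries:
        nearest = heapq.nsmallest(
            k, range(len(train)), key=lambda i: l2_distance(q, train[i])
        )
        truth.append(nearest)
    return truth


def recall_at_k(returned, truth):
    """Fraction of true neighbors that were returned, averaged over all queries."""
    hits = sum(len(set(r) & set(t)) for r, t in zip(returned, truth))
    return hits / sum(len(t) for t in truth)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the dataset is reproducible
    train = make_vectors(1_000, 16, rng)   # 1K vectors, 16 dimensions
    queries = make_vectors(10, 16, rng)
    truth = exact_neighbors(train, queries, k=10)
    print(len(train), len(queries), len(truth[0]))
```

Inserting `train` into the database under test, running `queries` against it, and passing the returned ids to `recall_at_k` gives a quick correctness signal. Brute force is only practical at this scale (1K to 10K vectors), which is the size asked for in this thread.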