zilliztech / VectorDBBench

A Benchmark Tool for VectorDB
MIT License

Improve downloading files logic to check if a file already exists #405

Open ek-nyc opened 1 week ago

ek-nyc commented 1 week ago

When I rerun my tests, it takes a long time to download the 3 files that have already been downloaded. The download steps should be skipped if the files already exist.

alwayslove2013 commented 1 week ago

@ek-nyc could you please provide more detailed information?

VectorDBBench already has logic to skip file downloads: if a local file with the same name has the same size as the remote file, the download is skipped. https://github.com/zilliztech/VectorDBBench/blob/1ab46dd5d1594565148f8b90cc75b71ff11688e1/vectordb_bench/backend/data_source.py#L57-L66
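As a rough illustration of that size check (a minimal sketch, not the actual code in data_source.py; the function name `should_skip_download` and the `remote_size` parameter are made up for this example, and how the remote size is obtained is left out):

```python
from pathlib import Path


def should_skip_download(local_file: Path, remote_size: int) -> bool:
    """Hypothetical helper: decide whether a download can be skipped.

    Returns True only when a local file already exists at `local_file`
    and its size in bytes equals `remote_size`.
    """
    # No local copy yet, so the file has to be downloaded.
    if not local_file.is_file():
        return False

    # Same name and same size: treat the local copy as up to date.
    # A size mismatch (e.g. a partial download) forces a re-download.
    return local_file.stat().st_size == remote_size
```

With a check like this, a rerun would only re-download files that are missing or whose local size differs from the remote one.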

Additionally, please note that the default download location is the /tmp folder, which is typically cleared upon system reboot.