Closed. Warmchay closed this issue 5 months ago.
Hi, thanks for your work, it's very impressive. While trying to reproduce it, I couldn't get it to work properly. These are the steps I executed:
- Change config_datasets.sh to match my setup, and copy config_sample.sh to config_local.sh
- Run ./run_benchmark.sh debug build. From here, I don't know whether to run build first, build_mem first, or something else.
- Run ./run_benchmark.sh debug build_mem
- Run ./run_benchmark.sh debug search knn. It warned me that there were no partition files, so I ran ./run_benchmark.sh debug gp, but that failed with:
./run_benchmark.sh: line 146: ../debug/graph_partition/partitioner: No such file or directory
I'm not sure whether I'm doing this right. Could you help me figure it out?
You might need to run them in this order:
./run_benchmark.sh debug build
./run_benchmark.sh debug build_mem
./run_benchmark.sh debug gp
to do the graph partition. I guess the error you encountered comes from not having run git submodule update --recursive --init in the root of this repository. Hence, the graph_partition directory is empty, and the partitioner binary was never compiled, so the script cannot find it.
./run_benchmark.sh debug search knn
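Before running gp, a quick preflight check can confirm both conditions from the reply: the submodule was fetched and the partitioner binary exists. This is a minimal bash sketch; the function name check_partitioner is hypothetical, and the directory layout is an assumption inferred from the ../debug/graph_partition/partitioner path in the error (the script runs from the indices directory, so ../debug resolves to <repo>/debug).

```bash
# Hypothetical preflight check before "./run_benchmark.sh <mode> gp".
# Assumed layout (inferred from the error message, not verified):
#   <repo>/graph_partition/                     submodule sources
#   <repo>/<mode>/graph_partition/partitioner   compiled binary
check_partitioner() {
  local repo_root="$1" mode="${2:-debug}"
  local src_dir="$repo_root/graph_partition"
  local bin="$repo_root/$mode/graph_partition/partitioner"

  # Missing or empty submodule directory: the submodule was never fetched.
  if [ -z "$(ls -A "$src_dir" 2>/dev/null)" ]; then
    echo "$src_dir is empty; run 'git submodule update --recursive --init' in $repo_root" >&2
    return 1
  fi
  # Sources exist, but the partitioner binary has not been compiled yet.
  if [ ! -x "$bin" ]; then
    echo "no executable at $bin; rebuild the project in $mode mode" >&2
    return 2
  fi
  echo "ok: $bin"
}
```

Distinct return codes (1 for the missing submodule, 2 for the missing binary) make it easy to tell which of the two causes applies.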
Just a quick note: performance will be a few orders of magnitude worse in Debug mode than in Release mode.
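The suggested order can be wrapped in a small bash function that stops at the first failing step, since later steps appear to rely on files produced by earlier ones (search needs the partition files that gp writes). This is a sketch: run_pipeline is a hypothetical name, and passing release as the mode is an assumption that run_benchmark.sh accepts it alongside debug.

```bash
# Hypothetical wrapper: runs the steps in the suggested order and stops at the
# first failure. Assumes run_benchmark.sh accepts "release" as well as "debug".
run_pipeline() {
  local mode="${1:-release}" script="${2:-./run_benchmark.sh}" step
  for step in build build_mem gp "search knn"; do
    echo "==> $script $mode $step"
    # $step is left unquoted on purpose so "search knn" becomes two arguments.
    if ! "$script" "$mode" $step; then
      echo "step '$step' failed; stopping before later steps" >&2
      return 1
    fi
  done
}
```

From the scripts directory, run_pipeline debug is handy while debugging, and run_pipeline release is the variant to use for any performance numbers.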
Hi, thanks for your help. I have solved the submodule problem, but now I've run into another one, about missing gp files:
./run_benchmark.sh debug gp
Running graph partition... _M32_R48_L128_B/GP_TIMES_16_LOCK_0_GP_USE_FREQ0_CUT4096/_part.bin.log
the option '--gp_file' is required but missing
In the gp section of run_benchmark.sh, --gp_file points to GP_FILE_PATH, which is built as ${GP_PATH}_part.bin. I found this .bin.log file in indices, but I don't know why it couldn't be found.
It really confuses me. Could you help explain it? Thanks a lot!
Can you uncomment this https://github.com/zilliztech/starling/blob/1193e72c26421309f7c1bbc5d842b512ef84962d/scripts/run_benchmark.sh#L4 and run the script again?
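With set -x, bash prints every command after variable expansion, which is what makes missing values visible. The small bash demo below (show_args and trace_demo are hypothetical names) shows the key detail: an empty variable passed unquoted disappears from the argument list, so the trace shows the option directly followed by the next option, while a quoted empty variable survives and is traced as ''.

```bash
# Prints how many arguments it received, then each one in brackets.
show_args() {
  echo "argc=$#"
  local a
  for a in "$@"; do printf '[%s]\n' "$a"; done
}

trace_demo() {
  local DATA_TYPE=""   # simulates a config field that was never set
  set -x
  show_args --data_type $DATA_TYPE --dist_fn l2    # unquoted: the empty value vanishes
  show_args --data_type "$DATA_TYPE" --dist_fn l2  # quoted: an empty argument is kept
  set +x
}
```

The first call receives three arguments, the second four. To trace the whole script without editing it, bash -x ./run_benchmark.sh debug gp 2> trace.log works too, since xtrace output goes to stderr.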
After uncommenting set -x, this is the output from running ./run_benchmark.sh debug build:
+ popd
/home/wq/code/starling/scripts
+ mkdir -p ../indices
+ cd ../indices
+ date
Thu Jun 27 06:36:40 AM UTC 2024
+ case $2 in
+ check_dir_and_make_if_absent _M32_R48_L128_B/
+ local dir=_M32_R48_L128_B/
+ '[' -d _M32_R48_L128_B/ ']'
+ mkdir -p _M32_R48_L128_B/
+ echo 'Building disk index...'
Building disk index...
+ ../debug/tests/build_disk_index --data_type --dist_fn --data_path --index_path_prefix _M32_R48_L128_B/ -R 48 -L 128 -B -M 32 -T 8 --search_DRAM_budget 0
the required argument for option '--search_DRAM_budget' is missing
real 0m0.035s
user 0m0.004s
sys 0m0.009s
+ cp _M32_R48_L128_B/_disk.index _M32_R48_L128_B/_disk_beam_search.index
cp: cannot stat '_M32_R48_L128_B/_disk.index': No such file or directory
Maybe the build didn't run smoothly, which left _disk.index incomplete.
I set --search_DRAM_budget to $CACHE, following https://github.com/zilliztech/starling/blob/1193e72c26421309f7c1bbc5d842b512ef84962d/scripts/config_sample.sh#L58, but it didn't work either.
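The index directory in the traces above is _M32_R48_L128_B/: nothing precedes the first underscore and nothing follows B. Assuming the script assembles that name as ${PREFIX}_M${M}_R${R}_L${L}_B${B}/ (inferred from the logged paths, not checked against run_benchmark.sh), that is exactly what an empty PREFIX and an empty B produce. The bash sketch below, with hypothetical function names, reproduces the name and flags those two symptoms.

```bash
# Hypothetical reconstruction of how the index directory name is assembled.
# Arguments: PREFIX M R L B; any of them may be empty.
index_dir() {
  printf '%s_M%s_R%s_L%s_B%s/\n' "$1" "$2" "$3" "$4" "$5"
}

# Reports which dataset fields look empty, judging only by the directory name.
diagnose_index_dir() {
  local dir="$1" problems=0
  case "$dir" in _*) echo "PREFIX looks empty"; problems=1 ;; esac
  case "$dir" in *_B/) echo "B looks empty"; problems=1 ;; esac
  [ "$problems" -eq 0 ] && echo "name looks complete"
  return 0
}
```

Under that assumption, index_dir "" 32 48 128 "" yields _M32_R48_L128_B/, while the values from the config later in this thread (PREFIX=bigann_100m, B=0.3) would give bigann_100m_M32_R48_L128_B0.3/.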
Can you show me your config_dataset.sh? It looks like you didn't provide the data type, distance function, and other fields, as shown in:
../debug/tests/build_disk_index --data_type --dist_fn --data_path --index_path_prefix _M32_R48_L128_B/ -R 48 -L 128 -B -M 32 -T 8
Please strictly follow https://github.com/zilliztech/starling/blob/1193e72c26421309f7c1bbc5d842b512ef84962d/scripts/config_dataset.sh#L13 and provide all fields.
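To catch missing fields before build_disk_index sees options without values, a guard could run right after the config files are sourced. This is a minimal bash sketch; require_vars is hypothetical and not part of run_benchmark.sh, and it relies on bash indirect expansion (${!name}).

```bash
# Hypothetical guard: fail fast if any required dataset field is empty.
require_vars() {
  local name missing=()
  for name in "$@"; do
    # ${!name:-} is the value of the variable whose name is stored in $name,
    # or empty if that variable is unset.
    if [ -z "${!name:-}" ]; then
      missing+=("$name")
    fi
  done
  if [ "${#missing[@]}" -gt 0 ]; then
    echo "missing config fields: ${missing[*]}" >&2
    return 1
  fi
}
```

For example, require_vars BASE_PATH QUERY_FILE GT_FILE PREFIX DATA_TYPE DIST_FN B K DATA_DIM DATA_N would stop with the list of empty fields instead of the argument error from build_disk_index.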
My config_dataset.sh is as follows:
BASE_PATH=/data1/wq/bigann/diskann_100M/bigann_base_100M.fbin
QUERY_FILE=/data1/wq/bigann/diskann_100M/bigann_query.fbin
GT_FILE=/data/datasets/BIGANN/bigann_gt_100M
PREFIX=bigann_100m
DATA_TYPE=float
DIST_FN=l2
B=0.3
K=10
DATA_DIM=128
DATA_N=100000000
}
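The closing } suggests these assignments sit inside a shell function, which is an assumption here since the opening line is not shown. If so, the variables stay empty until that function is actually called, which would be consistent with the empty --data_type, --dist_fn and --data_path values in the earlier trace. A minimal bash illustration with a hypothetical function name:

```bash
# Hypothetical dataset block written as a function, as the config above
# appears to be. Without "local", these assignments set global variables,
# but only once the function is called.
dataset_example() {
  PREFIX=bigann_100m
  DATA_TYPE=float
  DIST_FN=l2
}

# Prints the current values, using <empty> for unset or empty fields.
show_fields() {
  echo "PREFIX=${PREFIX:-<empty>} DATA_TYPE=${DATA_TYPE:-<empty>} DIST_FN=${DIST_FN:-<empty>}"
}
```

Calling show_fields before dataset_example prints only <empty> values; after dataset_example runs, the fields are set.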
Oops, I think there might be these problems:
- I converted the BIGANN 100M byte dataset into a float dataset and changed the data type field to float; I'm not sure whether Starling supports that.
- The .fbin files were produced by DiskANN's convert_fvecs_to_fbin tool. By the way, I couldn't find your test dataset on the SIFT website. Could you tell me how to get it?
Thank you very much for your help and quick replies!
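A DiskANN-style .fbin file starts with an 8-byte header (int32 point count, then int32 dimension) followed by count × dimension float32 values, so its header can be compared against DATA_N and DATA_DIM in the config. This is a bash sketch; check_fbin is a hypothetical name, and it assumes a little-endian host because od decodes integers in host byte order.

```bash
# Hypothetical checker for a DiskANN-style .fbin file:
#   bytes 0-3: number of points (int32), bytes 4-7: dimension (int32),
#   then npts * dim values of elem_size bytes each (4 for float).
check_fbin() {
  local file="$1" want_n="$2" want_dim="$3" elem_size="${4:-4}"
  local npts dim size expected
  # od prints the first 8 bytes as two signed 32-bit integers (host byte order).
  read -r npts dim < <(od -An -t d4 -N 8 "$file")
  if [ -z "$dim" ]; then
    echo "$file is shorter than the 8-byte header" >&2
    return 1
  fi
  size=$(( $(wc -c < "$file") ))   # arithmetic strips any padding from wc
  expected=$(( 8 + npts * dim * elem_size ))
  echo "header: npts=$npts dim=$dim"
  if [ "$npts" != "$want_n" ] || [ "$dim" != "$want_dim" ]; then
    echo "header does not match DATA_N=$want_n DATA_DIM=$want_dim" >&2
    return 1
  fi
  if [ "$size" -ne "$expected" ]; then
    echo "file is $size bytes, expected $expected" >&2
    return 1
  fi
}
```

With the config above, the call to try would be check_fbin /data1/wq/bigann/diskann_100M/bigann_base_100M.fbin 100000000 128.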
float is supported.
Thanks! I will try again with the provided dataset, but I wonder why my synthetic dataset (described above) runs smoothly on FAISS but isn't detected by Starling. Is something wrong with my scripts, or is it just the dataset?
Starling uses the same build process as DiskANN, via build_disk_index, so if your data can be built with DiskANN, it should also work with the Starling build process.
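One way to separate a data problem from a script problem is to assemble the build_disk_index command yourself and inspect it before running it. This bash sketch takes its flag list from the earlier trace (--data_type, --dist_fn, --data_path, --index_path_prefix, -R, -L, -B, -M, -T); the function name build_cmd, the variable names INDEX_PREFIX, R, L, M and T, and their mapping to flags are assumptions. Keeping arguments in an array preserves empty values so they can be rejected instead of silently vanishing.

```bash
# Hypothetical dry run: assemble the build_disk_index command from config
# variables and print it instead of executing it.
build_cmd() {
  local bin="${1:-}" i
  [ -n "$bin" ] || { echo "usage: build_cmd <path-to-build_disk_index>" >&2; return 1; }
  local -a cmd=(
    "$bin"
    --data_type "${DATA_TYPE:-}"
    --dist_fn "${DIST_FN:-}"
    --data_path "${BASE_PATH:-}"
    --index_path_prefix "${INDEX_PREFIX:-}"
    -R "${R:-}" -L "${L:-}" -B "${B:-}" -M "${M:-}" -T "${T:-}"
  )
  # Each array element stays one argument, so an empty value is detectable here.
  for i in "${!cmd[@]}"; do
    if [ -z "${cmd[i]}" ]; then
      echo "empty value for ${cmd[i-1]}" >&2
      return 1
    fi
  done
  # Print a copy-pasteable, shell-quoted version of the command.
  local quoted=() word
  for word in "${cmd[@]}"; do
    quoted+=("$(printf '%q' "$word")")
  done
  echo "${quoted[*]}"
}
```

The printed command can then be run by hand, or compared with a plain DiskANN build on the same .fbin file.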
Thanks for your continued help! I don't think I have any other problems for now, but I need to check whether I followed the process correctly. If I run into problems from another angle, I will reopen the issue. From now on, I'll check my own setup first.
Thanks again, you explained everything really clearly!