zilliztech / starling

[Question] How to run benchmark in order? #26

Closed Warmchay closed 5 months ago

Warmchay commented 5 months ago

Hi, thanks for your work, it's very impressive. I'm trying to reproduce your results, but the benchmark does not run correctly. The steps I executed, in order:

  1. Edited config_dataset.sh for my setup, and copied config_sample.sh to config_local.sh.
  2. Ran ./run_benchmark.sh debug build. From here on, I don't know whether to run build first, build_mem, or something else.
  3. Ran ./run_benchmark.sh debug build_mem.
  4. Ran ./run_benchmark.sh debug search knn. It warned me that there were no partition files, so I ran ./run_benchmark.sh debug gp, which failed with:
    ./run_benchmark.sh: line 146: ../debug/graph_partition/partitioner: No such file or directory

    I'm not sure whether I'm doing this right. Could you help me figure it out?

PwzXxm commented 5 months ago

You might need to run the steps in this order:

  1. ./run_benchmark.sh debug build
  2. ./run_benchmark.sh debug build_mem
  3. ./run_benchmark.sh debug gp to do the graph partition. My guess is that the error you encountered comes from not having run git submodule update --recursive --init inside this repository. As a result, the graph_partition directory is empty and the partitioner binary was never compiled, so the script cannot find it.
  4. ./run_benchmark.sh debug search knn

Just a quick note: performance will be a few orders of magnitude worse in Debug mode than in Release.
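
The order above can be sketched as a small wrapper script. This is only a sketch under stated assumptions: MODE, DRY_RUN, run_step and run_all are hypothetical names introduced here (they are not part of run_benchmark.sh), the script is assumed to be started from the scripts/ directory, and by default it only prints the commands rather than running them.

```shell
#!/usr/bin/env bash
# Sketch: run (or just print) the benchmark stages in the suggested order.
# MODE and DRY_RUN are hypothetical helper variables, not options of run_benchmark.sh.
MODE="${MODE:-debug}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, anything else = execute them

run_step() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    # Stop at the first failing stage so later stages never see missing files.
    "$@" || { echo "step failed: $*" >&2; exit 1; }
  fi
}

run_all() {
  # Fetch submodules first so graph_partition is populated before compiling.
  # Assumes the repository root is the parent of the current directory.
  run_step git -C .. submodule update --recursive --init
  run_step ./run_benchmark.sh "$MODE" build
  run_step ./run_benchmark.sh "$MODE" build_mem
  run_step ./run_benchmark.sh "$MODE" gp
  run_step ./run_benchmark.sh "$MODE" search knn
}

run_all
```

Setting DRY_RUN=0 executes the stages for real. Stopping on the first failure is a deliberate choice here, since each stage depends on files produced by the previous one.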

Warmchay commented 5 months ago

Hi, thanks for your quick help. I have solved the submodule problem, but I've run into another one, about a missing gp file:

  1. I ran ./run_benchmark.sh debug gp
  2. It printed:
    Running graph partition... _M32_R48_L128_B/GP_TIMES_16_LOCK_0_GP_USE_FREQ0_CUT4096/_part.bin.log
    the option '--gp_file' is required but missing
  3. I checked run_benchmark.sh: in the gp section, it points --gp_file to GP_FILE_PATH, which is constructed as ${GP_PATH}_part.bin. I found this log file under indices, but I don't know why the script couldn't find the gp file. It really confuses me. Could you help explain it? Thanks a lot!

PwzXxm commented 5 months ago

Can you uncomment this https://github.com/zilliztech/starling/blob/1193e72c26421309f7c1bbc5d842b512ef84962d/scripts/run_benchmark.sh#L4 and run the script again?

Warmchay commented 5 months ago

After enabling set -x, this is the output of running the build step in debug mode:

+ popd
/home/wq/code/starling/scripts
+ mkdir -p ../indices
+ cd ../indices
+ date
Thu Jun 27 06:36:40 AM UTC 2024
+ case $2 in
+ check_dir_and_make_if_absent _M32_R48_L128_B/
+ local dir=_M32_R48_L128_B/
+ '[' -d _M32_R48_L128_B/ ']'
+ mkdir -p _M32_R48_L128_B/
+ echo 'Building disk index...'
Building disk index...
+ ../debug/tests/build_disk_index --data_type --dist_fn --data_path --index_path_prefix _M32_R48_L128_B/ -R 48 -L 128 -B -M 32 -T 8 --search_DRAM_budget 0
the required argument for option '--search_DRAM_budget' is missing

real    0m0.035s
user    0m0.004s
sys     0m0.009s
+ cp _M32_R48_L128_B/_disk.index _M32_R48_L128_B/_disk_beam_search.index
cp: cannot stat '_M32_R48_L128_B/_disk.index': No such file or directory

Maybe the build didn't run properly, which leaves _disk.index missing or incomplete.

Warmchay commented 5 months ago

I added --search_DRAM_budget as $CACHE in https://github.com/zilliztech/starling/blob/1193e72c26421309f7c1bbc5d842b512ef84962d/scripts/config_sample.sh#L58, but that didn't work either.

PwzXxm commented 5 months ago

Can you show me your config_dataset.sh? It looks like you didn't provide the data type, distance function, and other fields, as the arguments in this command are empty:

../debug/tests/build_disk_index --data_type --dist_fn --data_path --index_path_prefix _M32_R48_L128_B/ -R 48 -L 128 -B -M 32 -T 8

Please strictly follow https://github.com/zilliztech/starling/blob/1193e72c26421309f7c1bbc5d842b512ef84962d/scripts/config_dataset.sh#L13 and provide all fields.
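
A check like the following could catch empty fields before the build command is assembled. It is a minimal sketch, not part of the Starling scripts: require_vars is a hypothetical helper, the variable names follow the fields of the config_dataset.sh posted later in this thread, and the path is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch: fail early when a dataset variable is unset or empty, instead of
# passing bare flags such as "--data_type --dist_fn" to build_disk_index.
# require_vars is a hypothetical helper, not a function from run_benchmark.sh.
require_vars() {
  local missing=0 name
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name.
    if [ -z "${!name:-}" ]; then
      echo "missing config value: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example values; replace them with the ones from your own dataset config.
DATA_TYPE=float
DIST_FN=l2
BASE_PATH=/path/to/base.fbin   # placeholder path, an assumption for illustration

require_vars DATA_TYPE DIST_FN BASE_PATH && echo "config looks complete"
```

The helper reports every missing name rather than stopping at the first one, so a single run shows all the fields that still need to be filled in.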

Warmchay commented 5 months ago

My config_dataset.sh is as follows:

  BASE_PATH=/data1/wq/bigann/diskann_100M/bigann_base_100M.fbin
  QUERY_FILE=/data1/wq/bigann/diskann_100M/bigann_query.fbin
  GT_FILE=/data/datasets/BIGANN/bigann_gt_100M
  PREFIX=bigann_100m
  DATA_TYPE=float
  DIST_FN=l2
  B=0.3
  K=10
  DATA_DIM=128
  DATA_N=100000000
}

Oops, I think the problem might be one of these:

  1. I converted the BIGANN 100M byte dataset to a float dataset and set the data type field to float. I'm not sure whether Starling supports that.
  2. The .fbin files were produced by DiskANN's convert_fvecs_to_fbin tool. By the way, I couldn't find your test dataset on the SIFT website. Could you tell me how to get it?

Thank you very much for your help and cooperative replies!

PwzXxm commented 5 months ago

  1. float is supported.
  2. The datasets are from https://big-ann-benchmarks.com/neurips21.html

Warmchay commented 5 months ago

Thanks! I will retry using the provided dataset. But I wonder why my synthetic dataset (as described above) runs smoothly on FAISS but isn't detected by Starling. Is there something wrong with how I run the scripts, or is it just the dataset?

PwzXxm commented 5 months ago

Starling uses the same build process as DiskANN, which relies on the build_disk_index function. If your dataset can be built by DiskANN, it should also work with the Starling build process.
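
One way to sanity-check a converted file before building is to read its header. The sketch below assumes the DiskANN binary layout (two int32 values, number of points then dimension, followed by the raw vectors) and a little-endian host; bin_header and expected_size are hypothetical helper names, and the demo file is synthetic.

```shell
#!/usr/bin/env bash
# Sketch: inspect the header of a .bin/.fbin file.
# Assumes the layout: int32 point count, int32 dimension, then the vector data.
bin_header() {
  # Print the first two 4-byte signed integers as "count dim".
  od -An -t d4 -N 8 "$1" | tr -s ' ' | sed 's/^ *//; s/ *$//'
}

# Expected file size in bytes: 8-byte header + count * dim * bytes per value.
expected_size() {
  echo $(( 8 + $1 * $2 * $3 ))
}

# Demo on a tiny synthetic file: 2 points, 3 dimensions, float32 zeros.
demo="$(mktemp)"
printf '\002\000\000\000\003\000\000\000' > "$demo"
dd if=/dev/zero bs=1 count=24 2>/dev/null >> "$demo"
bin_header "$demo"        # header values read from the file
expected_size 2 3 4       # size implied by the header for 4-byte floats
wc -c < "$demo"           # actual size on disk, for comparison
rm -f "$demo"
```

For a real dataset, the header values should match DATA_N and DATA_DIM in config_dataset.sh, and the size on disk should match expected_size with 4 bytes per value for float data.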

Warmchay commented 5 months ago

Thanks for your continued help! I don't think I have any further questions for now, but I need to check whether I followed the process correctly. If I run into other problems, I will reopen the issue. For now, I should examine my own setup first.

Thanks again for your help, you explained everything really clearly!