rapidsai / cuvs

cuVS - a library for vector search and clustering on the GPU
https://rapids.ai
Apache License 2.0

[DOC] How can I use logger for CAGRA search to find number of hops? #428

Open Lion815 opened 6 days ago

Lion815 commented 6 days ago

Report needed documentation

I am working with the pylibraft and cuVS packages, specifically to do some research on the CAGRA algorithm. I need some intermediate information, such as how many comparisons are performed during a CAGRA search. Is there an embedded logger in pylibraft or cuVS that I could take advantage of? Furthermore, I found that cagra has an update_dataset method in pylibraft but not in cuVS. Is there an equivalent method in cuVS?

cjnolet commented 4 days ago

Hi @Lion815, thanks for opening an issue here. We use spdlog, but we compile for a specific log level (and above), so anything at a lower level would require recompiling the codebase. I believe we compile for DEBUG and above at the moment, and it should be possible to enable that from the command line with an environment variable. However, I don't believe we currently do any logging specifically for the number of hops used to search the CAGRA graph. I think this can also be challenging because of the parallelism involved: this information is only available inside the CUDA kernels, which would likely need to coordinate to compute the result and then make it available in host memory before it could be printed, so logging it would likely slow down performance. This is something you could likely instrument specifically for your case, of course, but not something that exists today.
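To give a rough idea of what "instrumenting it for your case" could look like, here is a CPU-only toy sketch in plain Python. It is not CAGRA's implementation and uses no cuVS or pylibraft APIs: `build_knn_graph`, `greedy_search`, and the `stats` dictionary are hypothetical names made up for illustration, and the search is a simple greedy walk over a brute-force k-NN graph. It only shows where hop and comparison counters would sit in a graph search loop.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_knn_graph(points, k):
    """Brute-force k-NN graph: a toy stand-in for a real search graph."""
    graph = []
    for i, p in enumerate(points):
        ranked = sorted((sq_dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph.append([j for _, j in ranked[:k]])
    return graph


def greedy_search(points, graph, query, start, stats):
    """Greedy walk toward `query`; `stats` accumulates the counters.

    stats["comparisons"] counts distance computations against the query,
    stats["hops"] counts moves from one node to a closer neighbour.
    """
    current = start
    current_dist = sq_dist(points[current], query)
    stats["comparisons"] += 1
    visited = {current}
    while True:
        best, best_dist = current, current_dist
        for nb in graph[current]:
            if nb in visited:
                continue  # each node is compared to the query at most once
            visited.add(nb)
            d = sq_dist(points[nb], query)
            stats["comparisons"] += 1
            if d < best_dist:
                best, best_dist = nb, d
        if best == current:
            # No unvisited neighbour is closer: local minimum reached.
            return current, current_dist
        current, current_dist = best, best_dist
        stats["hops"] += 1


random.seed(0)
points = [(random.random(), random.random()) for _ in range(200)]
graph = build_knn_graph(points, k=8)

stats = {"hops": 0, "comparisons": 0}
idx, dist = greedy_search(points, graph, (0.5, 0.5), start=0, stats=stats)
print(f"result={idx} hops={stats['hops']} comparisons={stats['comparisons']}")
```

On the GPU, the equivalent counters would presumably have to live in device memory, be combined across threads, and be copied back to the host before printing, which is where the coordination and slowdown mentioned above would come from.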

update_dataset() is exposed in the C++ APIs, mostly for internal use. Please create a feature request and we can expose it in the Python API as well, if that would be helpful.

Lion815 commented 19 hours ago

Thank you, @cjnolet! That helped a lot. I have another question: in pylibraft, setting metric = 'inner_product' in 'cagra.IndexParams' works, but in cuVS the search still seems to use the default distance definition. Additionally, when I added code such as RAFT_LOG_INFO to cagra_build.cuh and ran build.sh python to reinstall the package, my modification did not make a difference. I am not sure if I missed something. As I lack package development experience, I would really appreciate it if you could help me with this. I hope my questions are not too stupid. (QAQ)