zlwu92 opened 1 year ago
@zlwu92 you could try using:
index = cagra.build(cagra.IndexParams(build_algo="nn_descent"), self.torch_data, handle=handle)
This should help speed up your build times by internally using the NN Descent algorithm instead of IVF-PQ to build the CAGRA index.
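To check the speedup on your own data, a small timing harness like the one below can compare the two build algorithms. This is a sketch, not part of pylibraft: `time_builds` and `build_cagra` are hypothetical helpers, and the string values `"ivf_pq"` and `"nn_descent"` for `build_algo` are assumed to match your pylibraft version. The harness keeps the best of several repeats, since the first call may include one-time setup cost.

```python
import time
from typing import Callable, Dict, Iterable


def time_builds(
    build: Callable[[str], object],
    algos: Iterable[str],
    repeats: int = 1,
) -> Dict[str, float]:
    """Return the best wall-clock time in seconds of build(algo) for each algo."""
    results: Dict[str, float] = {}
    for algo in algos:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            build(algo)  # build() must block until its work is done (e.g. call handle.sync())
            best = min(best, time.perf_counter() - start)
        results[algo] = best
    return results


# Hypothetical usage with pylibraft (assumes `handle` is a DeviceResources and
# `dataset` is already on the GPU, as in the snippet above):
#
#   from pylibraft.neighbors import cagra
#
#   def build_cagra(algo):
#       index = cagra.build(cagra.IndexParams(build_algo=algo), dataset, handle=handle)
#       handle.sync()  # wait for queued GPU work so the timer measures all of it
#       return index
#
#   print(time_builds(build_cagra, ["ivf_pq", "nn_descent"], repeats=3))
```

Passing the build as a callable keeps the timing logic independent of any GPU library, so the same harness works for other index types.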
IVF-PQ would also benefit from a pooled workspace memory resource (as the suggestions in the log indicate). @cjnolet, do we have a Python API/example for setting the workspace to the default pooled memory resource?
@tarang-jain that warning will go away once it's using nn-descent.
@zlwu92, as @tarang-jain points out, a pool memory resource can often improve end-to-end performance by allowing asynchronous temporary memory allocations within the algorithms.
Here's an example of how to enable this in Python.
So, can we use the Python API to get memory-consumption stats for both the IVF-PQ and nn_descent build methods?
As for integrating the pool memory resource into my current code, is it like this?
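As a rough starting point before measuring anything, the largest CAGRA buffers can be estimated by hand. The sketch below does not query the GPU. It assumes float32 vectors, uint32 neighbor ids, and default degrees of 128 (intermediate graph) and 64 (final graph); those defaults are assumptions to verify against your pylibraft version. It also ignores builder scratch space (the IVF-PQ index or NN Descent work buffers), which is exactly where the two methods differ. `estimate_cagra_bytes` is a hypothetical helper.

```python
from typing import Dict


def estimate_cagra_bytes(
    n_rows: int,
    dim: int,
    graph_degree: int = 64,               # assumed default final graph degree
    intermediate_graph_degree: int = 128,  # assumed default intermediate degree
    bytes_per_value: int = 4,  # float32 vectors
    bytes_per_index: int = 4,  # uint32 neighbor ids
) -> Dict[str, int]:
    """Rough sizes of the largest CAGRA buffers; excludes builder scratch space."""
    return {
        "dataset": n_rows * dim * bytes_per_value,
        "intermediate_graph": n_rows * intermediate_graph_degree * bytes_per_index,
        "graph": n_rows * graph_degree * bytes_per_index,
    }


for name, dim in [("sift1m", 128), ("gist1m", 960)]:
    est = estimate_cagra_bytes(1_000_000, dim)
    print(name, {k: f"{v / 2**20:.0f} MiB" for k, v in est.items()})
```

For measured rather than estimated numbers, RMM provides a `StatisticsResourceAdaptor` that can wrap the current device resource and report current and peak allocations; its Python interface has changed between RMM releases, so check the docs for your version. Memory that PyTorch allocates for `self.torch_data` goes through PyTorch's own caching allocator, not RMM, so it would not appear in those statistics.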
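import rmm
from pylibraft.common import DeviceResources
from pylibraft.neighbors import cagra

# Pool allocator: reserve 1 GiB up front and let it grow to at most 4 GiB.
pool = rmm.mr.PoolMemoryResource(
    rmm.mr.CudaMemoryResource(),
    initial_pool_size=2**30,
    maximum_pool_size=2**32,
)
rmm.mr.set_current_device_resource(pool)

handle = DeviceResources()  # RAFT handle shared by build and search
index = cagra.build(cagra.IndexParams(build_algo="nn_descent"), self.torch_data, handle=handle)
handle.sync()
self.distances, self.neighbors = cagra.search(cagra.SearchParams(), index, self.torch_query, self.topk, handle=handle)
handle.sync()  # make sure search results are ready before reading them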
Also, some datasets from ann-benchmarks, such as nytimes, use non-L2 distance metrics like angular or Jaccard. Does the CAGRA API support these metrics? From the API documentation, it seems to support only L2 at the moment.
If I want to test on these datasets, how should I use the API?
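For angular (cosine) datasets, one common workaround with an L2-only index is to normalize every dataset and query vector to unit length first: for unit vectors, $\|a-b\|^2 = 2 - 2\cos(a,b)$, so nearest neighbors under squared L2 are the same as under cosine distance. This is a general identity, not a CAGRA feature, and it does not help with Jaccard. The sketch assumes the index uses squared L2 (`"sqeuclidean"`); `l2_normalize` and `cosine_distance_from_sqeuclidean` are hypothetical helpers, and later RAFT/cuVS releases may support more metrics directly, so check the current docs.

```python
import numpy as np


def l2_normalize(x, eps: float = 1e-12) -> np.ndarray:
    """Scale each row of a 2-D array to unit L2 norm (all-zero rows stay zero)."""
    x = np.asarray(x, dtype=np.float32)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def cosine_distance_from_sqeuclidean(d2) -> np.ndarray:
    """For unit vectors ||a - b||^2 = 2 - 2*cos(a, b), so 1 - cos(a, b) = d2 / 2."""
    return np.asarray(d2) / 2.0


# Apply l2_normalize to both the dataset and the queries before building and
# searching with the squared-L2 metric; the returned squared-L2 distances can
# then be mapped back to cosine distances with cosine_distance_from_sqeuclidean.
```

The same row normalization can be done on the GPU with PyTorch (for example `torch.nn.functional.normalize(x, dim=1)`) if the data is already in a CUDA tensor.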
What is your question?
Hi,
I'm new to the RAFT ANNS Python API for CAGRA.
I followed the examples at https://docs.rapids.ai/api/raft/nightly/pylibraft_api/neighbors/#cagra.
I found warnings in
Could you please share some code showing how to make it faster, for example how to set my own memory resource?
This is my code. Am I using it correctly? I'm testing on the sift1m and gist1m datasets.