If I understand the idea behind DiskANN correctly (I may be completely misunderstanding it), building the index (as with HNSW) effectively clusters the vectors for free. It would be an amazing feature to be able to retrieve each vector's "cluster". This would be really useful for entity resolution, de-duplication, and blocking, without having to run a query for every point in the database.