stanford-futuredata / ColBERT

ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)
MIT License

How to check the centroids and the data in the clusters? #338


ravirajag commented 2 months ago

I indexed around 11k sentences, and indexing produced roughly 4,000 centroids. I can load the centroids with the following code:

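```python
from colbert.indexing.codecs.residual import ResidualCodec

index_path = "/path/to/your/index"  # placeholder: directory of the built ColBERT index
res_codec = ResidualCodec.load(index_path)  # loads centroids and residual-compression settings
```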
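A centroid is a vector, not a sentence: ColBERT clusters individual token embeddings, and each stored embedding is assigned to the centroid with which it has the largest inner product (inside the codec this is done on tensors, roughly `centroids @ embeddings.T` followed by an argmax). The loaded centroid matrix should be available as `res_codec.centroids`. The plain-Python sketch below only illustrates the assignment rule; it assumes L2-normalized vectors, so the inner product equals cosine similarity, and the toy data is made up.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def nearest_centroid(embedding, centroids):
    """Return the index of the centroid with the largest inner product.

    Assumes L2-normalized inputs, so this is also the most cosine-similar
    centroid. Ties resolve to the lowest index.
    """
    return max(range(len(centroids)), key=lambda i: dot(embedding, centroids[i]))


# Toy 2-D example with three unit-length "centroids" (hypothetical data).
centroids = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
print(nearest_centroid([0.6, 0.8], centroids))  # 1
```

Because the assignment happens per token, one sentence usually spreads its tokens over several centroids, and one centroid collects tokens from many sentences.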

I want to see what these ~4,000 centroids correspond to in terms of sentences. How can I get that? I also want to see which data falls under each cluster.
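To list the data under each cluster, you need the centroid id stored for every token embedding (the "codes") and the number of embeddings per passage. Assumption, to be checked against your index folder: an index built with the residual codec typically stores these per chunk as `{chunk}.codes.pt` and `doclens.{chunk}.json`, which could be loaded with something like `torch.load(os.path.join(index_path, "0.codes.pt")).tolist()` and `json.load(...)`. The sketch below takes those two lists as plain Python data and groups passage ids by centroid; `passages_by_centroid` and its `first_pid` parameter (offset for multi-chunk indexes) are hypothetical names, and the toy data is made up.

```python
from collections import defaultdict


def passages_by_centroid(codes, doclens, first_pid=0):
    """Map each centroid id to the sorted passage ids with a token in that cluster.

    codes:     one centroid id per stored token embedding, in index order.
    doclens:   number of token embeddings per passage, in the same order.
    first_pid: passage id of the first passage covered by these lists
               (hypothetical parameter for indexes split into chunks).
    """
    if sum(doclens) != len(codes):
        raise ValueError("doclens must sum to the number of codes")

    clusters = defaultdict(set)
    offset = 0
    for local_pid, length in enumerate(doclens):
        # Embeddings offset .. offset + length - 1 belong to this passage.
        for code in codes[offset:offset + length]:
            clusters[code].add(first_pid + local_pid)
        offset += length
    return {cid: sorted(pids) for cid, pids in clusters.items()}


# Toy example (hypothetical data): three passages with 2, 3 and 1 tokens.
codes = [5, 7, 5, 9, 7, 9]
doclens = [2, 3, 1]
print(passages_by_centroid(codes, doclens))  # {5: [0, 1], 7: [0, 1], 9: [1, 2]}
```

If the passage ids line up with the order of the sentences you indexed (an assumption worth verifying), printing `collection[pid]` for each id in a cluster, where `collection` is your own list of sentences, shows the sentences that contributed at least one token to that centroid.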