parthsarthi03 / raptor

The official implementation of RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
https://arxiv.org/abs/2401.18059
MIT License

raptor/cluster_utils.py line 111 AttributeError: 'bool' object has no attribute 'all' #56

Open: bogolese opened this issue 2 months ago

bogolese commented 2 months ago

These lines:

        indices = np.where(
            (embeddings == local_cluster_embeddings_[:, None]).all(-1)
        )[1]

are producing this error:

AttributeError: 'bool' object has no attribute 'all'

Background:

Both embeddings and local_cluster_embeddings_ are of type numpy.ndarray, with these shapes:

        embeddings.shape: (507362, 768)
        local_cluster_embeddings_.shape: (749, 768)
        local_cluster_embeddings_[:, None].shape: (749, 1, 768)

So is this a shape mismatch?
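Under NumPy's broadcasting rules these shapes are compatible rather than mismatched. Shapes are compared from the trailing axis, a size-1 axis is stretched to match, and a missing leading axis counts as size 1, so the comparison asks NumPy for one full (749, 507362, 768) boolean array. A minimal sketch that computes that shape and its size from the numbers reported above (it assumes NumPy 1.20 or later, where np.broadcast_shapes is available):

```python
import numpy as np

# Shapes reported in this issue.
embeddings_shape = (507362, 768)
local_shape = (749, 1, 768)  # local_cluster_embeddings_[:, None]

# Align from the trailing axis: 768 vs 768, 1 vs 507362 (stretches),
# 749 vs missing axis (treated as 1).
result_shape = np.broadcast_shapes(local_shape, embeddings_shape)
print(result_shape)  # (749, 507362, 768)

# Each np.bool_ element takes one byte, so this is the size of the
# temporary mask that `==` has to allocate before .all(-1) can run.
n_bytes = int(np.prod(result_shape, dtype=np.int64)) * np.dtype(np.bool_).itemsize
print(f"{n_bytes / 1e9:.1f} GB")  # 291.9 GB
```

At roughly 292 GB for a single temporary, the comparison plausibly cannot be allocated on a typical machine. One possible explanation for the scalar False (an assumption, not confirmed in this thread) is that older NumPy releases caught a failed elementwise == and returned False with a warning instead of raising; that would also fit the observation that a smaller dataset works.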

Both forms of the comparison evaluate to a plain False rather than an array:

        embeddings == local_cluster_embeddings_[:, None]      ->  False
        (embeddings == local_cluster_embeddings_[:, None])    ->  False

Since the result is a Python bool, it has no .all() method, hence:

        (embeddings == local_cluster_embeddings_[:, None]).all(-1)  ->  ERROR!

If I knew what this line is supposed to do (get the indices of the nodes in the local cluster, maybe?), I could probably work around it. But right now I'm flummoxed both about what is going wrong and about what it's supposed to do!
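Reading the code, the line appears to find, for every row of local_cluster_embeddings_, the positions in embeddings of identical rows. The toy-sized sketch below walks through each step on made-up data (the arrays are hypothetical stand-ins, not RAPTOR data):

```python
import numpy as np

# Hypothetical stand-ins: 5 embeddings of dimension 3, and a
# "local cluster" made of rows 3 and 1 of those embeddings.
embeddings = np.array([
    [0.0, 1.0, 2.0],
    [3.0, 4.0, 5.0],
    [6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0],
    [12.0, 13.0, 14.0],
])
local_cluster_embeddings_ = embeddings[[3, 1]]

# (2, 1, 3) == (5, 3) broadcasts to (2, 5, 3): entry [i, j, k] compares
# coordinate k of cluster row i with coordinate k of embedding j.
mask = embeddings == local_cluster_embeddings_[:, None]

# .all(-1) collapses the coordinate axis: [i, j] is True when cluster
# row i equals embedding j in every coordinate.
row_matches = mask.all(-1)

# np.where returns (cluster-row indices, embedding indices); [1] keeps
# the positions in `embeddings` of the cluster's members.
indices = np.where(row_matches)[1]
print(indices)  # [3 1]
```

The intermediate mask has len(cluster) * len(embeddings) * dim elements, which is harmless here but is exactly what grows out of hand at the sizes in this issue.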

bogolese commented 1 month ago

I think this may have something to do with the size of the dataset. I ran a subset of the data (50K entries instead of 500K), and this line of code (incomprehensible to me, anyway!) works as advertised. Apparently it captures the indices of the entries in embeddings that match the entries in local_cluster_embeddings_. I have NO idea WHY this works (the two arrays have different shapes!), but it seems to. Oh, and a comment would have been nice. :)
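If the goal is only to recover those indices, one workaround is to avoid the broadcast entirely and look rows up in a dictionary keyed on their raw bytes, so memory grows with len(embeddings) instead of len(cluster) * len(embeddings) * dim. This is a minimal sketch, not the project's code; build_row_index and lookup_rows are hypothetical helper names. It assumes exact bit-for-bit equality is what matters: unlike ==, byte comparison treats 0.0 and -0.0 as different and matches identical NaNs.

```python
from collections import defaultdict

import numpy as np


def build_row_index(embeddings: np.ndarray) -> dict:
    """Hypothetical helper: map each row's raw bytes to every position where it occurs."""
    index = defaultdict(list)
    for pos, row in enumerate(np.ascontiguousarray(embeddings)):
        index[row.tobytes()].append(pos)
    return dict(index)


def lookup_rows(index: dict, cluster: np.ndarray, dtype: np.dtype) -> np.ndarray:
    """Hypothetical helper: positions of `cluster`'s rows in the indexed array.

    Ordering matches np.where((embeddings == cluster[:, None]).all(-1))[1]:
    cluster rows in order, each followed by its matches in ascending order.
    """
    # Cast to the indexed dtype so identical values produce identical bytes.
    rows = np.ascontiguousarray(cluster, dtype=dtype)
    hits = [pos for row in rows for pos in index.get(row.tobytes(), [])]
    return np.array(hits, dtype=np.intp)


emb = np.array([[0.0, 1.0], [2.0, 3.0], [0.0, 1.0], [4.0, 5.0]])
index = build_row_index(emb)
print(lookup_rows(index, emb[[3, 0]], emb.dtype))  # [3 0 2]
```

In a loop over many local clusters, the index would presumably be built once from embeddings before the loop, with lookup_rows(index, local_cluster_embeddings_, embeddings.dtype) standing in for the np.where(...)[1] expression; this substitution has not been tested against the repository.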