parthsarthi03 / raptor

The official implementation of RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
https://arxiv.org/abs/2401.18059
MIT License

TypeError: Cannot use scipy.linalg.eigh for sparse A with k >= N. Use scipy.linalg.eigh(A.toarray()) or reduce k. #15

Closed. LLLeoLi closed this issue 3 months ago.

LLLeoLi commented 3 months ago
catle2aurecon commented 3 months ago

Ran into the same problem. It comes from the tree_builder.build_from_text call, and it happens regardless of which LLM is chosen.
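
A minimal way to reproduce this, assuming the quickstart API from the repo README (the default config calls the OpenAI API, so a key is needed): the input just has to be short enough that only a handful of leaf chunks are produced.

```python
import os
from raptor import RetrievalAugmentation

os.environ["OPENAI_API_KEY"] = "sk-..."  # default summarization/QA/embedding models use OpenAI

RA = RetrievalAugmentation()  # stock config from the README quickstart

# A very short document yields only a few leaf chunks, so the clustering
# step fails with the eigh/eigsh error above, no matter which LLM is used.
RA.add_documents("A single short sentence is not enough text to cluster.")
```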

JacksonCakes commented 3 months ago

I had the same problem as well. It seems to be related to https://github.com/MaartenGr/BERTopic/issues/97#issuecomment-1831494493
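
For what it's worth, the error text in the title is SciPy's own message: scipy.sparse.linalg.eigsh refuses to compute k eigenpairs of a sparse N x N matrix when k >= N. In this pipeline the call appears to come from UMAP's spectral initialization during clustering, so it fires when there are fewer chunks than the configured reduction dimension needs. A minimal, RAPTOR-independent sketch of the SciPy-level failure:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import eigsh

# A sparse 5 x 5 matrix, so N = 5.
A = csr_matrix(np.diag([1.0, 2.0, 3.0, 4.0, 5.0]))

print(eigsh(A, k=3)[0])  # fine: k < N

# Asking for k >= N eigenpairs of a sparse matrix raises the error from
# the issue title:
# TypeError: Cannot use scipy.linalg.eigh for sparse A with k >= N. ...
eigsh(A, k=5)
```

In practice that suggests two workarounds: feed in more text (so there are more chunks to cluster), or lower the reduction dimension.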

catle2aurecon commented 3 months ago

Here is how I avoid the above error. This is the configuration that works for me:

2024-03-18 21:48:13,833 - Successfully initialized TreeBuilder with Config 
        TreeBuilderConfig:
            Tokenizer: <Encoding 'cl100k_base'>
            Max Tokens: 100
            Num Layers: 2
            Threshold: 0.5
            Top K: 5
            Selection Mode: top_k
            Summarization Length: 100
            Summarization Model: <__main__.ROOTSummarizationModel object at 0x7f3ee5622a10>
            Embedding Models: {'EMB': <__main__.SBertEmbeddingModel object at 0x7f3db828bf50>}
            Cluster Embedding Model: EMB

        Reduction Dimension: 5
        Clustering Algorithm: RAPTOR_Clustering
        Clustering Parameters: {}

2024-03-18 21:48:13,833 - Successfully initialized ClusterTreeBuilder with Config 
        TreeBuilderConfig:
            Tokenizer: <Encoding 'cl100k_base'>
            Max Tokens: 100
            Num Layers: 2
            Threshold: 0.5
            Top K: 5
            Selection Mode: top_k
            Summarization Length: 100
            Summarization Model: <__main__.ROOTSummarizationModel object at 0x7f3ee5622a10>
            Embedding Models: {'EMB': <__main__.SBertEmbeddingModel object at 0x7f3db828bf50>}
            Cluster Embedding Model: EMB

        Reduction Dimension: 5
        Clustering Algorithm: RAPTOR_Clustering
        Clustering Parameters: {}

2024-03-18 21:48:13,833 - Successfully initialized RetrievalAugmentation with Config 
        RetrievalAugmentationConfig:

        TreeBuilderConfig:
            Tokenizer: <Encoding 'cl100k_base'>
            Max Tokens: 100
            Num Layers: 2
            Threshold: 0.5
            Top K: 5
            Selection Mode: top_k
            Summarization Length: 100
            Summarization Model: <__main__.ROOTSummarizationModel object at 0x7f3ee5622a10>
            Embedding Models: {'EMB': <__main__.SBertEmbeddingModel object at 0x7f3db828bf50>}
            Cluster Embedding Model: EMB

        Reduction Dimension: 5
        Clustering Algorithm: RAPTOR_Clustering
        Clustering Parameters: {}

        TreeRetrieverConfig:
            Tokenizer: <Encoding 'cl100k_base'>
            Threshold: 0.5
            Top K: 5
            Selection Mode: top_k
            Context Embedding Model: EMB
            Embedding Model: <__main__.SBertEmbeddingModel object at 0x7f3db828bf50>
            Num Layers: None
            Start Layer: None

            QA Model: <__main__.ROOTQAModel object at 0x7f3dbc9ec6d0>
            Tree Builder Type: cluster
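
For reference, here is a sketch of how a configuration like the one logged above might be put together, following the custom-model pattern from the repo README. The ROOTSummarizationModel/ROOTQAModel bodies and the tb_* keyword names are assumptions inferred from the logged fields, not confirmed API; adapt them to your raptor version.

```python
from sentence_transformers import SentenceTransformer
from raptor import (RetrievalAugmentation, RetrievalAugmentationConfig,
                    BaseEmbeddingModel, BaseSummarizationModel, BaseQAModel)

class SBertEmbeddingModel(BaseEmbeddingModel):
    """SBERT embeddings, as in the README's custom-model example."""
    def __init__(self, model_name="sentence-transformers/multi-qa-mpnet-base-cos-v1"):
        self.model = SentenceTransformer(model_name)

    def create_embedding(self, text):
        return self.model.encode(text)

class ROOTSummarizationModel(BaseSummarizationModel):
    """Hypothetical local summarizer; replace the body with a call to your LLM."""
    def summarize(self, context, max_tokens=150):
        return context[:max_tokens]  # placeholder summary

class ROOTQAModel(BaseQAModel):
    """Hypothetical local QA model; replace the body with a call to your LLM."""
    def answer_question(self, context, question):
        return context  # placeholder answer

config = RetrievalAugmentationConfig(
    summarization_model=ROOTSummarizationModel(),
    qa_model=ROOTQAModel(),
    embedding_model=SBertEmbeddingModel(),
    tb_max_tokens=100,  # "Max Tokens: 100" in the log (keyword name assumed)
    tb_num_layers=2,    # "Num Layers: 2" in the log (keyword name assumed)
)
RA = RetrievalAugmentation(config=config)
```

The "Reduction Dimension: 5" line belongs to the cluster tree builder config; keeping it below the number of chunks being clustered is presumably what avoids the k >= N condition.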
parthsarthi03 commented 3 months ago

This should be fixed by #16.