Closed by LLLeoLi 3 months ago
I ran into the same problem. It relates to the tree_builder.build_from_text
function, and it occurs regardless of which LLM is used.
I had the same problem as well. It seems to be related to https://github.com/MaartenGr/BERTopic/issues/97#issuecomment-1831494493
Here is the log from the configuration I used to avoid the above error:
2024-03-18 21:48:13,833 - Successfully initialized TreeBuilder with Config
TreeBuilderConfig:
Tokenizer: <Encoding 'cl100k_base'>
Max Tokens: 100
Num Layers: 2
Threshold: 0.5
Top K: 5
Selection Mode: top_k
Summarization Length: 100
Summarization Model: <__main__.ROOTSummarizationModel object at 0x7f3ee5622a10>
Embedding Models: {'EMB': <__main__.SBertEmbeddingModel object at 0x7f3db828bf50>}
Cluster Embedding Model: EMB
Reduction Dimension: 5
Clustering Algorithm: RAPTOR_Clustering
Clustering Parameters: {}
2024-03-18 21:48:13,833 - Successfully initialized ClusterTreeBuilder with Config
TreeBuilderConfig:
Tokenizer: <Encoding 'cl100k_base'>
Max Tokens: 100
Num Layers: 2
Threshold: 0.5
Top K: 5
Selection Mode: top_k
Summarization Length: 100
Summarization Model: <__main__.ROOTSummarizationModel object at 0x7f3ee5622a10>
Embedding Models: {'EMB': <__main__.SBertEmbeddingModel object at 0x7f3db828bf50>}
Cluster Embedding Model: EMB
Reduction Dimension: 5
Clustering Algorithm: RAPTOR_Clustering
Clustering Parameters: {}
2024-03-18 21:48:13,833 - Successfully initialized RetrievalAugmentation with Config
RetrievalAugmentationConfig:
TreeBuilderConfig:
Tokenizer: <Encoding 'cl100k_base'>
Max Tokens: 100
Num Layers: 2
Threshold: 0.5
Top K: 5
Selection Mode: top_k
Summarization Length: 100
Summarization Model: <__main__.ROOTSummarizationModel object at 0x7f3ee5622a10>
Embedding Models: {'EMB': <__main__.SBertEmbeddingModel object at 0x7f3db828bf50>}
Cluster Embedding Model: EMB
Reduction Dimension: 5
Clustering Algorithm: RAPTOR_Clustering
Clustering Parameters: {}
TreeRetrieverConfig:
Tokenizer: <Encoding 'cl100k_base'>
Threshold: 0.5
Top K: 5
Selection Mode: top_k
Context Embedding Model: EMB
Embedding Model: <__main__.SBertEmbeddingModel object at 0x7f3db828bf50>
Num Layers: None
Start Layer: None
QA Model: <__main__.ROOTQAModel object at 0x7f3dbc9ec6d0>
Tree Builder Type: cluster
This should be fixed by #16.