max_tokens is too small to fit a single line of text. Breaking this line

xianzhisheng commented 1 year ago

I'm trying to run Example 5 from here:

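import chromadb
import autogen
from autogen.agentchat.contrib.retrieve_assistant_agent import RetrieveAssistantAgent
from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent

# Assumed setup: config_list and assistant are defined in earlier cells of the example notebook
config_list = autogen.config_list_from_json("OAI_CONFIG_LIST")
assistant = RetrieveAssistantAgent(
    name="assistant",
    system_message="You are a helpful assistant.",
    llm_config={"config_list": config_list},
)

ragproxyagent = RetrieveUserProxyAgent(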
    name="ragproxyagent",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=10,
    retrieve_config={
        "task": "qa",
        "docs_path": "./docs",
        "chunk_token_size": 2000,
        "model": config_list[0]["model"],
        "client": chromadb.PersistentClient(path="./chromadb"),
        "collection_name": "natural-questions",
        "chunk_mode": "one_line",
        "embedding_model": "all-MiniLM-L6-v2",
    },
)

qa_problem = "what is non controlling interest on balance sheet"
ragproxyagent.initiate_chat(assistant, problem=qa_problem)

I put corpus.txt in ./docs, and I got the error below:

(autogen_env) PS C:\Users\user\Documents\autogen-main\autogen_env> python .\test_autogen_RAG.py
Trying to create collection.
max_tokens is too small to fit a single line of text. Breaking this line:
        <Table> <Tr> <Th> Film </Th> <Th> Year </Th> <Th> Fuck count </Th> <Th> Minutes </Th> <Th> Uses / mi ...
max_tokens is too small to fit a single line of text. Breaking this line:
        <Table> <Tr> <Th> Character </Th> <Th> Ultimate Avengers </Th> <Th> Ultimate Avengers 2 </Th> <Th> I ...
max_tokens is too small to fit a single line of text. Breaking this line:
        <Table> <Tr> <Th> Position </Th> <Th> Country </Th> <Th> Town / City </Th> <Th> PM2. 5 </Th> <Th> PM ...
max_tokens is too small to fit a single line of text. Breaking this line:
        <Table> <Tr> <Th> Rank </Th> <Th> Country ( or dependent territory ) </Th> <Th> Population </Th> <Th ...
max_tokens is too small to fit a single line of text. Breaking this line:
        <Table> <Tr> <Th> Rank </Th> <Th> State </Th> <Th> Gross collections ( in thousands ) </Th> <Th> Rev ...
max_tokens is too small to fit a single line of text. Breaking this line:
        <Table> <Tr> <Th> Date </Th> <Th> Province </Th> <Th> Mag . </Th> <Th> MMI </Th> <Th> Deaths </Th> < ...
max_tokens is too small to fit a single line of text. Breaking this line:

xianzhisheng commented 1 year ago

Sometimes it appears alongside other messages:

Trying to create collection.
max_tokens is too small to fit a single line of text. Breaking this line:
        #第一编　总则 ...
Failed to split docs with must_break_at_empty_line being True, set to False.
max_tokens is too small to fit a single line of text. Breaking this line:
        #第一编　总则 ...
Failed to split docs with must_break_at_empty_line being True, set to False.

thinkall commented 1 year ago

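Hi @xianzhisheng, these are not errors but warnings. The collections were created, but some settings were adjusted from the values you set: lines too long for the chunk size were broken into smaller pieces, and must_break_at_empty_line was switched to False where splitting with it failed.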
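To show why the first warning appears, here is a simplified sketch of line-based chunking under a token budget. It is not AutoGen's implementation: the function split_one_line and the logger name chunk_sketch are hypothetical, and whitespace-separated words stand in for model tokens. The point it illustrates is that a single line longer than the budget (max_tokens here, roughly the role chunk_token_size plays in the config above) cannot fit in one chunk, so it is cut into pieces and a warning is logged rather than an error raised. The must_break_at_empty_line fallback from the second log is not modeled.

```python
import logging

logger = logging.getLogger("chunk_sketch")  # hypothetical logger name for this sketch


def split_one_line(text: str, max_tokens: int) -> list[str]:
    """Simplified 'one line per chunk' splitting with a token budget.

    Assumption: whitespace-separated words approximate tokens; a real
    implementation would count model tokens instead.
    """
    chunks = []
    for line in text.split("\n"):
        words = line.split()
        if not words:
            continue  # skip empty lines
        if len(words) > max_tokens:
            # The line alone exceeds the budget: warn, then break it up.
            logger.warning(
                "max_tokens is too small to fit a single line of text. "
                "Breaking this line:\n\t%s ...",
                line[:100],
            )
        # Emit consecutive pieces of at most max_tokens words.
        for start in range(0, len(words), max_tokens):
            chunks.append(" ".join(words[start:start + max_tokens]))
    return chunks
```

For example, split_one_line("a b c\nd e f g h", 3) returns ["a b c", "d e f", "g h"] and logs one warning for the second line. The data is still indexed; it just lands in more, smaller chunks than one per line.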

lizijian-buaa commented 2 months ago

@thinkall Hi, is there a way to stop printing these messages? Thx a lot!

thinkall commented 2 months ago

@lizijian-buaa, try adding the code below to your Python script or notebook:

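import logging

# Raise the root logger's threshold to CRITICAL: the chunking warnings above, and any
# other WARNING/ERROR records from loggers without their own level, are dropped.
logging.getLogger().setLevel(logging.CRITICAL)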
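Setting the root logger to CRITICAL also hides warnings and errors from every other library that does not set its own level. A narrower option is to raise the level of only the logger that emits these messages. This sketch assumes the chunking helpers log through a module-level logger named autogen.retrieve_utils (the usual logging.getLogger(__name__) convention); that name and the helper silence_logger are assumptions, so check the logger name in your installed version.

```python
import logging

# Assumption: the chunking warnings come from a module-level logger with this name.
# Verify it against your installed AutoGen version.
RETRIEVE_LOGGER_NAME = "autogen.retrieve_utils"


def silence_logger(name: str, level: int = logging.ERROR) -> logging.Logger:
    """Hypothetical helper: drop records below `level` for one logger only.

    Child loggers (e.g. "<name>.sub") inherit this threshold unless they set
    their own level; the root logger is left untouched.
    """
    logger = logging.getLogger(name)
    logger.setLevel(level)
    return logger


silence_logger(RETRIEVE_LOGGER_NAME)
```

Because logging.getLogger returns the same object for the same name, this can be called once anywhere before the chat starts, even before AutoGen is imported.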

microsoft / autogen

max_tokens is too small to fit a single line of text. Breaking this line #256