stanford-oval / storm

An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.
http://storm.genie.stanford.edu
MIT License

[BUG] Storm with claude sonnet did not use the maximum tokens (8192) in its output. #154

Open Vanessa-Taing opened 2 weeks ago

Vanessa-Taing commented 2 weeks ago

Describe the bug

Tried to generate a 5000-word article with claude haiku and claude sonnet. Token settings:

    conv_simulator_lm = ClaudeModel(model='claude-3-haiku-20240307', max_tokens=4096, **claude_kwargs)
    question_asker_lm = ClaudeModel(model='claude-3-haiku-20240307', max_tokens=4096, **claude_kwargs)
    outline_gen_lm = ClaudeModel(model='claude-3-haiku-20240307', max_tokens=4096, **claude_kwargs)
    article_gen_lm = ClaudeModel(model='claude-3-5-sonnet-20240620', max_tokens=8192, **claude_kwargs)
    article_polish_lm = ClaudeModel(model='claude-3-5-sonnet-20240620', max_tokens=8192, **claude_kwargs)
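Note that `max_tokens` is only an upper bound on the completion; Claude stops whenever it considers the answer finished. A quick back-of-the-envelope check (pure Python; the ~1.3 tokens-per-word ratio is a rough assumption for English prose, not an exact figure) shows that 5000 words would fit inside the 8192-token cap, so the cap itself is not what is cutting the article short:

```python
def words_to_tokens(words: int, tokens_per_word: float = 1.3) -> int:
    """Rough token estimate for English prose (assumed ratio, not exact)."""
    return int(words * tokens_per_word)

target_words = 5000
estimated_tokens = words_to_tokens(target_words)
print(estimated_tokens, estimated_tokens <= 8192)  # ~6500 tokens, within the cap
```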

Full log:

Topic: History of European countries. 5000-word article covering all history in the European countries.
root : ERROR    : Error occurs when searching query : 'hits'
(the line above repeats 14 times in the log)
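The repeated `Error occurs when searching query : 'hits'` lines look like a `KeyError` on a missing `hits` field in the search API response, which means the knowledge-curation step silently lost those results. A defensive pattern for this (a hypothetical sketch; the `hits` key and dict-shaped `response` are assumptions about the retriever's JSON, not STORM's actual code):

```python
def extract_hits(response: dict) -> list:
    """Return the 'hits' list if present; log and return [] otherwise."""
    hits = response.get("hits")
    if hits is None:
        # Missing key: report the raw payload keys instead of raising KeyError.
        print(f"search response had no 'hits' field: {list(response.keys())}")
        return []
    return hits

print(extract_hits({"hits": [{"title": "ok"}]}))
print(extract_hits({"error": "rate limited"}))
```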
knowledge_storm.interface : INFO     : run_knowledge_curation_module executed in 155.8709 seconds
knowledge_storm.interface : INFO     : run_outline_generation_module executed in 7.7600 seconds
sentence_transformers.SentenceTransformer : INFO     : Use pytorch device_name: cpu
sentence_transformers.SentenceTransformer : INFO     : Load pretrained SentenceTransformer: paraphrase-MiniLM-L6-v2
knowledge_storm.interface : INFO     : run_article_generation_module executed in 31.8236 seconds
knowledge_storm.interface : INFO     : run_article_polishing_module executed in 7.7724 seconds
***** Execution time *****
run_knowledge_curation_module: 155.8709 seconds
run_outline_generation_module: 7.7600 seconds
run_article_generation_module: 31.8236 seconds
run_article_polishing_module: 7.7724 seconds
***** Token usage of language models: *****
run_knowledge_curation_module
    claude-3-haiku-20240307: {'prompt_tokens': 108660, 'completion_tokens': 24109}
    claude-3-5-sonnet-20240620: {'prompt_tokens': 0, 'completion_tokens': 0}
run_outline_generation_module
    claude-3-haiku-20240307: {'prompt_tokens': 7249, 'completion_tokens': 824}
    claude-3-5-sonnet-20240620: {'prompt_tokens': 0, 'completion_tokens': 0}
run_article_generation_module
    claude-3-haiku-20240307: {'prompt_tokens': 0, 'completion_tokens': 0}
    claude-3-5-sonnet-20240620: {'prompt_tokens': 2312, 'completion_tokens': 1086}
run_article_polishing_module
    claude-3-haiku-20240307: {'prompt_tokens': 0, 'completion_tokens': 0}
    claude-3-5-sonnet-20240620: {'prompt_tokens': 1297, 'completion_tokens': 299}
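The usage numbers themselves explain the short article: generation produced only 1086 completion tokens, which at a rough ~0.75 words per token (an assumed ratio for English, mirroring the estimate above) is about 800 words, matching the observed 650-750-word output. The model stopped early on its own; nothing came close to the 8192-token cap:

```python
def tokens_to_words(tokens: int, words_per_token: float = 0.75) -> int:
    """Rough word estimate from a completion-token count (assumed ratio)."""
    return int(tokens * words_per_token)

generation_tokens = 1086  # claude-3-5-sonnet completion tokens from the log
print(tokens_to_words(generation_tokens))  # ~814 words, far below the cap
```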

The generated outline was detailed and lengthy, but the article itself did not cover the entire outline, and its length was only 650-750 words.

Increasing the hyperparameters did not increase the length.

    # hyperparameters for the pre-writing stage
    parser.add_argument('--max-conv-turn', type=int, default=6,
                        help='Maximum number of questions in conversational question asking.')
    parser.add_argument('--max-perspective', type=int, default=6,
                        help='Maximum number of perspectives to consider in perspective-guided question asking.')
    parser.add_argument('--search-top-k', type=int, default=6,
                        help='Top k search results to consider for each search query.')
    # hyperparameters for the writing stage
    parser.add_argument('--retrieve-top-k', type=int, default=7,
                        help='Top k collected references for each section title.')
    parser.add_argument('--remove-duplicate', action='store_true',
                        help='If True, remove duplicate content from the article.')
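None of these flags targets article length directly: they widen the research (more conversation turns, perspectives, and search results) rather than lengthen each generated section. If longer sections are the goal, a length hint has to reach the generation prompt itself; a hypothetical sketch (the prompt template and the `section_word_target` parameter are assumptions for illustration, not STORM's actual API):

```python
def build_section_prompt(topic: str, section: str, section_word_target: int = 700) -> str:
    """Compose a generation prompt with an explicit per-section length request."""
    return (
        f"Write the '{section}' section of an article on '{topic}'. "
        f"Aim for roughly {section_word_target} words, citing the collected references."
    )

prompt = build_section_prompt("History of European countries", "Medieval period")
print(prompt)
```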

To Reproduce

Report the following:

  1. Input topic name: History of European countries. 5000-word article covering all history in the European countries.
  2. All output files generated for this topic as a zip file. (Output file attached as zip)


Environment:

Yucheng-Jiang commented 2 weeks ago

Thanks for providing detailed information! There are several factors that affect the generated article length:

We plan to have an upgrade in the coming week, which may mitigate this issue. Exact date TBD.