stanford-oval / storm

An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.
http://storm.genie.stanford.edu
MIT License
10.13k stars · 961 forks

[BUG] No Outline Generated #139

Closed. Justinius closed this issue 1 week ago.

Justinius commented 1 month ago

From time to time I receive an error message: "root : ERROR : No outline for {topic}. Will directly search with the topic."

I've modified one of the examples to use Ollama with the persona examples, using an offline database as the vector store. I wrote an adapter that chunks other data directly into the Qdrant vector store instead of having it read from a CSV first.

I rebased off the GitHub sources this morning and I'm having the same problem. Previously I thought it might have been an issue with the file path including the topic, but I'm not sure that is the root cause: I can delete the topic folder and rerun, and in some instances it will complete the outline.

The JSONL files for the raw search and conversation are there, but for some reason it doesn't fully generate the outline text file. Any help you could provide would be greatly appreciated.
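For context, my adapter chunks documents roughly like this sketch before upserting them into Qdrant (the chunk size, overlap, and payload fields here are illustrative, not my exact code):

```python
def chunk_text(text, chunk_size=500, overlap=100):
    """Split text into overlapping character chunks for embedding."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return [c for c in chunks if c.strip()]

# Each chunk is then embedded and upserted into Qdrant, along the lines of:
# client.upsert(collection_name="pso_docs",
#               points=[PointStruct(id=i, vector=embed(c), payload={"text": c})])
```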

shaoyijia commented 1 month ago

> I've modified one of the examples to use Ollama with the persona examples using an offline database as the vector store.

Could you provide an example of your failure case, including your input and the output files you get? Based on the description, it's hard to tell whether the LM you use is failing to follow instructions or whether this is a bug.

Yucheng-Jiang commented 1 month ago

We just updated the bug report issue template. Please follow the template to help us debug more effectively. Much appreciated!

template:

Describe the bug A clear and concise description of what the bug is.

To Reproduce Report the following:

  1. Input topic name
  2. All output files generated for this topic as a zip file.

Screenshots If applicable, add screenshots to help explain your problem.

Environment:

Justinius commented 1 month ago

Describe the bug Occasionally the process does not generate an article outline and throws an error. Given how STORM reasons, this sometimes greatly reduces the quality of the generated article; in this example it's not much of a problem given the toy nature of the data.

To Reproduce Attached is a zip folder containing the vector store used as well as the generated documents. I created a new template by merging the Ollama Wiki example (for the personas) and the VectorRM example (for using a local DB) so STORM would use local PDFs and a locally hosted LLM. The LLM used was llama3.1:8b. I created a separate script that used pyPDF to pull text from 6 PDFs about PSO that I downloaded from arXiv.
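That extraction script was essentially the following sketch (from memory; it assumes the pypdf package is installed, and the whitespace-cleanup helper is illustrative):

```python
import re

def clean(text):
    """Collapse the runs of whitespace that PDF text extraction tends to produce."""
    return re.sub(r"\s+", " ", text).strip()

def extract_pdf_text(path):
    """Pull plain text from every page of a PDF file."""
    from pypdf import PdfReader  # deferred import; requires `pip install pypdf`
    reader = PdfReader(path)
    return clean(" ".join(page.extract_text() or "" for page in reader.pages))
```

Each extracted document was then chunked and loaded into the vector store.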

Input topic name "Provide and overview of particle swarm optimization and possible improvements". The topic has a typo ("and" instead of "an"), but the LLM understood the request anyway.

Output files generated Attached, including the vector DB: psoTest.zip

Screenshots

StormError

Environment
  OS: Windows 10 (2009.19045)
  RAM: 32 GB
  CPU: 6-core 3.6 GHz Intel Xeon W-2133 (HT)
  GPU: P4000

I cloned the GitHub repo yesterday, so the source should be fairly up to date.

Justinius commented 1 month ago

I should note: I replaced torch with a version compiled for my architecture and with CUDA support. I also increased the timeout in Ollama, given that even with my GPU it was taking a while.

Command Prompt Input: C:\pythonEnv\storm>python -m stormLocal --output-dir "C:\pythonEnv\storm\psoTest" --vector-db-mode "offline" --offline-vector-db-dir "c:\pythonEnv\storm\psoTest" --do-research --do-generate-outline --do-generate-article --do-polish-article

stormLocal is my version of the examples where I basically merged the Ollama and VectorRM examples.

Yucheng-Jiang commented 4 weeks ago

Thanks for reporting! I looked into llm_call_history.jsonl and found that the generated outline does not align with the specified format, so no valid outline is displayed.

Suggestions to resolve this issue:

  1. [Recommended] Modify the prompts here and here so that your LM's output aligns with the desired format
  2. OR, modify the function clean_up_outline() to match your LM's generated format
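To illustrate option 2: a post-processing step can coerce common LM outputs (e.g., numbered headings) into the markdown-style "#" headings the pipeline expects. This is a sketch of the idea, not the actual clean_up_outline() implementation:

```python
import re

def normalize_outline(raw: str) -> str:
    """Best-effort conversion of an LM-generated outline into '#' headings.

    Lines already starting with '#' are kept; numbered lines such as
    '1. Title' or '1.2 Subtitle' are mapped to heading levels by the depth
    of their numbering; everything else (chatter) is dropped.
    """
    out = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            out.append(line)
            continue
        m = re.match(r"^(\d+(?:\.\d+)*)\.?\s+(.*)", line)
        if m:
            depth = m.group(1).count(".") + 1
            out.append("#" * depth + " " + m.group(2))
    return "\n".join(out)
```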

Hope this helps.

Yucheng-Jiang commented 2 weeks ago

[Reminder] We will close this issue if there are no further questions by the end of this week.