run-llama / llama_parse

Parse files for optimal RAG
https://www.llamaindex.ai
MIT License
1.9k stars, 176 forks

ValueError: Could not extract json string from output #66

Closed: Dr-Anil-Shinde closed this issue 1 week ago

Dr-Anil-Shinde commented 4 months ago

from llama_index.core.node_parser import MarkdownElementNodeParser
node_parser = MarkdownElementNodeParser(llm=Settings.llm, num_workers=8)
nodes = node_parser.get_nodes_from_documents(documents)

llm = mistral

ValueError: Could not extract json string from output: { "summary": "As of December 31, 2021 and March 31, 2022, the company's assets include cash and cash equivalents, restricted cash and cash equivalents, accounts receivable, prepaid expenses, investments, equity method investments, property and equipment, operating lease right-of-use assets, and intangible assets. Liabilities include accounts payable, short-term insurance reserves, operating lease liabilities (current and non-current), long-term insurance reserves, long-term debt, and other long-term liabilities. Non-controlling interests are also reported. As of the end of the periods, total assets were $38,774 and $32,812, and total equity was $15,145 and $9,613, respectively.", "table_title": "Balance Sheet", "table_id": "id_16", "columns": [ { "col_name": "Assets", "col_type": "string", "summary": "As of December 31, 202
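The output quoted above stops partway through the first column's summary, so the JSON object is never closed. Here is a minimal sketch of why such text cannot be parsed, using only the standard library. `try_parse_json_object` and the sample strings are hypothetical, made up for illustration; this is not llama_index's actual extraction code.

```python
import json


def try_parse_json_object(text: str) -> dict:
    """Parse the first JSON object found in `text` (hypothetical helper)."""
    start = text.find("{")
    if start == -1:
        raise ValueError(f"No JSON object found in output: {text[:80]}")
    try:
        # raw_decode parses one JSON value beginning at `start` and
        # ignores any text that follows it.
        obj, _end = json.JSONDecoder().raw_decode(text, start)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Incomplete or invalid JSON in output: {exc}") from exc
    return obj


# Shortened stand-ins for a complete and a cut-off model response.
complete = 'Here is the table: {"table_title": "Balance Sheet", "table_id": "id_16"}'
truncated = '{"table_title": "Balance Sheet", "columns": [{"col_name": "Assets", "summary": "As of December 31, 202'

print(try_parse_json_object(complete)["table_id"])
try:
    try_parse_json_object(truncated)
except ValueError as err:
    print("failed:", err)
```

If the response is being cut off like this, no amount of post-processing recovers the missing closing braces; the model has to return the whole object.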

frankbaele commented 4 months ago

Update to the latest version; this resolved the error for me. I'm guessing it was an API change.

jmiddleton commented 3 months ago

That didn't work for me with llama-parse 0.4. Which library do we need to update?
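The thread does not say which package carries the fix, so a first step is to check what is actually installed. A small sketch using the standard library's `importlib.metadata`; the list of distribution names is an assumption about a typical setup and may need adjusting.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # not installed in this environment
    return found


# Distribution names assumed to be relevant here; adjust to your environment.
candidates = ["llama-parse", "llama-index-core", "llama-index"]
for name, ver in installed_versions(candidates).items():
    print(f"{name}: {ver or 'not installed'}")
```

Including these versions when reporting the error makes it easier to tell whether an upgrade of one package or another changes the behavior.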

dieharders commented 2 months ago

Same here. I'm on version 0.4.1 using a quantized Llama2-7B. I was following the llama-parse tutorial (https://medium.com/@yu-joshua/using-llamaparse-for-knowledge-graph-creation-from-documents-3bd1e1849754) and loading a quarterly report PDF from META.

*Edit: It's the MarkdownElementNodeParser that is the problem. Not sure why.

hamza233 commented 2 months ago

Facing the same issue:

ValueError: Could not extract json string from output: Please note that the table has no title, and the column names are not explicitly stated in the context.
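The two failing outputs reported in this thread look different: the first is JSON that is cut off, while this one is plain prose with no JSON at all. Here is a rough sketch for telling the cases apart when debugging. `classify_output` is a hypothetical helper, and it only inspects the first `{` in the text, so treat it as a diagnostic aid and not a robust parser.

```python
import json


def classify_output(text: str) -> str:
    """Classify raw model output as 'no_json', 'invalid_or_truncated', or 'ok'."""
    start = text.find("{")
    if start == -1:
        return "no_json"  # the model answered in prose only
    try:
        # Try to decode one JSON value starting at the first opening brace.
        json.JSONDecoder().raw_decode(text, start)
    except json.JSONDecodeError:
        return "invalid_or_truncated"
    return "ok"


prose = (
    "Please note that the table has no title, and the column names "
    "are not explicitly stated in the context."
)
cut_off = '{"summary": "As of December 31, 202'

print(classify_output(prose))
print(classify_output(cut_off))
print(classify_output('{"table_title": "Balance Sheet"}'))
```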

BinaryBrain commented 1 week ago

It seems to be fixed now. I just ran this code successfully:

from llama_index.llms.mistralai import MistralAI
from llama_index.core.node_parser import MarkdownElementNodeParser
from llama_parse import LlamaParse

llm = MistralAI()
node_parser = MarkdownElementNodeParser(llm=llm, num_workers=8)
pdf_file_name = './insurance.pdf'
documents = LlamaParse(result_type="markdown").load_data(pdf_file_name)
nodes = node_parser.get_nodes_from_documents(documents)
print(nodes)