run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: Breaking a complex problem into sub-problems raises "JSONDecodeError" #14152

Open xiaohuozilalala opened 1 month ago

xiaohuozilalala commented 1 month ago

Question Validation

Question

When I followed the official "Llama3 Cookbook with Ollama" documentation guide, I ran into a problem with the "Break Complex Questions down into Sub-Questions" example. When I run the code as shown in the tutorial, Python raises a JSON parsing error, which did not occur in the earlier tutorials.

The code is shown below:

# Imports (assumed from the cookbook setup); drake_docs, kendrick_docs, and llm
# are loaded earlier, per the "Llama3 Cookbook with Ollama" guide.
from llama_index.core import VectorStoreIndex
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool, ToolMetadata

drake_index = VectorStoreIndex.from_documents(drake_docs)
drake_query_engine = drake_index.as_query_engine(similarity_top_k=3)

kendrick_index = VectorStoreIndex.from_documents(kendrick_docs)
kendrick_query_engine = kendrick_index.as_query_engine(similarity_top_k=3)

drake_tool = QueryEngineTool(
    drake_query_engine,
    metadata=ToolMetadata(
        name="drake_search",
        description="Useful for searching over Drake's life.",
    ),
)

kendrick_tool = QueryEngineTool(
    kendrick_query_engine,
    metadata=ToolMetadata(
        name="kendrick_search",
        description="Useful for searching over Kendrick's life.",
    ),
)

query_engine = SubQuestionQueryEngine.from_defaults(
    [drake_tool, kendrick_tool],
    llm=llm,  # llama3-8b via Ollama
    verbose=True,
)

response = query_engine.query("Which albums did Drake release in his career?")

print(response)

Python reports the following error:

Traceback (most recent call last):
  File "E:\CodeProject\conda_file\envs\langchain1\lib\site-packages\llama_index\core\output_parsers\utils.py", line 45, in parse_json_markdown
    json_obj = json.loads(json_string)
  File "E:\CodeProject\conda_file\envs\langchain1\lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "E:\CodeProject\conda_file\envs\langchain1\lib\json\decoder.py", line 340, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 21 column 1 (char 500)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\CodeProject\conda_file\envs\langchain1\lib\site-packages\llama_index\core\output_parsers\utils.py", line 52, in parse_json_markdown
    json_obj = yaml.safe_load(json_string)
  File "E:\CodeProject\conda_file\envs\langchain1\lib\site-packages\yaml\__init__.py", line 125, in safe_load
    return load(stream, SafeLoader)
  File "E:\CodeProject\conda_file\envs\langchain1\lib\site-packages\yaml\__init__.py", line 81, in load
    return loader.get_single_data()
  File "E:\CodeProject\conda_file\envs\langchain1\lib\site-packages\yaml\constructor.py", line 49, in get_single_data
    node = self.get_single_node()
  File "E:\CodeProject\conda_file\envs\langchain1\lib\site-packages\yaml\composer.py", line 36, in get_single_node
    document = self.compose_document()
  File "E:\CodeProject\conda_file\envs\langchain1\lib\site-packages\yaml\composer.py", line 58, in compose_document
    self.get_event()
  File "E:\CodeProject\conda_file\envs\langchain1\lib\site-packages\yaml\parser.py", line 118, in get_event
    self.current_event = self.state()
  File "E:\CodeProject\conda_file\envs\langchain1\lib\site-packages\yaml\parser.py", line 193, in parse_document_end
    token = self.peek_token()
  File "E:\CodeProject\conda_file\envs\langchain1\lib\site-packages\yaml\scanner.py", line 129, in peek_token
    self.fetch_more_tokens()
  File "E:\CodeProject\conda_file\envs\langchain1\lib\site-packages\yaml\scanner.py", line 258, in fetch_more_tokens
    raise ScannerError("while scanning for the next token", None,
yaml.scanner.ScannerError: while scanning for the next token
found character '`' that cannot start any token
  in "<unicode string>", line 21, column 1:
^

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\CodeProject\langchain1\pythonProject1\ollama-langchain-example.py", line 191, in <module>
    response = query_engine.query("Which albums did Drake release in his career?")
  File "E:\CodeProject\conda_file\envs\langchain1\lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 198, in wrapper
    result = func(*args, **kwargs)
  File "E:\CodeProject\conda_file\envs\langchain1\lib\site-packages\llama_index\core\base\base_query_engine.py", line 51, in query
    query_result = self._query(str_or_query_bundle)
  File "E:\CodeProject\conda_file\envs\langchain1\lib\site-packages\llama_index\core\query_engine\sub_question_query_engine.py", line 145, in _query
    sub_questions = self._question_gen.generate(self._metadatas, query_bundle)
  File "E:\CodeProject\conda_file\envs\langchain1\lib\site-packages\llama_index\core\question_gen\llm_generators.py", line 81, in generate
    parse = self._prompt.output_parser.parse(prediction)
  File "E:\CodeProject\conda_file\envs\langchain1\lib\site-packages\llama_index\core\question_gen\output_parser.py", line 11, in parse
    json_dict = parse_json_markdown(output)
  File "E:\CodeProject\conda_file\envs\langchain1\lib\site-packages\llama_index\core\output_parsers\utils.py", line 54, in parse_json_markdown
    raise OutputParserException(
llama_index.core.output_parsers.base.OutputParserException: Got invalid JSON object. Error: Extra data: line 21 column 1 (char 500) while scanning for the next token found character '`' that cannot start any token in "<unicode string>", line 21, column 1:

    ^. Got JSON string: {
    "items": [
        {
            "sub_question": "What is the revenue growth of Uber",
            "tool_name": "uber_10k"
        },
        {
            "sub_question": "What is the EBITDA of Uber",
            "tool_name": "uber_10k"
        },
        {
            "sub_question": "What is the revenue growth of Lyft",
            "tool_name": "lyft_10k"
        },
        {
            "sub_question": "What is the EBITDA of Lyft",
            "tool_name": "lyft_10k"
        }
    ]
}

# Example 2

{
    "items": [
        {
            "sub_question": "What are Drake's albums?",
            "tool_name": "drake_search"
        }
    ]
}

Process finished with exit code 1

The error output mentions a JSON parsing failure. What could the reason be? What also puzzles me is that the error output contains information about tools used in earlier tutorials (such as "uber_10k" and "lyft_10k"), which seem to be loaded even though I did not call them. Is there any solution? Thank you very much!

dosubot[bot] commented 1 month ago

Hey @xiaohuozilalala! 👋 I'm here to help you with any bugs, questions, or contributions you have for the repository. Let me know how I can assist you while you wait for a human maintainer.

The JSON parsing error you're encountering is likely due to the presence of an 'items' key in the JSON output, which can break parsing when the output comes from open-source LLMs such as Zephyr. To resolve this issue, ensure that the JSON output does not contain the 'items' key, or handle it appropriately in your parsing logic.

Here's how you can modify the SubQuestionOutputParser class to handle this:

# Import paths below assume llama-index 0.10.x; adjust them for your version.
from typing import Any

from llama_index.core.output_parsers.base import StructuredOutput
from llama_index.core.output_parsers.utils import parse_json_markdown
from llama_index.core.question_gen.types import SubQuestion
from llama_index.core.types import BaseOutputParser


class SubQuestionOutputParser(BaseOutputParser):
    def parse(self, output: str) -> Any:
        json_dict = parse_json_markdown(output)
        if not json_dict:
            raise ValueError(f"No valid JSON found in output: {output}")

        # Handle the 'items' key if present
        if "items" in json_dict:
            json_dict = json_dict["items"]

        sub_questions = [SubQuestion.parse_obj(item) for item in json_dict]
        return StructuredOutput(raw_output=output, parsed_output=sub_questions)

    def format(self, prompt_template: str) -> str:
        return prompt_template

This modification ensures that if the 'items' key is present in the JSON output, it will be handled correctly, preventing the JSON parsing error.
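
If you define this patched parser in your own code (rather than editing the installed package), you also need to hand it to the question generator yourself. Here is a minimal wiring sketch, assuming llama-index 0.10.x import paths and the llm, drake_tool, and kendrick_tool defined earlier in this issue (constructor signatures may differ slightly in your version):

from llama_index.core.prompts import PromptTemplate
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.question_gen.llm_generators import LLMQuestionGenerator
from llama_index.core.question_gen.prompts import DEFAULT_SUB_QUESTION_PROMPT_TMPL

# Attach the patched parser to the default sub-question prompt.
sub_question_prompt = PromptTemplate(
    DEFAULT_SUB_QUESTION_PROMPT_TMPL,
    output_parser=SubQuestionOutputParser(),
)
question_gen = LLMQuestionGenerator(llm=llm, prompt=sub_question_prompt)

# Pass the custom question generator to the engine instead of the default one.
query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=[drake_tool, kendrick_tool],
    question_gen=question_gen,
    llm=llm,
    verbose=True,
)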

Additionally, ensure that your JSON output is structured correctly and does not include any extraneous data that could cause parsing issues. If the problem persists, you may need to investigate why the LLM predictor is returning invalid JSON and address it at the source.

For a more robust solution, consider using the PydanticOutputParser class from the LlamaIndex framework, which provides a structured way to handle JSON parsing and formatting:

class PydanticOutputParser(ChainableOutputParser):
    def __init__(
        self,
        output_cls: Type[Model],
        excluded_schema_keys_from_format: Optional[List] = None,
        pydantic_format_tmpl: str = PYDANTIC_FORMAT_TMPL,
    ) -> None:
        self._output_cls = output_cls
        self._excluded_schema_keys_from_format = excluded_schema_keys_from_format or []
        self._pydantic_format_tmpl = pydantic_format_tmpl

    @property
    def output_cls(self) -> Type[Model]:
        return self._output_cls

    @property
    def format_string(self) -> str:
        return self.get_format_string(escape_json=True)

    def get_format_string(self, escape_json: bool = True) -> str:
        schema_dict = self._output_cls.schema()
        for key in self._excluded_schema_keys_from_format:
            del schema_dict[key]

        schema_str = json.dumps(schema_dict)
        output_str = self._pydantic_format_tmpl.format(schema=schema_str)
        if escape_json:
            return output_str.replace("{", "{{").replace("}", "}}")
        else:
            return output_str

    def parse(self, text: str) -> Any:
        json_str = extract_json_str(text)
        return self._output_cls.parse_raw(json_str)

    def format(self, query: str) -> str:
        return query + "\n\n" + self.get_format_string(escape_json=True)

This class ensures that the JSON output adheres to a specified schema, making it easier to parse and validate the output [1][2][3].
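
For instance, a minimal usage sketch (the SubQuestionItem and SubQuestionList models below are hypothetical example schemas, not classes shipped with LlamaIndex):

from typing import List

from llama_index.core.bridge.pydantic import BaseModel
from llama_index.core.output_parsers import PydanticOutputParser

class SubQuestionItem(BaseModel):
    sub_question: str
    tool_name: str

class SubQuestionList(BaseModel):
    items: List[SubQuestionItem]

parser = PydanticOutputParser(output_cls=SubQuestionList)

# format() appends the JSON schema the LLM must follow to the query...
print(parser.format("Which albums did Drake release in his career?"))

# ...and parse() validates the raw LLM text against that schema.
parsed = parser.parse('{"items": [{"sub_question": "Which albums did Drake release?", "tool_name": "drake_search"}]}')
print(parsed)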

To continue talking to Dosu, mention @dosu.

xiaohuozilalala commented 1 month ago

@dosu Thank you for the ideas you provided. After checking and rerunning the code, I found that the JSON parsing error seems to be a surface symptom rather than the root cause. The "query_engine()" method appears to automatically cache my previous questions and keep them in memory, so in the current task, even though I never mentioned those earlier questions, they were added to the current query by default and caused errors. I say this because the error output repeatedly mentions information from the earlier Uber and Lyft financial-report tutorial, and those documents are not used in the current task. Is this a memory leak, or is it caused by automatic caching in query_engine()? Beyond that, the JSON error seems to be related to the decomposition into sub-questions. Could it be that query_engine() fails to decompose the sub-questions correctly, causing the error?

Here are the new errors that are reported (they keep changing even though the code is not modified):

error1

Traceback (most recent call last):
  File "E:\CodeProject\langchain1\pythonProject1\ollama-langchain-example.py", line 190, in <module>
    response = query_engine.query("Which albums did Drake release in his career?")
  File "E:\CodeProject\conda_file\envs\langchain1\lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 198, in wrapper
    result = func(*args, **kwargs)
  File "E:\CodeProject\conda_file\envs\langchain1\lib\site-packages\llama_index\core\base\base_query_engine.py", line 51, in query
    query_result = self._query(str_or_query_bundle)
  File "E:\CodeProject\conda_file\envs\langchain1\lib\site-packages\llama_index\core\query_engine\sub_question_query_engine.py", line 145, in _query
    sub_questions = self._question_gen.generate(self._metadatas, query_bundle)
  File "E:\CodeProject\conda_file\envs\langchain1\lib\site-packages\llama_index\core\question_gen\llm_generators.py", line 81, in generate
    parse = self._prompt.output_parser.parse(prediction)
  File "E:\CodeProject\conda_file\envs\langchain1\lib\site-packages\llama_index\core\question_gen\output_parser.py", line 13, in parse
    raise ValueError(f"No valid JSON found in output: {output}")
ValueError: No valid JSON found in output: I'd be happy to help! Given a user question and a list of tools, I can output a list of relevant sub-questions in JSON markdown that when composed can help answer the full user question.

Please provide the user question and the list of tools, and I'll generate the output. For example 1, I understand that you want me to compare and contrast revenue growth and EBITDA of Uber and Lyft for year 2021.

error2

Traceback (most recent call last):
  File "E:\CodeProject\conda_file\envs\langchain1\lib\site-packages\pydantic\v1\main.py", line 522, in parse_obj
    obj = dict(obj)
ValueError: dictionary update sequence element #0 has length 1; 2 is required

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "E:\CodeProject\langchain1\pythonProject1\ollama-langchain-example.py", line 190, in <module>
    response = query_engine.query("Which albums did Drake release in his career?")
  File "E:\CodeProject\conda_file\envs\langchain1\lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 198, in wrapper
    result = func(*args, **kwargs)
  File "E:\CodeProject\conda_file\envs\langchain1\lib\site-packages\llama_index\core\base\base_query_engine.py", line 51, in query
    query_result = self._query(str_or_query_bundle)
  File "E:\CodeProject\conda_file\envs\langchain1\lib\site-packages\llama_index\core\query_engine\sub_question_query_engine.py", line 145, in _query
    sub_questions = self._question_gen.generate(self._metadatas, query_bundle)
  File "E:\CodeProject\conda_file\envs\langchain1\lib\site-packages\llama_index\core\question_gen\llm_generators.py", line 81, in generate
    parse = self._prompt.output_parser.parse(prediction)
  File "E:\CodeProject\conda_file\envs\langchain1\lib\site-packages\llama_index\core\question_gen\output_parser.py", line 21, in parse
    sub_questions = [SubQuestion.parse_obj(item) for item in json_dict]
  File "E:\CodeProject\conda_file\envs\langchain1\lib\site-packages\llama_index\core\question_gen\output_parser.py", line 21, in <listcomp>
    sub_questions = [SubQuestion.parse_obj(item) for item in json_dict]
  File "E:\CodeProject\conda_file\envs\langchain1\lib\site-packages\pydantic\v1\main.py", line 525, in parse_obj
    raise ValidationError([ErrorWrapper(exc, loc=ROOT_KEY)], cls) from e
pydantic.v1.error_wrappers.ValidationError: 1 validation error for SubQuestion
__root__
  SubQuestion expected dict not str (type=type_error)

The error message seems to be different each time I run it, which confuses me. How can I investigate further? Thanks!

dosubot[bot] commented 1 month ago

The issues you're encountering seem to be multifaceted, involving both JSON parsing errors and potential issues with the SubQuestionQueryEngine's handling of sub-questions. Here are some steps to address these problems:

1. Handling JSON Parsing Errors

Ensure that the JSON output is correctly formatted and does not include extraneous data. The SubQuestionOutputParser class should handle the 'items' key appropriately:

class SubQuestionOutputParser(BaseOutputParser):
    def parse(self, output: str) -> Any:
        json_dict = parse_json_markdown(output)
        if not json_dict:
            raise ValueError(f"No valid JSON found in output: {output}")

        if "items" in json_dict:
            json_dict = json_dict["items"]

        sub_questions = [SubQuestion.parse_obj(item) for item in json_dict]
        return StructuredOutput(raw_output=output, parsed_output=sub_questions)

    def format(self, prompt_template: str) -> str:
        return prompt_template

2. Investigating Caching or Memory Issues

The SubQuestionQueryEngine should not cache previous questions by default. However, if you suspect that previous questions are being cached, you can try resetting the state of the query engine or creating a new instance for each query to ensure a clean state.
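
For example, a throwaway sketch that rebuilds the engine for every query (re-using the tools and llm defined earlier in this issue), so no state can carry over between queries:

def fresh_sub_question_engine():
    # Build a brand-new engine each time; nothing is shared across queries.
    return SubQuestionQueryEngine.from_defaults(
        query_engine_tools=[drake_tool, kendrick_tool],
        llm=llm,
        verbose=True,
    )

response = fresh_sub_question_engine().query("Which albums did Drake release in his career?")
print(response)

Also note that the uber_10k and lyft_10k names you saw come from the few-shot example embedded in the default sub-question prompt, so they can appear in the raw LLM output even though those tools were never registered; this is prompt content, not cached state.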

3. Ensuring Correct Sub-Question Decomposition

The SubQuestionQueryEngine relies on the BaseQuestionGenerator to generate sub-questions. Ensure that the question generator is correctly configured and that the prompt used for generating sub-questions is appropriate. Here is an example of how the SubQuestionQueryEngine interacts with the QueryEngineTool:

class SubQuestionQueryEngine(BaseQueryEngine):
    def __init__(
        self,
        question_gen,
        response_synthesizer,
        query_engine_tools,
        callback_manager=None,
        verbose=True,
        use_async=False,
    ):
        self._question_gen = question_gen
        self._response_synthesizer = response_synthesizer
        self._metadatas = [x.metadata for x in query_engine_tools]
        self._query_engines = {tool.metadata.name: tool.query_engine for tool in query_engine_tools}
        self._verbose = verbose
        self._use_async = use_async
        super().__init__(callback_manager)

    def _query(self, query_bundle: QueryBundle) -> RESPONSE_TYPE:
        sub_questions = self._question_gen.generate(self._metadatas, query_bundle)
        if self._use_async:
            tasks = [self._aquery_subq(sub_q) for sub_q in sub_questions]
            qa_pairs_all = run_async_tasks(tasks)
        else:
            qa_pairs_all = [self._query_subq(sub_q) for sub_q in sub_questions]
        qa_pairs = list(filter(None, qa_pairs_all))
        nodes = [self._construct_node(pair) for pair in qa_pairs]
        source_nodes = [node for qa_pair in qa_pairs for node in qa_pair.sources]
        response = self._response_synthesizer.synthesize(query=query_bundle, nodes=nodes, additional_source_nodes=source_nodes)
        return response

    async def _aquery(self, query_bundle: QueryBundle) -> RESPONSE_TYPE:
        sub_questions = await self._question_gen.agenerate(self._metadatas, query_bundle)
        tasks = [self._aquery_subq(sub_q) for sub_q in sub_questions]
        qa_pairs_all = await asyncio.gather(*tasks)
        qa_pairs = list(filter(None, qa_pairs_all))
        nodes = [self._construct_node(pair) for pair in qa_pairs]
        source_nodes = [node for qa_pair in qa_pairs for node in qa_pair.sources]
        response = await self._response_synthesizer.asynthesize(query=query_bundle, nodes=nodes, additional_source_nodes=source_nodes)
        return response

4. Debugging and Logging

Add logging to track the flow of data and identify where the errors occur. For example, log the output of the LLM predictor before parsing it:

def generate(self, tools, query):
    tools_str = self._tools_to_str(tools)
    query_str = self._query_to_str(query)

    prediction = self._llm_predictor.predict(
        prompt=self._prompt,
        tools_str=tools_str,
        query_str=query_str,
    )

    logger.debug(f"LLM predictor output: {prediction}")

    try:
        parse = self._prompt.output_parser.parse(prediction)
    except JSONDecodeError:
        logger.warning(f"LLM predictor output is not a valid JSON: {prediction}")
        return None

    parse = cast(StructuredOutput, parse)
    return parse.parsed_output
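
If you prefer not to modify library code, a lighter-weight option is to call the question generator directly and print what it produces. A small sketch (note that _question_gen and _metadatas are private attributes, taken from the traceback above, so treat this as debugging only):

from llama_index.core import QueryBundle

bundle = QueryBundle("Which albums did Drake release in his career?")
raw_sub_questions = query_engine._question_gen.generate(query_engine._metadatas, bundle)
print(raw_sub_questions)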

5. Using a Structured Output Parser

Consider using the PydanticOutputParser for more robust JSON parsing and validation:

class PydanticOutputParser(ChainableOutputParser):
    def __init__(
        self,
        output_cls: Type[Model],
        excluded_schema_keys_from_format: Optional[List] = None,
        pydantic_format_tmpl: str = PYDANTIC_FORMAT_TMPL,
    ) -> None:
        self._output_cls = output_cls
        self._excluded_schema_keys_from_format = excluded_schema_keys_from_format or []
        self._pydantic_format_tmpl = pydantic_format_tmpl

    @property
    def output_cls(self) -> Type[Model]:
        return self._output_cls

    @property
    def format_string(self) -> str:
        return self.get_format_string(escape_json=True)

    def get_format_string(self, escape_json: bool = True) -> str:
        schema_dict = self._output_cls.schema()
        for key in self._excluded_schema_keys_from_format:
            del schema_dict[key]

        schema_str = json.dumps(schema_dict)
        output_str = self._pydantic_format_tmpl.format(schema=schema_str)
        if escape_json:
            return output_str.replace("{", "{{").replace("}", "}}")
        else:
            return output_str

    def parse(self, text: str) -> Any:
        json_str = extract_json_str(text)
        return self._output_cls.parse_raw(json_str)

    def format(self, query: str) -> str:
        return query + "\n\n" + self.get_format_string(escape_json=True)

By following these steps, you should be able to address the JSON parsing errors and ensure that the SubQuestionQueryEngine correctly handles sub-questions without caching previous queries [1][2][3].

To continue talking to Dosu, mention @dosu.

xiaohuozilalala commented 1 month ago

@dosu Thank you for your reply. After rechecking, I found that this problem may be similar to #7582 and #8701. It seems to be related to the LLM used: with "gpt-3.5-turbo" the code works properly and gives reasonable answers. I then tested different models in Ollama and found that with "phi3-14b" and "qwen2-7b" the code runs but the responses are not reasonable, while with "llama3-8b" the code raises an error and cannot run at all. So, if I want to use a smaller open-source model such as "llama3-8b" together with "query_engine" to decompose sub-questions in RAG, how can I avoid this problem?

dosubot[bot] commented 1 month ago

To use an open-source model with smaller parameters such as "llama3-8b" and combine it with query_engine to decompose sub-problems in RAG without encountering JSON parsing errors and other issues, you can follow these steps:

1. Modify the Code to Use "llama3-8b"

Ensure that you are correctly initializing and using the "llama3-8b" model. Here is an example:

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.ingestion import IngestionPipeline
from llama_index.embeddings.ollama import OllamaEmbedding  # requires the llama-index-embeddings-ollama package
from llama_index.llms.ollama import Ollama  # requires the llama-index-llms-ollama package

# Create the pipeline with transformations
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(),
    ]
)

# Build the index
loader = SimpleDirectoryReader(input_dir="./data")
documents = loader.load_data()
nodes = pipeline.run(documents=documents, show_progress=True)

# Models
llm = Ollama(model="llama3", request_timeout=120.0)  # the llama3-8b model served by Ollama
embed_model = OllamaEmbedding(model_name="llama3")  # or a dedicated embedding model pulled via Ollama, e.g. "nomic-embed-text"

# Build RAG
index = VectorStoreIndex(nodes=nodes, embed_model=embed_model)
query_engine = index.as_query_engine(llm=llm)
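
Optionally, you can also register the models globally via the Settings singleton (a small sketch) so that every index and query engine created afterwards picks them up by default:

from llama_index.core import Settings

Settings.llm = llm
Settings.embed_model = embed_model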

2. Use MultiStepQueryEngine for Query and Retrieval

Utilize the MultiStepQueryEngine class to handle the query and retrieval process. Here is an example setup:

from llama_index.core.query_engine import MultiStepQueryEngine
from llama_index.core.query_pipeline.query import QueryPipeline
from llama_index.core.query_pipeline.components.function import FnComponent
from llama_index.core.response_synthesizers import CompactAndRefine
from llama_index.core import ServiceContext
from llama_index.core.base.embeddings.base import BaseEmbedding
from llama_index.core import VectorStoreIndex
from llama_index.core.indices.query.query_transform.base import StepDecomposeQueryTransform

# Assuming you already have a vector_store instance; the retriever is built from it here
retriever = VectorStoreIndex.from_vector_store(
    vector_store, service_context=ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
).as_retriever(similarity_top_k=8)

# Define the query pipeline
query_pipeline = QueryPipeline(verbose=True)
query_pipeline.add_modules(
    {
        "query": FnComponent(fn=query_input, output_key="output", req_params={"query_str"}),
        "retriever": retriever,
        "summarizer": CompactAndRefine(service_context=ServiceContext.from_defaults(llm=llm, embed_model=embed_model), streaming=True, verbose=True),
    }
)
query_pipeline.add_link("query", "retriever")
query_pipeline.add_link("retriever", "summarizer", dest_key="nodes")
query_pipeline.add_link("query", "summarizer", dest_key="query_str")

# Initialize the MultiStepQueryEngine on top of the base query engine from step 1
multi_step_engine = MultiStepQueryEngine(
    query_engine=query_engine,  # the base query engine built in step 1
    query_transform=StepDecomposeQueryTransform(),
    response_synthesizer=CompactAndRefine(service_context=ServiceContext.from_defaults(llm=llm, embed_model=embed_model)),
    num_steps=3,
    early_stopping=True
)

# Use the multi-step engine to run a custom query
response = multi_step_engine.query("Your query here")
print(response)

3. Handle JSON Parsing Errors

Ensure that the JSON output is correctly formatted and does not include extraneous data. Modify the SubQuestionOutputParser class to handle the 'items' key appropriately:

class SubQuestionOutputParser(BaseOutputParser):
    def parse(self, output: str) -> Any:
        json_dict = parse_json_markdown(output)
        if not json_dict:
            raise ValueError(f"No valid JSON found in output: {output}")

        if "items" in json_dict:
            json_dict = json_dict["items"]

        sub_questions = [SubQuestion.parse_obj(item) for item in json_dict]
        return StructuredOutput(raw_output=output, parsed_output=sub_questions)

    def format(self, prompt_template: str) -> str:
        return prompt_template

4. Debugging and Logging

Add logging to track the flow of data and identify where the errors occur. For example, log the output of the LLM predictor before parsing it:

def generate(self, tools, query):
    tools_str = self._tools_to_str(tools)
    query_str = self._query_to_str(query)

    prediction = self._llm_predictor.predict(
        prompt=self._prompt,
        tools_str=tools_str,
        query_str=query_str,
    )

    logger.debug(f"LLM predictor output: {prediction}")

    try:
        parse = self._prompt.output_parser.parse(prediction)
    except JSONDecodeError:
        logger.warning(f"LLM predictor output is not a valid JSON: {prediction}")
        return None

    parse = cast(StructuredOutput, parse)
    return parse.parsed_output
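
5. Retrying on Parse Failures

Because a local llama3-8b only intermittently emits malformed JSON, a simple retry around the query can absorb the remaining failures. A minimal sketch (the OutputParserException import path is taken from the traceback above; in pydantic v1, ValidationError is a subclass of ValueError, so it is caught as well):

from llama_index.core.output_parsers.base import OutputParserException

def query_with_retries(engine, question: str, max_attempts: int = 3):
    last_err = None
    for attempt in range(1, max_attempts + 1):
        try:
            return engine.query(question)
        except (OutputParserException, ValueError) as err:
            # Malformed or missing JSON from the LLM; try again.
            last_err = err
            print(f"Attempt {attempt} failed to parse sub-questions: {err}")
    raise last_err

response = query_with_retries(query_engine, "Which albums did Drake release in his career?")
print(response)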

By following these steps, you should be able to use the "llama3-8b" model with query_engine to decompose sub-problems in RAG without encountering JSON parsing errors and other issues [1][2][3][4].

To continue talking to Dosu, mention @dosu.