run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: Fluctuating and Corrupted Response #15084

Open HariNuve opened 1 month ago

HariNuve commented 1 month ago

Question

I have thoroughly checked the working code multiple times. Sometimes the answer is generated properly, but sometimes I get the following kind of response:

llama_index.core.output_parsers.base.OutputParserException: Got invalid JSON object. Error: Expecting ',' delimiter: line 26 column 10 (char 885) while parsing a flow sequence in "", line 2, column 14: "items": [ ^ expected ',' or ']', but got '' in "", line 26, column 10: } ^. Got JSON string:

{
    "items": [
        {
            "sub_question": "What is the definition of folliculitis?",
            "tool_name": "pdf1"
        },
        {
            "subpor": "Which patients are more likely to develop gram-negative folliculitis?",
            "tool_name": "pdf1"
        },
        {
            "sub_question": "What are the common symptoms of folliculitis?",
            "tool_name": "pdf1"
        },
        {
            "sub_question": "What are the common causes of folliculitis?",
            "tool_name": "pdf1"
        },
        {
            "sub_question": "What are the recommended treatments for folliculitis?",
            "tool_name": "pdf1"
        },
        {
            "sub_question": "What are the potential complications of folliculitis?",
            "tool_name": "pdf1"
        }

dosubot[bot] commented 1 month ago

Hey there, @HariNuve! I'm here to help you with any bugs, questions, or contributions. Let's tackle this issue together while we wait for a human maintainer.

To get consistent, valid JSON responses, you can modify the parse method of the SelectionOutputParser class so that it handles the OutputParserException more gracefully. The error shows the parser reaching the end of the input while still inside the "items" array: the JSON string in the message is never closed with ] and }, which suggests that the LLM output was incomplete or malformed. One entry also uses the key "subpor" instead of "sub_question".

Here is the relevant part of the SelectionOutputParser class with a focus on the parse method:

class SelectionOutputParser(BaseOutputParser):
    REQUIRED_KEYS = frozenset(Answer.__annotations__)

    def _filter_dict(self, json_dict: dict) -> dict:
        """Filter recursively until a dictionary matches all REQUIRED_KEYS."""
        output_dict = json_dict
        for key, val in json_dict.items():
            if key in self.REQUIRED_KEYS:
                continue
            elif isinstance(val, dict):
                output_dict = self._filter_dict(val)
            elif isinstance(val, list):
                for item in val:
                    if isinstance(item, dict):
                        output_dict = self._filter_dict(item)

        return output_dict

    def _format_output(self, output: List[dict]) -> List[dict]:
        output_json = []
        for json_dict in output:
            valid = True
            for key in self.REQUIRED_KEYS:
                if key not in json_dict:
                    valid = False
                    break

            if not valid:
                json_dict = self._filter_dict(json_dict)

            output_json.append(json_dict)

        return output_json

    def parse(self, output: str) -> Any:
        json_string = _marshal_llm_to_json(output)
        try:
            json_obj = json.loads(json_string)
        except json.JSONDecodeError as e_json:
            try:
                import yaml

                # NOTE: parsing again with pyyaml
                #       pyyaml is less strict, and allows for trailing commas
                #       right now we rely on this since guidance program generates
                #       trailing commas
                json_obj = yaml.safe_load(json_string)
            except yaml.YAMLError as e_yaml:
                raise OutputParserException(
                    f"Got invalid JSON object. Error: {e_json} {e_yaml}. "
                    f"Got JSON string: {json_string}"
                )
            except NameError as exc:
                raise ImportError("Please pip install PyYAML.") from exc

        if isinstance(json_obj, dict):
            json_obj = [json_obj]

        if not json_obj:
            raise ValueError(f"Failed to convert output to JSON: {output!r}")

        json_output = self._format_output(json_obj)
        answers = [Answer.from_dict(json_dict) for json_dict in json_output]
        return StructuredOutput(raw_output=output, parsed_output=answers)

    def format(self, prompt_template: str) -> str:
        return prompt_template + "\n\n" + _escape_curly_braces(FORMAT_STR)

To address the issue, ensure that the JSON string is complete and correctly formatted before attempting to parse it. You can add checks or preprocessing steps that validate the JSON string first. The PyYAML fallback in parse already handles trailing commas and similar minor issues, but as the error above shows, it cannot recover a string that has been cut off.
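One possible shape for such a check is sketched below. It scans the string for brackets that are opened but never closed, which is the situation in the error above. The helper names missing_closers and load_possibly_truncated are hypothetical and not part of llama_index, and appending the missing brackets is only a best-effort repair: anything the LLM failed to emit is still lost, so retrying the LLM call may be the better choice.

```python
import json
from typing import Any


def missing_closers(json_string: str) -> str:
    """Return the closing brackets the string still needs ("" if balanced).

    Hypothetical helper, not part of llama_index. Brackets inside string
    literals are ignored.
    """
    stack = []
    in_string = False
    escaped = False
    for ch in json_string:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]":
            if not stack or stack.pop() != ch:
                raise ValueError("mismatched closing bracket")
    if in_string:
        raise ValueError("output ends inside a string literal")
    return "".join(reversed(stack))


def load_possibly_truncated(json_string: str) -> Any:
    """Best-effort parse: drop a dangling comma, then close open brackets."""
    closers = missing_closers(json_string)
    return json.loads(json_string.rstrip().rstrip(",") + closers)


truncated = '{"items": [{"sub_question": "a", "tool_name": "pdf1"}'
print(missing_closers(truncated))          # brackets that were never closed
print(load_possibly_truncated(truncated))  # parsed after the repair
```

A non-empty result from missing_closers is also a cheap signal that the response was truncated, for example by an output token limit, before any parser is invoked.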

The JSON responses might also be part of a streaming process, where multiple JSON objects are concatenated together. In that case, split the stream into individual JSON objects before parsing. Here is an example of how you can handle this, assuming the objects are separated by a NUL character ('\0'):

def parse(self, output: str) -> Any:
    json_strings = output.split('\0')  # Split the stream into individual JSON objects
    json_objects = []

    for json_string in json_strings:
        if not json_string.strip():
            continue
        try:
            json_obj = json.loads(json_string)
        except json.JSONDecodeError as e_json:
            # Fall back to PyYAML, which allows trailing commas.
            try:
                import yaml
            except ImportError as exc:
                # A missing package raises ImportError, not NameError.
                raise ImportError("Please pip install PyYAML.") from exc
            try:
                json_obj = yaml.safe_load(json_string)
            except yaml.YAMLError as e_yaml:
                raise OutputParserException(
                    f"Got invalid JSON object. Error: {e_json} {e_yaml}. "
                    f"Got JSON string: {json_string}"
                ) from e_yaml

        if isinstance(json_obj, dict):
            json_objects.append(json_obj)
        elif isinstance(json_obj, list):
            json_objects.extend(json_obj)

    if not json_objects:
        raise ValueError(f"Failed to convert output to JSON: {output!r}")

    json_output = self._format_output(json_objects)
    answers = [Answer.from_dict(json_dict) for json_dict in json_output]
    return StructuredOutput(raw_output=output, parsed_output=answers)

This approach ensures that each JSON object in the stream is parsed individually, which should help resolve the OutputParserException error [1][2].
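If the separator between objects is not known in advance, a variant that does not depend on a specific delimiter is to let the standard library find each object boundary with json.JSONDecoder.raw_decode. The sketch below is only an illustration of that idea. The function name split_concatenated_json is hypothetical, and treating whitespace and NUL as the only possible separators is an assumption.

```python
import json
from typing import Any, List


def split_concatenated_json(text: str) -> List[Any]:
    """Decode every top-level JSON value found back-to-back in `text`.

    Hypothetical helper, not part of llama_index. Assumes values are
    separated only by whitespace and/or NUL characters.
    """
    decoder = json.JSONDecoder()
    values = []
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading separators, so skip them here.
        while pos < len(text) and (text[pos].isspace() or text[pos] == "\0"):
            pos += 1
        if pos == len(text):
            break
        # Returns the decoded value and the index just past it.
        value, pos = decoder.raw_decode(text, pos)
        values.append(value)
    return values


stream = '{"a": 1}\n{"b": [2, 3]}\0[4]'
print(split_concatenated_json(stream))
```

A final object that is cut off still raises json.JSONDecodeError here, so this only helps with concatenated output, not with truncated output.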
