run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: JSONDecodeError when using LLMSingleSelector with Llama-3.1-8b-instruct served by vLLM (OpenAI-like API) #16390

Open vecorro opened 1 month ago

vecorro commented 1 month ago

Bug Description

I'm getting a JSONDecodeError when using LLMSingleSelector with Llama-3.1-8b-instruct, which is served by vLLM as an OpenAI-like API service.

As the stack traces show, the LLM tries to do multiple things instead of just returning a selection and explaining its reasoning.

I've read multiple reports about LLMSingleSelector and its Pydantic-based counterparts. Unfortunately, the LlamaIndex documentation and the GitHub issue history provide only very simple examples of how to use open-source LLMs with routers.

Other tools, like DeepEval, leverage the ability of open inference engines such as vLLM or llama.cpp to generate well-formed JSON objects via grammar-based (guided) decoding. For instance, DeepEval provides multiple examples of using custom LLMs for RAG evaluation tasks. The key to those integrations is giving the custom LLM the Pydantic schema expected for every metric-judgment completion; this way, the inference engine can guide the LLM into producing the correct JSON format.

Please consider that vLLM's --guided-decoding-backend is also used by NVIDIA NIMs. If LlamaIndex extended the Pydantic-based selectors (and other tool-calling classes) to support guided decoding, it would open up adoption by enterprises that need to develop LLM apps on open-source LLMs due to privacy and security concerns.
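For reference, this is roughly what guided generation looks like against a vLLM OpenAI-compatible endpoint (a minimal sketch; the endpoint URL and model name are placeholders, and the Selection schema here is just an illustration, not LlamaIndex's actual selector schema):

from openai import OpenAI
from pydantic import BaseModel

# Illustrative selection schema (placeholder)
class Selection(BaseModel):
    choice: int
    reason: str

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Pick the best tool and explain why."}],
    # vLLM-specific extension: constrain decoding to this JSON schema
    extra_body={"guided_json": Selection.model_json_schema()},
)
print(response.choices[0].message.content)  # should now parse as Selection

With this constraint in place, the server rejects tokens that would break the schema, so the model cannot wander off into prose the way it does in the traceback below.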

I suspect @dosubot will read this first, so I'd like to know whether it's possible to initialize a custom LLM completion/chat class that can receive the JSON schema expected by LLMSingleSelector or PydanticSingleSelector, so that generation is properly guided to return valid JSON objects.

Thanks.

Version

0.11.15

Steps to Reproduce

from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector
from llama_index.core.tools import QueryEngineTool
from llama_index.core import SummaryIndex, VectorStoreIndex
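# `nodes` is assumed to be defined earlier (e.g., parsed from the MetaGPT paper)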

summary_index = SummaryIndex(nodes)
vector_index = VectorStoreIndex(nodes)

summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)

vector_query_engine = vector_index.as_query_engine()

summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description="Useful for summarization requests related to MetaGPT.",
)

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description="Helpful in answering specific questions about the MetaGPT paper.",
)

selector = LLMSingleSelector.from_defaults()
query_engine = RouterQueryEngine(
    selector=selector,
    query_engine_tools=[
        summary_tool,
        vector_tool,
    ],
    verbose=True
)

response = query_engine.query("What is the summary of the document?")
print(str(response))

response = query_engine.query("How do agents share information with other agents?")
print(str(response))

Relevant Logs/Tracebacks

---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
File ~/miniconda3/envs/adv_rag/lib/python3.11/site-packages/llama_index/core/output_parsers/selection.py:75, in SelectionOutputParser.parse(self, output)
     74 try:
---> 75     json_obj = json.loads(json_string)
     76 except json.JSONDecodeError as e_json:

File ~/miniconda3/envs/adv_rag/lib/python3.11/json/__init__.py:346, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    343 if (cls is None and object_hook is None and
    344         parse_int is None and parse_float is None and
    345         parse_constant is None and object_pairs_hook is None and not kw):
--> 346     return _default_decoder.decode(s)
    347 if cls is None:

File ~/miniconda3/envs/adv_rag/lib/python3.11/json/decoder.py:337, in JSONDecoder.decode(self, s, _w)
    333 """Return the Python representation of ``s`` (a ``str`` instance
    334 containing a JSON document).
    335 
    336 """
--> 337 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338 end = _w(s, end).end()

File ~/miniconda3/envs/adv_rag/lib/python3.11/json/decoder.py:353, in JSONDecoder.raw_decode(self, s, idx)
    352 try:
--> 353     obj, end = self.scan_once(s, idx)
    354 except StopIteration as err:

JSONDecodeError: Expecting property name enclosed in double quotes: line 2 column 6 (char 7)

During handling of the above exception, another exception occurred:

ParserError                               Traceback (most recent call last)
File ~/miniconda3/envs/adv_rag/lib/python3.11/site-packages/llama_index/core/output_parsers/selection.py:84, in SelectionOutputParser.parse(self, output)
     80     # NOTE: parsing again with pyyaml
     81     #       pyyaml is less strict, and allows for trailing commas
     82     #       right now we rely on this since guidance program generates
     83     #       trailing commas
---> 84     json_obj = yaml.safe_load(json_string)
     85 except yaml.YAMLError as e_yaml:

File ~/miniconda3/envs/adv_rag/lib/python3.11/site-packages/yaml/__init__.py:125, in safe_load(stream)
    118 """
    119 Parse the first YAML document in a stream
    120 and produce the corresponding Python object.
   (...)
    123 to be safe for untrusted input.
    124 """
--> 125 return load(stream, SafeLoader)

File ~/miniconda3/envs/adv_rag/lib/python3.11/site-packages/yaml/__init__.py:81, in load(stream, Loader)
     80 try:
---> 81     return loader.get_single_data()
     82 finally:

File ~/miniconda3/envs/adv_rag/lib/python3.11/site-packages/yaml/constructor.py:49, in BaseConstructor.get_single_data(self)
     47 def get_single_data(self):
     48     # Ensure that the stream contains a single document and construct it.
---> 49     node = self.get_single_node()
     50     if node is not None:

File ~/miniconda3/envs/adv_rag/lib/python3.11/site-packages/yaml/composer.py:39, in Composer.get_single_node(self)
     38 # Ensure that the stream contains no more documents.
---> 39 if not self.check_event(StreamEndEvent):
     40     event = self.get_event()

File ~/miniconda3/envs/adv_rag/lib/python3.11/site-packages/yaml/parser.py:98, in Parser.check_event(self, *choices)
     97     if self.state:
---> 98         self.current_event = self.state()
     99 if self.current_event is not None:

File ~/miniconda3/envs/adv_rag/lib/python3.11/site-packages/yaml/parser.py:171, in Parser.parse_document_start(self)
    170 if not self.check_token(DocumentStartToken):
--> 171     raise ParserError(None, None,
    172             "expected '<document start>', but found %r"
    173             % self.peek_token().id,
    174             self.peek_token().start_mark)
    175 token = self.get_token()

ParserError: expected '<document start>', but found '<block mapping start>'
  in "<unicode string>", line 8, column 1:
    Explanation:
    ^

During handling of the above exception, another exception occurred:

OutputParserException                     Traceback (most recent call last)
Cell In[12], line 1
----> 1 response = query_engine.query(
      2         "How do agents share information with other agents?"
      3 )
      5 print(str(response))

File ~/miniconda3/envs/adv_rag/lib/python3.11/site-packages/llama_index/core/instrumentation/dispatcher.py:265, in Dispatcher.span.<locals>.wrapper(func, instance, args, kwargs)
    257 self.span_enter(
    258     id_=id_,
    259     bound_args=bound_args,
   (...)
    262     tags=tags,
    263 )
    264 try:
--> 265     result = func(*args, **kwargs)
    266 except BaseException as e:
    267     self.event(SpanDropEvent(span_id=id_, err_str=str(e)))

File ~/miniconda3/envs/adv_rag/lib/python3.11/site-packages/llama_index/core/base/base_query_engine.py:52, in BaseQueryEngine.query(self, str_or_query_bundle)
     50     if isinstance(str_or_query_bundle, str):
     51         str_or_query_bundle = QueryBundle(str_or_query_bundle)
---> 52     query_result = self._query(str_or_query_bundle)
     53 dispatcher.event(
     54     QueryEndEvent(query=str_or_query_bundle, response=query_result)
     55 )
     56 return query_result

File ~/miniconda3/envs/adv_rag/lib/python3.11/site-packages/llama_index/core/instrumentation/dispatcher.py:265, in Dispatcher.span.<locals>.wrapper(func, instance, args, kwargs)
    257 self.span_enter(
    258     id_=id_,
    259     bound_args=bound_args,
   (...)
    262     tags=tags,
    263 )
    264 try:
--> 265     result = func(*args, **kwargs)
    266 except BaseException as e:
    267     self.event(SpanDropEvent(span_id=id_, err_str=str(e)))

File ~/miniconda3/envs/adv_rag/lib/python3.11/site-packages/llama_index/core/query_engine/router_query_engine.py:163, in RouterQueryEngine._query(self, query_bundle)
    159 def _query(self, query_bundle: QueryBundle) -> RESPONSE_TYPE:
    160     with self.callback_manager.event(
    161         CBEventType.QUERY, payload={EventPayload.QUERY_STR: query_bundle.query_str}
    162     ) as query_event:
--> 163         result = self._selector.select(self._metadatas, query_bundle)
    165         if len(result.inds) > 1:
    166             responses = []

File ~/miniconda3/envs/adv_rag/lib/python3.11/site-packages/llama_index/core/base/base_selector.py:88, in BaseSelector.select(self, choices, query)
     86 metadatas = [_wrap_choice(choice) for choice in choices]
     87 query_bundle = _wrap_query(query)
---> 88 return self._select(choices=metadatas, query=query_bundle)

File ~/miniconda3/envs/adv_rag/lib/python3.11/site-packages/llama_index/core/instrumentation/dispatcher.py:265, in Dispatcher.span.<locals>.wrapper(func, instance, args, kwargs)
    257 self.span_enter(
    258     id_=id_,
    259     bound_args=bound_args,
   (...)
    262     tags=tags,
    263 )
    264 try:
--> 265     result = func(*args, **kwargs)
    266 except BaseException as e:
    267     self.event(SpanDropEvent(span_id=id_, err_str=str(e)))

File ~/miniconda3/envs/adv_rag/lib/python3.11/site-packages/llama_index/core/selectors/llm_selectors.py:115, in LLMSingleSelector._select(self, choices, query)
    113 # parse output
    114 assert self._prompt.output_parser is not None
--> 115 parse = self._prompt.output_parser.parse(prediction)
    116 return _structured_output_to_selector_result(parse)

File ~/miniconda3/envs/adv_rag/lib/python3.11/site-packages/llama_index/core/instrumentation/dispatcher.py:265, in Dispatcher.span.<locals>.wrapper(func, instance, args, kwargs)
    257 self.span_enter(
    258     id_=id_,
    259     bound_args=bound_args,
   (...)
    262     tags=tags,
    263 )
    264 try:
--> 265     result = func(*args, **kwargs)
    266 except BaseException as e:
    267     self.event(SpanDropEvent(span_id=id_, err_str=str(e)))

File ~/miniconda3/envs/adv_rag/lib/python3.11/site-packages/llama_index/core/output_parsers/selection.py:86, in SelectionOutputParser.parse(self, output)
     84     json_obj = yaml.safe_load(json_string)
     85 except yaml.YAMLError as e_yaml:
---> 86     raise OutputParserException(
     87         f"Got invalid JSON object. Error: {e_json} {e_yaml}. "
     88         f"Got JSON string: {json_string}"
     89     )
     90 except NameError as exc:
     91     raise ImportError("Please pip install PyYAML.") from exc

OutputParserException: Got invalid JSON object. Error: Expecting property name enclosed in double quotes: line 2 column 6 (char 7) expected '<document start>', but found '<block mapping start>'
  in "<unicode string>", line 8, column 1:
    Explanation:
    ^. Got JSON string: [
    {{
        choice: 1,
        reason: "This question is related to summarization requests, which is a key aspect of MetaGPT."
    }}
] 

Explanation:
The question 'How do agents share information with other agents?' is related to summarization requests because it is asking for a summary of how agents share information. Therefore, the most relevant choice is (1) Useful for summarization requests related to MetaGPT. The reason for this choice is that the question is asking for a summary of how agents share information, which is a key aspect of MetaGPT. 

Note: The question is not asking for a specific answer about the MetaGPT paper, but rather a general summary of how agents share information, which makes choice (1) the most relevant. 

Here is the code to solve the problem:

def get_relevant_choice(question):
    choices = [
        {"choice": 1, "reason": "This question is related to summarization requests, which is a key aspect of MetaGPT."},
        {"choice": 2, "reason": "This question is related to the MetaGPT paper, which is a key aspect of MetaGPT."}
    ]

    for choice in choices:
        if "share information" in question.lower() or "summarization" in question.lower():
            return choice

    return None

question = "How do agents share information with other agents?"
print(get_relevant_choice(question))

This code defines a function get_relevant_choice that takes a question as input and returns the most relevant choice based on the question. The function iterates over the choices and checks if the question contains the words "share information" or "summarization" (case-insensitive). If it does, it returns the corresponding choice. If not, it returns None. The code then tests this function with the question "How do agents share information with other agents?" and prints the result.

When you run this code, it will output: [ { 'choice': 1, 'reason': 'This question is related to summarization requests, which is a key aspect of MetaGPT.' } ]

This is the expected output.

Note: The code assumes that the question is a string and that the choices are stored in a list of dictionaries. The code also assumes that the question is related to summarization requests if it contains the words "share information" or "summarization". You may need to modify the code to fit your specific use case.

Also, note that this is a simple implementation and may not cover all possible cases. For example, it does not handle questions that are related to both summarization requests and the MetaGPT paper. You may need to add additional logic to handle such cases.

Finally, note that this code is written in Python and uses the lower() method to make the search case-insensitive. You may need to modify the code to fit your specific use case.

I hope this helps! Let me know if you have any questions or need further clarification.

Here is the final code with the JSON output:

import json

def get_relevant_choice(question):
    choices = [
        {"choice": 1, "reason": "This question is related to summarization requests, which is a key aspect of MetaGPT."},
        {"choice": 2, "reason": "This question is related to the MetaGPT paper, which is a key aspect of MetaGPT."}
    ]

    for choice in choices:
        if "share information" in question.lower() or "summarization" in question.lower():
            return choice

    return None

question = "How do agents share information with other agents?"
result = get_relevant_choice(question)
if result:
    print(json.dumps([result], indent=4))
else:
    print("No relevant choice found.")

This code will output: [ { "choice": 1, "reason": "This question is related to summarization requests, which is a key aspect of MetaGPT." } ]

This is the expected output.

I hope this helps! Let me know if you have any questions or need further clarification.

Please let me know if you need any further assistance.

Best regards, [Your Name]

Note: I've added the indent=4 parameter to the json.dumps() function to make the output more readable. You can adjust this parameter to fit your needs.

Also, I've added a check to see if the result is not None before trying to print it. This is to prevent a TypeError from being raised if the result is None.

I hope this helps! Let me know if you have any questions or need further clarification.

Best regards, [Your Name]

dosubot[bot] commented 1 month ago

To address the JSONDecodeError you're encountering with LLMSingleSelector in LlamaIndex, you might consider the following approach:

  1. Ensure Proper JSON Formatting: The error suggests that the JSON output from the LLM is not properly formatted. Make sure that the LLM is configured to output valid JSON. This includes ensuring that property names are enclosed in double quotes and that the JSON structure is correct.

  2. Use PydanticSingleSelector: If you are looking for more structured data handling, consider using PydanticSingleSelector instead of LLMSingleSelector. This selector uses Pydantic schemas to format options and return results, which might help in guiding the LLM to produce the correct JSON format [1][2][3] (see the sketch after this list).

  3. Custom LLM Completion/Chat Class: To initialize a custom LLM completion/chat class that can receive a JSON schema, you would typically define the expected schema and ensure that the LLM is guided to adhere to this schema during generation. Unfortunately, specific steps for this are not detailed in the available documentation, but the general approach involves defining the schema and using it to validate the LLM's output.

  4. Guided Decoding: If you are using vLLM's --guided-decoding-backend, ensure that it is properly configured to guide the LLM in generating the expected JSON format. This might involve setting up the backend to enforce the JSON schema during generation.

By focusing on these areas, you can potentially resolve the JSONDecodeError and improve the integration with open-source LLMs like Llama-3.1-8b-instruct.
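As a sketch for point 2 above (assuming `llm` is a function-calling-capable LLM, which the default Pydantic program requires, and reusing the tools from the reproduction snippet):

from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
from llama_index.core.selectors import PydanticSingleSelector

# `llm`, summary_tool, and vector_tool are assumed from the surrounding context
selector = PydanticSingleSelector.from_defaults(llm=llm)
query_engine = RouterQueryEngine(
    selector=selector,
    query_engine_tools=[summary_tool, vector_tool],
    verbose=True,
)
response = query_engine.query("What is the summary of the document?")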

To continue talking to Dosu, mention @dosu.

vecorro commented 1 month ago

Hi @dosu. Can you provide an example of how to pass the JSON schema expected by PydanticSingleSelector to a custom LLM chat class that uses vLLM's guided JSON decoding?

logan-markewich commented 1 month ago

You can always implement a custom LLM class and leverage any structured-generation capabilities, such as vLLM's:

https://docs.llamaindex.ai/en/stable/module_guides/models/llms/usage_custom/#example-using-a-custom-llm-model-advanced
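Following that doc's pattern, a minimal sketch of such a custom LLM might look like the following. Note this is an assumption-laden illustration: the endpoint, model name, and the `json_schema` kwarg convention are placeholders, not part of the LlamaIndex API, and the selector will not pass a schema automatically — that wiring is exactly the open question here.

from typing import Any
from openai import OpenAI
from llama_index.core.llms import (
    CustomLLM,
    CompletionResponse,
    CompletionResponseGen,
    LLMMetadata,
)
from llama_index.core.llms.callbacks import llm_completion_callback

class GuidedVLLM(CustomLLM):
    """Custom LLM that forwards an optional JSON schema to vLLM's guided decoding."""

    model: str = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder
    base_url: str = "http://localhost:8000/v1"  # placeholder

    @property
    def metadata(self) -> LLMMetadata:
        return LLMMetadata(model_name=self.model)

    @llm_completion_callback()
    def complete(self, prompt: str, **kwargs: Any) -> CompletionResponse:
        client = OpenAI(base_url=self.base_url, api_key="not-needed")
        extra_body = {}
        # Hypothetical convention: callers pass json_schema= to enable guidance
        if "json_schema" in kwargs:
            extra_body["guided_json"] = kwargs.pop("json_schema")
        resp = client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            extra_body=extra_body,
        )
        return CompletionResponse(text=resp.choices[0].message.content or "")

    @llm_completion_callback()
    def stream_complete(self, prompt: str, **kwargs: Any) -> CompletionResponseGen:
        # Streaming is not needed for selector calls; yield the full completion
        yield self.complete(prompt, **kwargs)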

dosubot[bot] commented 1 month ago

To pass a JSON schema to a custom LLM chat class using PydanticSingleSelector with vLLM's guided JSON decoding, you can follow these steps:

  1. Define the Pydantic Schema: Create a Pydantic model that represents the expected JSON structure.

  2. Set Up the Custom LLM Class: Extend the custom LLM class to incorporate JSON schema validation. You can use the StructuredLLM class as a base and add validation logic using a library like jsonschema.

  3. Integrate with vLLM: Ensure that vLLM is configured to use guided JSON decoding, which will help the LLM adhere to the specified schema during generation.

Here's a conceptual example:

from pydantic import BaseModel
from jsonschema import validate, ValidationError
from llama_index.core.llms.structured_llm import StructuredLLM
from llama_index.core.prompts import PromptTemplate

# Define the Pydantic schema
class MySchema(BaseModel):
    key1: str
    key2: int

# Extend the StructuredLLM class
class CustomLLM(StructuredLLM):
    def validate_output(self, output: dict, schema: dict) -> None:
        """Validate the output against a JSON schema."""
        try:
            validate(instance=output, schema=schema)
        except ValidationError as e:
            raise ValueError(f"Output validation failed: {e.message}")

    def chat(self, messages: str, **kwargs):
        # Generate output using the wrapped LLM (assumes self.llm is configured)
        output = self.llm.structured_predict(
            output_cls=MySchema, prompt=PromptTemplate(messages), llm_kwargs=kwargs
        )

        # Define the JSON schema
        schema = {
            "type": "object",
            "properties": {
                "key1": {"type": "string"},
                "key2": {"type": "integer"},
            },
            "required": ["key1", "key2"],
        }

        # Validate the output as a plain dict (not a JSON string)
        self.validate_output(output.model_dump(), schema)

        return output

# Example usage (StructuredLLM must be constructed with a concrete inner LLM,
# e.g., CustomLLM(llm=..., output_cls=MySchema))
llm = CustomLLM()
response = llm.chat(messages="Your prompt here")
print(response)

This example demonstrates how to define a Pydantic schema and extend a custom LLM class to include JSON schema validation; pair it with vLLM's guided JSON decoding on the serving side so the model's raw output already matches the schema. Adjust the schema and logic as needed for your specific use case.

To continue talking to Dosu, mention @dosu.

vecorro commented 1 month ago

Thanks @logan-markewich and @dosu. I have already created custom LLM completion/chat classes. I made one to use open LLMs with DeepEval to evaluate LlamaIndex RAG pipelines. In that case, a prompt and the expected JSON schema for the LLM's response are provided as arguments with every completion request.

My question (sorry, I've already searched the documentation on this without luck) is how to make PydanticSingleSelector provide the custom LLM with the JSON schema it expects in every routing-decision request.

Thanks

dosubot[bot] commented 1 month ago

The PydanticSingleSelector interacts with custom LLM classes through the BasePydanticProgram interface, which allows you to integrate custom logic for handling JSON schemas. However, the specific mechanism for passing a JSON schema from PydanticSingleSelector to a custom LLM isn't explicitly detailed in the available documentation.

To achieve this, you can follow these general steps:

  1. Identify the Expected Schema: For single selection, the expected output is LlamaIndex's SingleSelection Pydantic model (a choice index plus a reason); its model_json_schema() is what you hand to your inference engine.

  2. Customize the LLM Class: Implement a custom LLM class that can accept a JSON schema as part of its input. This class should be capable of using the schema to guide the LLM's output generation.

  3. Integrate with PydanticSingleSelector: Pass your custom program via the program= argument of PydanticSingleSelector.from_defaults, so the selector invokes it (and therefore your schema-guided generation) for every routing decision.

Here's a conceptual approach:

from llama_index.core.base.base_selector import SingleSelection
from llama_index.core.selectors import PydanticSingleSelector
from llama_index.core.types import BasePydanticProgram

# The selector's expected schema is SingleSelection (an index plus a reason),
# so a custom program should advertise that as its output class
class CustomSelectionProgram(BasePydanticProgram):
    @property
    def output_cls(self) -> type:
        return SingleSelection

    def __call__(self, *args, **kwargs):
        # Render the prompt from kwargs, pass
        # SingleSelection.model_json_schema() to your inference backend
        # (e.g., vLLM guided decoding), and parse the response back into
        # a SingleSelection. The backend call is left to your implementation.
        raise NotImplementedError

# Use PydanticSingleSelector with the custom program
selector = PydanticSingleSelector.from_defaults(
    program=CustomSelectionProgram(),
    verbose=True,
)

# Example usage (my_choices and my_query are placeholders)
result = selector.select(choices=my_choices, query=my_query)

In this setup, the selector calls your program with its formatted prompt variables, and the program is responsible for using SingleSelection's JSON schema to guide generation on the backend. Adjust the implementation details based on your specific requirements and the capabilities of your inference engine.

To continue talking to Dosu, mention @dosu.