Closed dimartinez closed 2 months ago
@dimartinez Hello! I'm here to help you with any issues you're experiencing. If you need assistance with bug fixes, questions, or becoming a contributor, feel free to ask.
I found a similar unsolved discussion regarding a JSON conversion error in the RouterQueryEngine when using the gpt-4o-mini model. You can check it out here [1].
The error you're encountering seems to be related to the output_parser in the LLMSingleSelector. The output_parser is responsible for converting the output into a structured JSON format. If the output is not in the expected format, it raises a ValueError.
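Here's a brief overview of how the output_parser handles JSON conversion:
1. It extracts the JSON string from the raw output with the _marshal_llm_to_json function.
2. It parses that string with json.loads, using pyyaml as a fallback to handle less strict JSON formats.
3. It formats the result and converts it into Answer objects.
Here is the relevant code snippet: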
def parse(self, output: str) -> Any:
    json_string = _marshal_llm_to_json(output)
    try:
        json_obj = json.loads(json_string)
    except json.JSONDecodeError as e_json:
        try:
            import yaml
            json_obj = yaml.safe_load(json_string)
        except yaml.YAMLError as e_yaml:
            raise OutputParserException(
                f"Got invalid JSON object. Error: {e_json} {e_yaml}. "
                f"Got JSON string: {json_string}"
            )
        except NameError as exc:
            raise ImportError("Please pip install PyYAML.") from exc
    if isinstance(json_obj, dict):
        json_obj = [json_obj]
    if not json_obj:
        raise ValueError(f"Failed to convert output to JSON: {output!r}")
    json_output = self._format_output(json_obj)
    answers = [Answer.from_dict(json_dict) for json_dict in json_output]
    return StructuredOutput(raw_output=output, parsed_output=answers)
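To see which inputs reach the ValueError branch, the control flow of the snippet can be exercised in isolation. The following is only a simplified sketch, not the library code: extract_json_span and parse_choices are hypothetical names, the span extraction is an approximation of what _marshal_llm_to_json does, and the pyyaml fallback is left out so the example needs nothing beyond the standard library.

```python
import json
from typing import Any, List


def extract_json_span(output: str) -> str:
    """Hypothetical stand-in for _marshal_llm_to_json: keep the outermost JSON span."""
    starts = [i for i in (output.find("["), output.find("{")) if i != -1]
    if not starts:
        return ""
    left = min(starts)
    closer = "]" if output[left] == "[" else "}"
    right = output.rfind(closer)
    return output[left : right + 1]


def parse_choices(output: str) -> List[dict]:
    """Mirror the dict-wrapping and emptiness checks from the snippet (no YAML fallback)."""
    json_string = extract_json_span(output)
    try:
        json_obj: Any = json.loads(json_string)
    except json.JSONDecodeError as exc:
        # Simplification: the real parser tries pyyaml before giving up.
        raise ValueError(f"Got invalid JSON object: {exc}") from exc
    if isinstance(json_obj, dict):
        json_obj = [json_obj]  # a single selection is wrapped in a list
    if not json_obj:
        # An empty list (or any other falsy value) ends up here.
        raise ValueError(f"Failed to convert output to JSON: {output!r}")
    return json_obj


if __name__ == "__main__":
    print(parse_choices('Here you go: [{"choice": 2, "reason": "specific"}]'))
    try:
        parse_choices("[]")
    except ValueError as err:
        print(f"empty selection rejected: {err}")
```

In this sketch a well-formed selection passes through unchanged, while a syntactically valid but empty list is rejected by the emptiness check rather than by the JSON decoder.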
The issue might be that the gpt-4o-mini model is returning an empty list ([]). The output_parser does not recover from that case: the "if not json_obj" check in the snippet above raises the ValueError. You might want to add additional checks or handle empty outputs explicitly in your code.
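One way to handle the failure explicitly is to catch the ValueError around the selection step and fall back to a default route. This is a minimal sketch under assumptions: select_with_fallback and the toy selector below are hypothetical helpers, not part of llama_index, and the real selector call would need to be adapted to whatever callable you wrap.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def select_with_fallback(select: Callable[[str], T], query: str, default: T) -> T:
    """Run a selector; if its output cannot be converted (ValueError), return the default."""
    try:
        return select(query)
    except ValueError as exc:
        print(f"Selector failed ({exc}); using the default route.")
        return default


def toy_selector(query: str) -> int:
    """Hypothetical selector that fails the way the parser does on an empty list."""
    if "radar" in query:
        raise ValueError("Failed to convert output to JSON: '[]'")
    return 0


if __name__ == "__main__":
    # Index 1 stands for the vector tool in this illustration.
    print(select_with_fallback(toy_selector, "What's a tech radar?", default=1))
    print(select_with_fallback(toy_selector, "Summarize the document", default=1))
```

Catching only ValueError keeps unrelated errors (network failures, missing API keys) visible instead of silently routing around them.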
For further investigation, you can refer to the output_parser implementation in the selection.py file [2][3].
@dosubot it works fine if I use PydanticSingleSelector.
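For reference, the workaround could look roughly like the sketch below. It assumes the same query engine tools as in the Steps to Reproduce, that PydanticSingleSelector is importable from llama_index.core.selectors in the installed version, and that the configured OpenAI model supports function calling. build_router_with_pydantic_selector is a hypothetical helper name; the imports are kept local so the function can be defined even where llama_index is not installed.

```python
from typing import Any, Sequence


def build_router_with_pydantic_selector(tools: Sequence[Any]) -> Any:
    """Build a RouterQueryEngine that selects with PydanticSingleSelector.

    `tools` is expected to be the list of QueryEngineTool objects
    (for example [summary_tool, vector_tool] from the reproduction script).
    """
    # Local imports: assumption that these paths match the installed llama_index.
    from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
    from llama_index.core.selectors import PydanticSingleSelector

    return RouterQueryEngine(
        selector=PydanticSingleSelector.from_defaults(),
        query_engine_tools=list(tools),
        verbose=True,
    )
```

Usage would be query_engine = build_router_with_pydantic_selector([summary_tool, vector_tool]), followed by the same query call as in the reproduction script.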
Bug Description
If you use 'gpt-4o-mini' and ask a specific question that is answered by the VectorStoreIndex, you get an error about converting the output to JSON when the query goes through the RouterQueryEngine. The error is not raised if you use the gpt-4o model.
Version
0.11.3
Steps to Reproduce
from llama_index.core import (
    SimpleDirectoryReader,
    Settings,
    SummaryIndex,
    VectorStoreIndex,
)
from llama_index.core.tools import QueryEngineTool
from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
from llama_index.core.selectors.llm_selectors import LLMSingleSelector
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core.node_parser import SentenceSplitter

# The error appears with gpt-4o-mini but not with gpt-4o.
Settings.llm = OpenAI(model="gpt-4o-mini")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

documents = SimpleDirectoryReader(
    input_files=['a pdf file... fails with any document...']
).load_data()

splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)

summary_index = SummaryIndex(nodes)
vector_index = VectorStoreIndex(nodes)

summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)
vector_query_engine = vector_index.as_query_engine()

summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description=(
        "Useful for respond to summarizations questions about bla bla"
    ),
)
vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description=(
        "Useful for respond to specific questions about bla bla"
    ),
)

# LLMSingleSelector is the selector whose output parser raises the error.
query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[
        summary_tool,
        vector_tool,
    ],
    verbose=True,
)

response = query_engine.query("What's a tech radar?")
print(str(response))
Relevant Logs/Tracebacks