langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

Failed to parse Suggestions from completion #22952

Open daniau23 opened 2 months ago

daniau23 commented 2 months ago


Example Code

The following code uses PydanticOutputParser; LangChain fails to parse the LLM output.


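# Imports assumed from the names used below (not shown in the original post)
import os
from typing import List

from langchain.output_parsers import OutputFixingParser, PydanticOutputParser
from langchain.prompts import PromptTemplate
from langchain_community.llms import HuggingFaceHub
from pydantic import BaseModel, Field, field_validator

HUGGINGFACEHUB_API_TOKEN = os.getenv("HUGGINGFACEHUB_API_TOKEN")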

repo_id = "mistralai/Mistral-7B-Instruct-v0.3"

model_kwargs = {
    "max_new_tokens": 60, 
    "max_length": 200, 
    "temperature": 0.1, 
    "timeout": 6000
}

# Using HuggingFaceHub
llm = HuggingFaceHub(
    repo_id=repo_id,
    huggingfacehub_api_token = HUGGINGFACEHUB_API_TOKEN,
    model_kwargs = model_kwargs,
)

# Define your desired data structure.
class Suggestions(BaseModel):
    words: List[str] = Field(description="list of substitute words based on context")

    # Throw error in case of receiving a numbered-list from API
    @field_validator('words')
    def not_start_with_number(cls, field):
        for item in field:
            if item[0].isnumeric():
                raise ValueError("The word can not start with numbers!")
        return field

parser = PydanticOutputParser(pydantic_object=Suggestions)

prompt_template = """
Offer a list of suggestions to substitute the specified target_word based on the context.
{format_instructions}
target_word={target_word}
context={context}
"""

prompt_input_variables = ["target_word", "context"]
partial_variables = {"format_instructions":parser.get_format_instructions()}
prompt = PromptTemplate(
    template=prompt_template,
    input_variables=prompt_input_variables,
    partial_variables=partial_variables
)

model_input = prompt.format_prompt(
            target_word="behaviour",
            context="The behaviour of the students in the classroom was disruptive and made it difficult for the teacher to conduct the lesson."
)

output = llm(model_input.to_string())

parser.parse(output)

When I tried to fix the error using OutputFixingParser, I got a different error. The code is below:

outputfixing_parser = OutputFixingParser.from_llm(parser=parser,llm=llm)
print(outputfixing_parser)
outputfixing_parser.parse(output)

Error Message and Stack Trace (if applicable)

---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
File ~\Desktop\llmai\llm_deep\Lib\site-packages\langchain_core\output_parsers\pydantic.py:33, in PydanticOutputParser._parse_obj(self, obj)
     32 if issubclass(self.pydantic_object, pydantic.BaseModel):
---> 33     return self.pydantic_object.model_validate(obj)
     34 elif issubclass(self.pydantic_object, pydantic.v1.BaseModel):

File ~\Desktop\llmai\llm_deep\Lib\site-packages\pydantic\main.py:551, in BaseModel.model_validate(cls, obj, strict, from_attributes, context)
    550 __tracebackhide__ = True
--> 551 return cls.__pydantic_validator__.validate_python(
    552     obj, strict=strict, from_attributes=from_attributes, context=context
    553 )

ValidationError: 1 validation error for Suggestions
words
  Field required [type=missing, input_value={'properties': {'words': ..., 'required': ['words']}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.7/v/missing

During handling of the above exception, another exception occurred:

OutputParserException                     Traceback (most recent call last)
Cell In[284], line 1
----> 1 parser.parse(output)

File ~\Desktop\llmai\llm_deep\Lib\site-packages\langchain_core\output_parsers\pydantic.py:64, in PydanticOutputParser.parse(self, text)
...

OutputParserException: Failed to parse Suggestions from completion {"properties": {"words": {"description": "list of substitute words based on context", "items": {"type": "string"}, "title": "Words", "type": "array"}}, "required": ["words"]}. Got: 1 validation error for Suggestions
words
  Field required [type=missing, input_value={'properties': {'words': ..., 'required': ['words']}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.7/v/missing
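The completion quoted in this error is the JSON schema from the format instructions (it has "properties" and "required" at the top level), not an instance with a "words" key. Here is a minimal sketch of a pre-check that flags this case before validation. It uses only the standard library; `looks_like_schema_echo` is a hypothetical helper name, not part of LangChain, and the check assumes the Suggestions model from this issue.

```python
import json


def looks_like_schema_echo(completion: str) -> bool:
    """Return True if the text parses to a schema-style object rather than data."""
    try:
        obj = json.loads(completion)
    except json.JSONDecodeError:
        return False
    # A schema carries "properties"; a Suggestions instance carries "words".
    return isinstance(obj, dict) and "properties" in obj and "words" not in obj


schema_echo = (
    '{"properties": {"words": {"type": "array", "items": {"type": "string"}}}, '
    '"required": ["words"]}'
)
instance = '{"words": ["conduct", "manner"]}'

print(looks_like_schema_echo(schema_echo))  # True
print(looks_like_schema_echo(instance))     # False
```

A check like this only tells the two cases apart; it does not explain why the model returned the schema.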

Error when using OutputFixingParser

---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
File ~\Desktop\llmai\llm_deep\Lib\site-packages\langchain_core\output_parsers\pydantic.py:33, in PydanticOutputParser._parse_obj(self, obj)
     32 if issubclass(self.pydantic_object, pydantic.BaseModel):
---> 33     return self.pydantic_object.model_validate(obj)
     34 elif issubclass(self.pydantic_object, pydantic.v1.BaseModel):

File ~\Desktop\llmai\llm_deep\Lib\site-packages\pydantic\main.py:551, in BaseModel.model_validate(cls, obj, strict, from_attributes, context)
    550 __tracebackhide__ = True
--> 551 return cls.__pydantic_validator__.validate_python(
    552     obj, strict=strict, from_attributes=from_attributes, context=context
    553 )

ValidationError: 1 validation error for Suggestions
  Input should be a valid dictionary or instance of Suggestions [type=model_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.7/v/model_type

During handling of the above exception, another exception occurred:

OutputParserException                     Traceback (most recent call last)
Cell In[265], line 1
----> 1 outputfixing_parser.parse(output)

File ~\Desktop\llmai\llm_deep\Lib\site-packages\langchain\output_parsers\fix.py:62, in OutputFixingParser.parse(self, completion)
     60 except OutputParserException as e:
...
     44     try:

OutputParserException: Failed to parse Suggestions from completion null. Got: 1 validation error for Suggestions
  Input should be a valid dictionary or instance of Suggestions [type=model_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.7/v/model_type

Description

The parser should be able to parse the LLM output and extract the JSON it produces, as shown below:

Suggestions(words=["conduct", "misconduct", "actions", "antics", "performance", "demeanor", "attitude", "behavior", "manner", "pupil actions"])

System Info

System Information

OS: Windows
OS Version: 10.0.22631
Python Version: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:27:10) [MSC v.1938 64 bit (AMD64)]

Package Information

langchain_core: 0.2.4
langchain: 0.2.2
langchain_community: 0.2.4
langsmith: 0.1.73
langchain_google_community: 1.0.5
langchain_huggingface: 0.0.3
langchain_text_splitters: 0.2.1

Packages not installed (Not Necessarily a Problem)

The following packages were not found:

langgraph
langserve

keenborder786 commented 2 months ago

Please try the following:


from langchain_community.llms.huggingface_endpoint import HuggingFaceEndpoint
from langchain.pydantic_v1 import BaseModel,Field,validator
from langchain.prompts import PromptTemplate
from langchain.output_parsers import PydanticOutputParser
from typing import List

repo_id = "mistralai/Mistral-7B-Instruct-v0.3"

model_kwargs = {
    "max_new_tokens": 100, 
    "max_length": 200, 
    "temperature": 0.1, 
    "timeout": 6000
}

# Using HuggingFaceEndpoint
llm = HuggingFaceEndpoint(
    repo_id=repo_id,
    huggingfacehub_api_token = '',
    task = 'conversational',
    **model_kwargs,

)

# Define your desired data structure.
class Suggestions(BaseModel):
    words: List[str] = Field(description="list of substitute words based on context")

    # Throw error in case of receiving a numbered-list from API
    @validator('words')
    def not_start_with_number(cls, field):
        for item in field:
            if item[0].isnumeric():
                raise ValueError("The word can not start with numbers!")
        return field

parser = PydanticOutputParser(pydantic_object=Suggestions)

prompt_template = """
Please output your answer as a JSON following the below instruction:
{format_instructions}

Offer a list of suggestions to substitute the specified target_word based on the context.

target_word={target_word}
context={context}

"""

prompt_input_variables = ["target_word", "context"]
partial_variables = {"format_instructions":parser.get_format_instructions()}
prompt = PromptTemplate(
    template=prompt_template,
    input_variables=prompt_input_variables,
    partial_variables=partial_variables
)

chain = prompt | llm

response = chain.invoke({'target_word':'behaviour',
              'context':'The behaviour of the students in the classroom was disruptive and made it difficult for the teacher to conduct the lesson.'})
print(parser.parse(response))

daniau23 commented 2 months ago

@keenborder786 Many thanks for your help. I used your approach and it worked. I also noticed that the original prompt still works with your approach:

prompt_template = """
Offer a list of suggestions to substitute the specified target_word based on the context.
{format_instructions}
target_word={target_word}
context={context}
"""

I also noticed that the task did not need to be specified; it worked without it. I updated the import from langchain_community.llms.huggingface_endpoint import HuggingFaceEndpoint to from langchain_huggingface import HuggingFaceEndpoint, based on the warning shown below:

LangChainDeprecationWarning: The class `HuggingFaceEndpoint` was deprecated in LangChain 0.0.37 and will be removed in 0.3. An updated version of the class exists in the langchain-huggingface package and should be used instead. To use it run `pip install -U langchain-huggingface` and import as `from langchain_huggingface import HuggingFaceEndpoint`.

I do have some questions about fixing output parsing issues.

Misformatted output type-1

This involves a misnamed key in the output: "reasoning" instead of "reasons".

# Define your desired data structure.
class Suggestions(BaseModel):
    words: List[str] = Field(description="list of substitute words based on context")
    reasons: List[str] = Field(description="the reasoning of why this word fits the context")

parser = PydanticOutputParser(pydantic_object=Suggestions)

missformatted_output = '{"words": ["conduct", "manner"], "reasoning": ["refers to the way someone acts in a particular situation.", "refers to the way someone behaves in a particular situation."]}'
parser.parse(missformatted_output)

Error given

ValidationError: 1 validation error for Suggestions
reasons
  field required (type=value_error.missing)

During handling of the above exception, another exception occurred:

OutputParserException                     Traceback (most recent call last)
Cell In[57], line 2
      1 missformatted_output = '{"words": ["conduct", "manner"], "reasoning": ["refers to the way someone acts in a particular situation.", "refers to the way someone behaves in a particular situation."]}'
...
     44     try:

OutputParserException: Failed to parse Suggestions from completion {"words": ["conduct", "manner"], "reasoning": ["refers to the way someone acts in a particular situation.", "refers to the way someone behaves in a particular situation."]}. Got: 1 validation error for Suggestions
reasons
  field required (type=value_error.missing)
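One way to handle a misnamed key without another LLM call is to rename known aliases before handing the string to parser.parse. This is a minimal sketch, assuming the only mismatch is the key name. `KEY_ALIASES` and `normalize_keys` are hypothetical names for illustration, not part of LangChain, and only the standard library is used.

```python
import json
from typing import Dict

# Hypothetical alias table: keys the model tends to emit -> field names in the schema.
KEY_ALIASES: Dict[str, str] = {"reasoning": "reasons"}


def normalize_keys(completion: str, aliases: Dict[str, str]) -> str:
    """Rename top-level keys listed in `aliases` and return the JSON re-serialized."""
    data = json.loads(completion)
    fixed = {aliases.get(key, key): value for key, value in data.items()}
    return json.dumps(fixed)


missformatted_output = (
    '{"words": ["conduct", "manner"], "reasoning": '
    '["refers to the way someone acts in a particular situation.", '
    '"refers to the way someone behaves in a particular situation."]}'
)

normalized = normalize_keys(missformatted_output, KEY_ALIASES)
print(sorted(json.loads(normalized)))  # ['reasons', 'words']
```

The normalized string could then be passed to parser.parse(normalized). This only helps for aliases you anticipate in advance.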

I tried to fix this with OutputFixingParser, but it failed:

outputfixing_parser = OutputFixingParser.from_llm(parser=parser, llm=llm)
outputfixing_parser.parse(missformatted_output)

Error Given

ValidationError: 2 validation errors for Suggestions
words
  field required (type=value_error.missing)
reasons
  field required (type=value_error.missing)

During handling of the above exception, another exception occurred:

OutputParserException                     Traceback (most recent call last)
...
OutputParserException: Failed to parse Suggestions from completion {"properties": {"words": {"title": "Words", "description": "list of substitute words based on context", "type": "array", "items": {"type": "string"}}, "reasons": {"title": "Reasons", "description": "the reasoning of why this word fits the context", "type": "array", "items": {"type": "string"}}}, "required": ["words", "reasons"]}. Got: 2 validation errors for Suggestions
words
  field required (type=value_error.missing)
reasons
  field required (type=value_error.missing)

Misformatted output type-2

This involved a missing key in the output

missformatted_output = '{"words": ["conduct", "manner"]}'
outputfixing_parser = OutputFixingParser.from_llm(parser=parser, llm=llm)
outputfixing_parser.parse(missformatted_output)

Error Given

ValidationError: 2 validation errors for Suggestions
words
  field required (type=value_error.missing)
reasons
  field required (type=value_error.missing)

During handling of the above exception, another exception occurred:

OutputParserException                     Traceback (most recent call last)
...
OutputParserException: Failed to parse Suggestions from completion {"properties": {"words": {"title": "Words", "description": "list of substitute words based on context", "type": "array", "items": {"type": "string"}}, "reasons": {"title": "Reasons", "description": "the reasoning of why this word fits the context", "type": "array", "items": {"type": "string"}}}, "required": ["words", "reasons"]}. Got: 2 validation errors for Suggestions
words
  field required (type=value_error.missing)
reasons
  field required (type=value_error.missing)

Misformatted output type-2 can be solved by using RetryWithErrorOutputParser. The code is below:

# Define your desired data structure.
class Suggestions(BaseModel):
    words: List[str] = Field(description="list of substitute words based on context")
    reasons: List[str] = Field(description="the reasoning of why this word fits the context")

parser = PydanticOutputParser(pydantic_object=Suggestions)

prompt_template = """
Offer a list of suggestions to substitute the specified target_word based on the presented context, and the reasoning for each word.
{format_instructions}
target_word={target_word}
context={context}
"""

prompt_input_variables = ["target_word", "context"]
partial_variables = {"format_instructions":parser.get_format_instructions()}
prompt = PromptTemplate(
    template=prompt_template,
    input_variables=prompt_input_variables,
    partial_variables=partial_variables
)

model_input = prompt.format_prompt(
            target_word="behaviour",
            context="The behaviour of the students in the classroom was disruptive and made it difficult for the teacher to conduct the lesson."
)

missformatted_output = '{"words": ["conduct", "manner"]}'
retry_parser = RetryWithErrorOutputParser.from_llm(parser=parser, llm=llm)
retry_parser.parse_with_prompt(missformatted_output, model_input)

Output

Suggestions(words=['conduct', 'manner'], reasons=['Conduct refers to the way the students are acting in the classroom, which is disruptive. Manner refers to the way they are behaving, which is not appropriate for a lesson.'])
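Note that the output above parses, but it contains two words and a single reasons string covering both. If one reason per word is the intent (an assumption; the thread does not state it), a length check can catch this. The helper name `pairs_are_aligned` is hypothetical.

```python
from typing import List


def pairs_are_aligned(words: List[str], reasons: List[str]) -> bool:
    """Return True when there is exactly one reason per word."""
    return len(words) == len(reasons)


words = ["conduct", "manner"]
reasons = [
    "Conduct refers to the way the students are acting in the classroom, which is "
    "disruptive. Manner refers to the way they are behaving, which is not "
    "appropriate for a lesson."
]

print(pairs_are_aligned(words, reasons))  # False
```

The same comparison could be placed inside a model-level validator so that a mismatch raises during parsing.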

However, using RetryWithErrorOutputParser for Misformatted output type-1 gave the error below:

JSONDecodeError                           Traceback (most recent call last)
File ~\Desktop\llmai\llm_deep\Lib\site-packages\langchain_core\output_parsers\json.py:66, in JsonOutputParser.parse_result(self, result, partial)
     65 try:
---> 66     return parse_json_markdown(text)
     67 except JSONDecodeError as e:

File ~\Desktop\llmai\llm_deep\Lib\site-packages\langchain_core\utils\json.py:147, in parse_json_markdown(json_string, parser)
    146         json_str = match.group(2)
--> 147 return _parse_json(json_str, parser=parser)

File ~\Desktop\llmai\llm_deep\Lib\site-packages\langchain_core\utils\json.py:160, in _parse_json(json_str, parser)
    159 # Parse the JSON string into a Python dictionary
--> 160 return parser(json_str)

File ~\Desktop\llmai\llm_deep\Lib\site-packages\langchain_core\utils\json.py:120, in parse_partial_json(s, strict)
    117 # If we got here, we ran out of characters to remove
    118 # and still couldn't parse the string as JSON, so return the parse error
    119 # for the original string.
--> 120 return json.loads(s, strict=strict)

File ~\Desktop\llmai\llm_deep\Lib\json\__init__.py:359, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    358     kw['parse_constant'] = parse_constant
--> 359 return cls(**kw).decode(s)

File ~\Desktop\llmai\llm_deep\Lib\json\decoder.py:337, in JSONDecoder.decode(self, s, _w)
    333 """Return the Python representation of ``s`` (a ``str`` instance
    334 containing a JSON document).
    335 
    336 """
--> 337 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338 end = _w(s, end).end()

File ~\Desktop\llmai\llm_deep\Lib\json\decoder.py:355, in JSONDecoder.raw_decode(self, s, idx)
    354 except StopIteration as err:
--> 355     raise JSONDecodeError("Expecting value", s, err.value) from None
    356 return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

The above exception was the direct cause of the following exception:

OutputParserException                     Traceback (most recent call last)
Cell In[112], line 3
      1 missformatted_output = '{"words": ["conduct", "manner"], "reasoning": ["refers to the way someone acts in a particular situation.", "refers to the way someone behaves in a particular situation."]}'
      2 retry_parser = RetryWithErrorOutputParser.from_llm(parser=parser, llm=llm)
----> 3 retry_parser.parse_with_prompt(missformatted_output, model_input)

File ~\Desktop\llmai\llm_deep\Lib\site-packages\langchain\output_parsers\retry.py:190, in RetryWithErrorOutputParser.parse_with_prompt(self, completion, prompt_value)
    188 except OutputParserException as e:
    189     if retries == self.max_retries:
--> 190         raise e
    191     else:
    192         retries += 1

File ~\Desktop\llmai\llm_deep\Lib\site-packages\langchain\output_parsers\retry.py:187, in RetryWithErrorOutputParser.parse_with_prompt(self, completion, prompt_value)
    185 while retries <= self.max_retries:
    186     try:
--> 187         return self.parser.parse(completion)
    188     except OutputParserException as e:
    189         if retries == self.max_retries:

File ~\Desktop\llmai\llm_deep\Lib\site-packages\langchain_core\output_parsers\pydantic.py:64, in PydanticOutputParser.parse(self, text)
     63 def parse(self, text: str) -> TBaseModel:
---> 64     return super().parse(text)

File ~\Desktop\llmai\llm_deep\Lib\site-packages\langchain_core\output_parsers\json.py:72, in JsonOutputParser.parse(self, text)
     71 def parse(self, text: str) -> Any:
---> 72     return self.parse_result([Generation(text=text)])

File ~\Desktop\llmai\llm_deep\Lib\site-packages\langchain_core\output_parsers\pydantic.py:60, in PydanticOutputParser.parse_result(self, result, partial)
     57 def parse_result(
     58     self, result: List[Generation], *, partial: bool = False
     59 ) -> TBaseModel:
---> 60     json_object = super().parse_result(result)
     61     return self._parse_obj(json_object)

File ~\Desktop\llmai\llm_deep\Lib\site-packages\langchain_core\output_parsers\json.py:69, in JsonOutputParser.parse_result(self, result, partial)
     67 except JSONDecodeError as e:
     68     msg = f"Invalid json output: {text}"
---> 69     raise OutputParserException(msg, llm_output=text) from e

OutputParserException: Invalid json output: target_word=behaviour
context=The behaviour of the students in the classroom was disruptive and made it difficult for the teacher to conduct the lesson.

Completion:
{"words": ["actions", "conduct"], "reasons": ["refers to the things that someone does in a particular situation.", "refers to the way someone acts in a particular situation."]}

Above, the Completion satisfied the constraints given in the Prompt.
Details: OutputParserException('Failed to parse Suggestions from completion {"words": ["actions", "conduct"], "reasons": ["refers to the things that someone does in a particular situation.", "refers to the way someone acts in a particular situation."]}. Got no validation errors'
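The final message suggests that the text returned by the retry call had prompt text in front of the JSON, so decoding failed at the first character ("Expecting value: line 1 column 1"). A possible workaround is to pull the first decodable JSON object out of the text before parsing. This is a minimal sketch using only the standard library's json.JSONDecoder.raw_decode. `first_json_object` is a hypothetical helper, and it assumes the text contains at least one complete JSON object.

```python
import json
from typing import Any, Optional


def first_json_object(text: str) -> Optional[Any]:
    """Return the first JSON object that decodes from some '{' in the text, else None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # This brace did not start valid JSON; try the next one.
            start = text.find("{", start + 1)
    return None


llm_text = (
    "target_word=behaviour\n"
    "context=The behaviour of the students in the classroom was disruptive and "
    "made it difficult for the teacher to conduct the lesson.\n\n"
    "Completion:\n"
    '{"words": ["actions", "conduct"], "reasons": ["refers to the things that '
    'someone does in a particular situation.", "refers to the way someone acts '
    'in a particular situation."]}\n\n'
    "Above, the Completion satisfied the constraints given in the Prompt.\n"
)

extracted = first_json_object(llm_text)
print(extracted["words"])  # ['actions', 'conduct']
```

The extracted dict could be re-serialized with json.dumps and passed to parser.parse. If the text contains several JSON objects (for example a schema followed by the answer), the first one found may not be the one you want.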

Question: How can Misformatted output type-1 be solved in this scenario?