Failed to parse Suggestions from completion

daniau23 commented 2 months ago

Checked other resources

[X] I added a very descriptive title to this issue.
[X] I searched the LangChain documentation with the integrated search.
[X] I used the GitHub search to find a similar question and didn't find it.
[X] I am sure that this is a bug in LangChain rather than my code.
[x] The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

The following code uses PydanticOutputParser; LangChain fails to parse the LLM output.


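# Imports assumed; they were not included in the original report
import os
from typing import List

from langchain.output_parsers import OutputFixingParser, PydanticOutputParser
from langchain.prompts import PromptTemplate
from langchain_community.llms import HuggingFaceHub
from pydantic import BaseModel, Field, field_validator

HUGGINGFACEHUB_API_TOKEN = os.getenv("HUGGINGFACEHUB_API_TOKEN")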

repo_id = "mistralai/Mistral-7B-Instruct-v0.3"

model_kwargs = {
    "max_new_tokens": 60, 
    "max_length": 200, 
    "temperature": 0.1, 
    "timeout": 6000
}

# Using HuggingFaceHub
llm = HuggingFaceHub(
    repo_id=repo_id,
    huggingfacehub_api_token = HUGGINGFACEHUB_API_TOKEN,
    model_kwargs = model_kwargs,
)

# Define your desired data structure.
class Suggestions(BaseModel):
    words: List[str] = Field(description="list of substitute words based on context")

    # Throw error in case of receiving a numbered-list from API
    @field_validator('words')
    def not_start_with_number(cls, field):
        for item in field:
            if item[0].isnumeric():
                raise ValueError("The word can not start with numbers!")
        return field

parser = PydanticOutputParser(pydantic_object=Suggestions)

prompt_template = """
Offer a list of suggestions to substitute the specified target_word based on the context.
{format_instructions}
target_word={target_word}
context={context}
"""

prompt_input_variables = ["target_word", "context"]
partial_variables = {"format_instructions":parser.get_format_instructions()}
prompt = PromptTemplate(
    template=prompt_template,
    input_variables=prompt_input_variables,
    partial_variables=partial_variables
)

model_input = prompt.format_prompt(
            target_word="behaviour",
            context="The behaviour of the students in the classroom was disruptive and made it difficult for the teacher to conduct the lesson."
)

output = llm(model_input.to_string())

parser.parse(output)

Trying to fix the error with OutputFixingParser produced another error. The code is below:

outputfixing_parser = OutputFixingParser.from_llm(parser=parser,llm=llm)
print(outputfixing_parser)
outputfixing_parser.parse(output)

Error Message and Stack Trace (if applicable)

---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
File ~\Desktop\llmai\llm_deep\Lib\site-packages\langchain_core\output_parsers\pydantic.py:33, in PydanticOutputParser._parse_obj(self, obj)
     32 if issubclass(self.pydantic_object, pydantic.BaseModel):
---> 33     return self.pydantic_object.model_validate(obj)
     34 elif issubclass(self.pydantic_object, pydantic.v1.BaseModel):

File ~\Desktop\llmai\llm_deep\Lib\site-packages\pydantic\main.py:551, in BaseModel.model_validate(cls, obj, strict, from_attributes, context)
    550 __tracebackhide__ = True
--> 551 return cls.__pydantic_validator__.validate_python(
    552     obj, strict=strict, from_attributes=from_attributes, context=context
    553 )

ValidationError: 1 validation error for Suggestions
words
  Field required [type=missing, input_value={'properties': {'words': ..., 'required': ['words']}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.7/v/missing

During handling of the above exception, another exception occurred:

OutputParserException                     Traceback (most recent call last)
Cell In[284], line 1
----> 1 parser.parse(output)

File ~\Desktop\llmai\llm_deep\Lib\site-packages\langchain_core\output_parsers\pydantic.py:64, in PydanticOutputParser.parse(self, text)
...

OutputParserException: Failed to parse Suggestions from completion {"properties": {"words": {"description": "list of substitute words based on context", "items": {"type": "string"}, "title": "Words", "type": "array"}}, "required": ["words"]}. Got: 1 validation error for Suggestions
words
  Field required [type=missing, input_value={'properties': {'words': ..., 'required': ['words']}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.7/v/missing

Error when using OutputFixingParser

---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
File ~\Desktop\llmai\llm_deep\Lib\site-packages\langchain_core\output_parsers\pydantic.py:33, in PydanticOutputParser._parse_obj(self, obj)
     32 if issubclass(self.pydantic_object, pydantic.BaseModel):
---> 33     return self.pydantic_object.model_validate(obj)
     34 elif issubclass(self.pydantic_object, pydantic.v1.BaseModel):

File ~\Desktop\llmai\llm_deep\Lib\site-packages\pydantic\main.py:551, in BaseModel.model_validate(cls, obj, strict, from_attributes, context)
    550 __tracebackhide__ = True
--> 551 return cls.__pydantic_validator__.validate_python(
    552     obj, strict=strict, from_attributes=from_attributes, context=context
    553 )

ValidationError: 1 validation error for Suggestions
  Input should be a valid dictionary or instance of Suggestions [type=model_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.7/v/model_type

During handling of the above exception, another exception occurred:

OutputParserException                     Traceback (most recent call last)
Cell In[265], line 1
----> 1 outputfixing_parser.parse(output)

File ~\Desktop\llmai\llm_deep\Lib\site-packages\langchain\output_parsers\fix.py:62, in OutputFixingParser.parse(self, completion)
     60 except OutputParserException as e:
...
     44     try:

OutputParserException: Failed to parse Suggestions from completion null. Got: 1 validation error for Suggestions
  Input should be a valid dictionary or instance of Suggestions [type=model_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.7/v/model_type

Description

The parser should be able to parse the LLM output and extract the JSON it produces, giving a result like the one below:

Suggestions(words=["conduct", "misconduct", "actions", "antics", "performance", "demeanor", "attitude", "behavior", "manner", "pupil actions"])
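
One possible stop-gap, assuming the completion echoes the prompt (JSON schema included) before the model's answer, which is what the `input_value={'properties': ...}` in the error above suggests: scan the text for embedded JSON objects and keep the last one that has the expected keys. The helper names `iter_json_objects` and `pick_answer` are hypothetical and not part of LangChain, and the sample text is made up for illustration.

```python
import json
from typing import Any, Dict, Iterator, List, Optional


def iter_json_objects(text: str) -> Iterator[Dict[str, Any]]:
    """Yield every top-level JSON object embedded in free-form text."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        start = text.find("{", pos)
        if start == -1:
            return
        try:
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pos = start + 1  # not valid JSON from this brace; keep scanning
            continue
        yield obj
        pos = end  # skip past the object so nested braces are not rescanned


def pick_answer(text: str, required_keys: List[str]) -> Optional[Dict[str, Any]]:
    """Return the last embedded object that has all required keys, else None."""
    matches = [
        obj for obj in iter_json_objects(text)
        if all(key in obj for key in required_keys)
    ]
    return matches[-1] if matches else None


# Made-up completion: echoed schema first, then the model's actual answer.
echoed = (
    "The output should be formatted as a JSON instance.\n"
    '{"properties": {"words": {"type": "array"}}, "required": ["words"]}\n'
    "target_word=behaviour\n"
    '{"words": ["conduct", "manner"]}'
)
answer = pick_answer(echoed, ["words"])
```

The selected object could then be handed to the parser as `parser.parse(json.dumps(answer))`, so the schema object at the start of the text is never validated.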

System Info

System Information

OS: Windows
OS Version: 10.0.22631
Python Version: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:27:10) [MSC v.1938 64 bit (AMD64)]

Package Information

langchain_core: 0.2.4
langchain: 0.2.2
langchain_community: 0.2.4
langsmith: 0.1.73
langchain_google_community: 1.0.5
langchain_huggingface: 0.0.3
langchain_text_splitters: 0.2.1

Packages not installed (Not Necessarily a Problem)

The following packages were not found:

langgraph
langserve

keenborder786 commented 2 months ago

Please try the following:


from langchain_community.llms.huggingface_endpoint import HuggingFaceEndpoint
from langchain.pydantic_v1 import BaseModel,Field,validator
from langchain.prompts import PromptTemplate
from langchain.output_parsers import PydanticOutputParser
from typing import List

repo_id = "mistralai/Mistral-7B-Instruct-v0.3"

model_kwargs = {
    "max_new_tokens": 100, 
    "max_length": 200, 
    "temperature": 0.1, 
    "timeout": 6000
}

# Using HuggingFaceEndpoint
llm = HuggingFaceEndpoint(
    repo_id=repo_id,
    huggingfacehub_api_token='',  # supply your own token here
    task='conversational',
    **model_kwargs,
)

# Define your desired data structure.
class Suggestions(BaseModel):
    words: List[str] = Field(description="list of substitute words based on context")

    # Throw error in case of receiving a numbered-list from API
    @validator('words')
    def not_start_with_number(cls, field):
        for item in field:
            if item[0].isnumeric():
                raise ValueError("The word can not start with numbers!")
        return field

parser = PydanticOutputParser(pydantic_object=Suggestions)

prompt_template = """
Please output your answer as a JSON following the below instruction:
{format_instructions}

Offer a list of suggestions to substitute the specified target_word based on the context.

target_word={target_word}
context={context}

"""

prompt_input_variables = ["target_word", "context"]
partial_variables = {"format_instructions":parser.get_format_instructions()}
prompt = PromptTemplate(
    template=prompt_template,
    input_variables=prompt_input_variables,
    partial_variables=partial_variables
)

chain = prompt | llm

response = chain.invoke({'target_word':'behaviour',
              'context':'The behaviour of the students in the classroom was disruptive and made it difficult for the teacher to conduct the lesson.'})
print(parser.parse(response))

Changes made:
- Updated the prompt.
- Used the updated HuggingFaceEndpoint rather than the deprecated HuggingFaceHub.
- Changed the task type.
- Used the new LCEL style.

daniau23 commented 2 months ago

@keenborder786 Many thanks for your help. I used your approach and it worked. I also noticed that the original prompt still works with your approach:

prompt_template = """
Offer a list of suggestions to substitute the specified target_word based on the context.
{format_instructions}
target_word={target_word}
context={context}
"""

I also noticed that the task didn't need to be specified; it still worked without it. I changed the import from `from langchain_community.llms.huggingface_endpoint import HuggingFaceEndpoint` to `from langchain_huggingface import HuggingFaceEndpoint`, based on the warning shown below:

LangChainDeprecationWarning: The class `HuggingFaceEndpoint` was deprecated in LangChain 0.0.37 and will be removed in 0.3. An updated version of the class exists in the langchain-huggingface package and should be used instead. To use it run `pip install -U langchain-huggingface` and import as `from langchain_huggingface import HuggingFaceEndpoint`.

I do have some questions about fixing output parsing issues.

Misformatted output type-1

This case involves a misspelled key in the output ("reasoning" instead of "reasons").

# Define your desired data structure.
class Suggestions(BaseModel):
    words: List[str] = Field(description="list of substitute words based on context")
    reasons: List[str] = Field(description="the reasoning of why this word fits the context")

parser = PydanticOutputParser(pydantic_object=Suggestions)

missformatted_output = '{"words": ["conduct", "manner"], "reasoning": ["refers to the way someone acts in a particular situation.", "refers to the way someone behaves in a particular situation."]}'
parser.parse(missformatted_output)

Error given

ValidationError: 1 validation error for Suggestions
reasons
  field required (type=value_error.missing)

During handling of the above exception, another exception occurred:

OutputParserException                     Traceback (most recent call last)
Cell In[57], line 2
      1 missformatted_output = '{"words": ["conduct", "manner"], "reasoning": ["refers to the way someone acts in a particular situation.", "refers to the way someone behaves in a particular situation."]}'
...
     44     try:

OutputParserException: Failed to parse Suggestions from completion {"words": ["conduct", "manner"], "reasoning": ["refers to the way someone acts in a particular situation.", "refers to the way someone behaves in a particular situation."]}. Got: 1 validation error for Suggestions
reasons
  field required (type=value_error.missing)

I tried to fix this with OutputFixingParser, but it failed:

outputfixing_parser = OutputFixingParser.from_llm(parser=parser, llm=llm)
outputfixing_parser.parse(missformatted_output)

Error Given

ValidationError: 2 validation errors for Suggestions
words
  field required (type=value_error.missing)
reasons
  field required (type=value_error.missing)

During handling of the above exception, another exception occurred:

OutputParserException                     Traceback (most recent call last)
...
OutputParserException: Failed to parse Suggestions from completion {"properties": {"words": {"title": "Words", "description": "list of substitute words based on context", "type": "array", "items": {"type": "string"}}, "reasons": {"title": "Reasons", "description": "the reasoning of why this word fits the context", "type": "array", "items": {"type": "string"}}}, "required": ["words", "reasons"]}. Got: 2 validation errors for Suggestions
words
  field required (type=value_error.missing)
reasons
  field required (type=value_error.missing)

Misformatted output type-2

This case involves a missing key in the output (there is no "reasons" key).

missformatted_output = '{"words": ["conduct", "manner"]}'
outputfixing_parser = OutputFixingParser.from_llm(parser=parser, llm=llm)
outputfixing_parser.parse(missformatted_output)

Error Given

ValidationError: 2 validation errors for Suggestions
words
  field required (type=value_error.missing)
reasons
  field required (type=value_error.missing)

During handling of the above exception, another exception occurred:

OutputParserException                     Traceback (most recent call last)
...
OutputParserException: Failed to parse Suggestions from completion {"properties": {"words": {"title": "Words", "description": "list of substitute words based on context", "type": "array", "items": {"type": "string"}}, "reasons": {"title": "Reasons", "description": "the reasoning of why this word fits the context", "type": "array", "items": {"type": "string"}}}, "required": ["words", "reasons"]}. Got: 2 validation errors for Suggestions
words
  field required (type=value_error.missing)
reasons
  field required (type=value_error.missing)

Misformatted output type-2 can be solved by using RetryWithErrorOutputParser. The code is below:

# Define your desired data structure.
class Suggestions(BaseModel):
    words: List[str] = Field(description="list of substitute words based on context")
    reasons: List[str] = Field(description="the reasoning of why this word fits the context")

parser = PydanticOutputParser(pydantic_object=Suggestions)

prompt_template = """
Offer a list of suggestions to substitute the specified target_word based on the presented context and the reasoning for each word.
{format_instructions}
target_word={target_word}
context={context}
"""

prompt_input_variables = ["target_word", "context"]
partial_variables = {"format_instructions":parser.get_format_instructions()}
prompt = PromptTemplate(
    template=prompt_template,
    input_variables=prompt_input_variables,
    partial_variables=partial_variables
)

model_input = prompt.format_prompt(
            target_word="behaviour",
            context="The behaviour of the students in the classroom was disruptive and made it difficult for the teacher to conduct the lesson."
)

missformatted_output = '{"words": ["conduct", "manner"]}'
retry_parser = RetryWithErrorOutputParser.from_llm(parser=parser, llm=llm)
retry_parser.parse_with_prompt(missformatted_output, model_input)

Output

Suggestions(words=['conduct', 'manner'], reasons=['Conduct refers to the way the students are acting in the classroom, which is disruptive. Manner refers to the way they are behaving, which is not appropriate for a lesson.'])
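
For readers unfamiliar with the retry approach, the general idea can be sketched roughly as follows. This is an illustration only, not LangChain's implementation: `parse_with_retry`, `parse_suggestions`, and `fake_llm` are hypothetical names, the retry prompt wording is made up, and the "LLM" is a stand-in function that returns a fixed string.

```python
import json
from typing import Callable, TypeVar

T = TypeVar("T")


def parse_with_retry(
    completion: str,
    prompt: str,
    parse: Callable[[str], T],
    ask_llm: Callable[[str], str],
    max_retries: int = 1,
) -> T:
    """Parse `completion`; on failure, ask the model again with the error attached."""
    for attempt in range(max_retries + 1):
        try:
            return parse(completion)
        except ValueError as err:  # json.JSONDecodeError is a ValueError too
            if attempt == max_retries:
                raise
            retry_prompt = (
                f"Prompt:\n{prompt}\n"
                f"Completion:\n{completion}\n"
                f"The completion did not satisfy the prompt. Error: {err}\n"
                "Please try again:"
            )
            completion = ask_llm(retry_prompt)
    raise AssertionError("unreachable")


def parse_suggestions(text: str) -> dict:
    """Stand-in parser: require both 'words' and 'reasons' keys."""
    data = json.loads(text)
    missing = [key for key in ("words", "reasons") if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; always returns a complete answer."""
    return '{"words": ["conduct", "manner"], "reasons": ["fits", "fits"]}'


result = parse_with_retry(
    '{"words": ["conduct", "manner"]}',  # the type-2 output: "reasons" missing
    "Offer a list of suggestions ...",
    parse_suggestions,
    fake_llm,
)
```

Because the second attempt depends entirely on what the model returns, a model that echoes the prompt back before its JSON would still fail at the parsing step even when the JSON itself is valid.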

But using RetryWithErrorOutputParser for Misformatted output type-1 gave the error below:

JSONDecodeError                           Traceback (most recent call last)
File ~\Desktop\llmai\llm_deep\Lib\site-packages\langchain_core\output_parsers\json.py:66, in JsonOutputParser.parse_result(self, result, partial)
     65 try:
---> 66     return parse_json_markdown(text)
     67 except JSONDecodeError as e:

File ~\Desktop\llmai\llm_deep\Lib\site-packages\langchain_core\utils\json.py:147, in parse_json_markdown(json_string, parser)
    146         json_str = match.group(2)
--> 147 return _parse_json(json_str, parser=parser)

File ~\Desktop\llmai\llm_deep\Lib\site-packages\langchain_core\utils\json.py:160, in _parse_json(json_str, parser)
    159 # Parse the JSON string into a Python dictionary
--> 160 return parser(json_str)

File ~\Desktop\llmai\llm_deep\Lib\site-packages\langchain_core\utils\json.py:120, in parse_partial_json(s, strict)
    117 # If we got here, we ran out of characters to remove
    118 # and still couldn't parse the string as JSON, so return the parse error
    119 # for the original string.
--> 120 return json.loads(s, strict=strict)

File ~\Desktop\llmai\llm_deep\Lib\json\__init__.py:359, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    358     kw['parse_constant'] = parse_constant
--> 359 return cls(**kw).decode(s)

File ~\Desktop\llmai\llm_deep\Lib\json\decoder.py:337, in JSONDecoder.decode(self, s, _w)
    333 """Return the Python representation of ``s`` (a ``str`` instance
    334 containing a JSON document).
    335 
    336 """
--> 337 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338 end = _w(s, end).end()

File ~\Desktop\llmai\llm_deep\Lib\json\decoder.py:355, in JSONDecoder.raw_decode(self, s, idx)
    354 except StopIteration as err:
--> 355     raise JSONDecodeError("Expecting value", s, err.value) from None
    356 return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

The above exception was the direct cause of the following exception:

OutputParserException                     Traceback (most recent call last)
Cell In[112], line 3
      1 missformatted_output = '{"words": ["conduct", "manner"], "reasoning": ["refers to the way someone acts in a particular situation.", "refers to the way someone behaves in a particular situation."]}'
      2 retry_parser = RetryWithErrorOutputParser.from_llm(parser=parser, llm=llm)
----> 3 retry_parser.parse_with_prompt(missformatted_output, model_input)

File ~\Desktop\llmai\llm_deep\Lib\site-packages\langchain\output_parsers\retry.py:190, in RetryWithErrorOutputParser.parse_with_prompt(self, completion, prompt_value)
    188 except OutputParserException as e:
    189     if retries == self.max_retries:
--> 190         raise e
    191     else:
    192         retries += 1

File ~\Desktop\llmai\llm_deep\Lib\site-packages\langchain\output_parsers\retry.py:187, in RetryWithErrorOutputParser.parse_with_prompt(self, completion, prompt_value)
    185 while retries <= self.max_retries:
    186     try:
--> 187         return self.parser.parse(completion)
    188     except OutputParserException as e:
    189         if retries == self.max_retries:

File ~\Desktop\llmai\llm_deep\Lib\site-packages\langchain_core\output_parsers\pydantic.py:64, in PydanticOutputParser.parse(self, text)
     63 def parse(self, text: str) -> TBaseModel:
---> 64     return super().parse(text)

File ~\Desktop\llmai\llm_deep\Lib\site-packages\langchain_core\output_parsers\json.py:72, in JsonOutputParser.parse(self, text)
     71 def parse(self, text: str) -> Any:
---> 72     return self.parse_result([Generation(text=text)])

File ~\Desktop\llmai\llm_deep\Lib\site-packages\langchain_core\output_parsers\pydantic.py:60, in PydanticOutputParser.parse_result(self, result, partial)
     57 def parse_result(
     58     self, result: List[Generation], *, partial: bool = False
     59 ) -> TBaseModel:
---> 60     json_object = super().parse_result(result)
     61     return self._parse_obj(json_object)

File ~\Desktop\llmai\llm_deep\Lib\site-packages\langchain_core\output_parsers\json.py:69, in JsonOutputParser.parse_result(self, result, partial)
     67 except JSONDecodeError as e:
     68     msg = f"Invalid json output: {text}"
---> 69     raise OutputParserException(msg, llm_output=text) from e

OutputParserException: Invalid json output: target_word=behaviour
context=The behaviour of the students in the classroom was disruptive and made it difficult for the teacher to conduct the lesson.

Completion:
{"words": ["actions", "conduct"], "reasons": ["refers to the things that someone does in a particular situation.", "refers to the way someone acts in a particular situation."]}

Above, the Completion satisfied the constraints given in the Prompt.
Details: OutputParserException('Failed to parse Suggestions from completion {"words": ["actions", "conduct"], "reasons": ["refers to the things that someone does in a particular situation.", "refers to the way someone acts in a particular situation."]}. Got no validation errors'

Question: How can Misformatted output type-1 be solved in this scenario?
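
One option that avoids a second LLM call, offered as a sketch rather than a LangChain feature: repair near-miss key names locally before handing the JSON to the parser. The helper `repair_keys` is hypothetical; it uses `difflib.get_close_matches` from the standard library to rename an unknown key to the closest missing field name, and the 0.6 cutoff is an arbitrary assumption that may need tuning.

```python
import difflib
import json
from typing import Any, Dict, List


def repair_keys(
    data: Dict[str, Any], expected: List[str], cutoff: float = 0.6
) -> Dict[str, Any]:
    """Rename unknown keys to the closest still-missing expected field name."""
    missing = [name for name in expected if name not in data]
    repaired: Dict[str, Any] = {}
    for key, value in data.items():
        if key in expected:
            repaired[key] = value
            continue
        match = difflib.get_close_matches(key, missing, n=1, cutoff=cutoff)
        if match:
            repaired[match[0]] = value
            missing.remove(match[0])  # each expected name is filled at most once
        else:
            repaired[key] = value  # leave unrelated keys untouched
    return repaired


missformatted_output = (
    '{"words": ["conduct", "manner"], "reasoning": ['
    '"refers to the way someone acts in a particular situation.", '
    '"refers to the way someone behaves in a particular situation."]}'
)
fixed = repair_keys(json.loads(missformatted_output), ["words", "reasons"])
```

The repaired dictionary could then be passed on as `parser.parse(json.dumps(fixed))`. This only covers misspelled keys; a key that is missing altogether (type-2) still needs the model to supply the content.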
