Closed Tamaya31 closed 4 months ago
It looks like I fixed it by changing pydantic_model.model_json_schema()
to pydantic_model.model_json_schema().get('properties', {})
if you think that is a good fix
Hi @Tamaya31
thank you for the issue and the PR. Unfortunately, I am on vacation the next two weeks without my laptop 🙈 I really would like to look into it and make some tests before merging the PR. Would be good, if this improves the default_system_prompt😊. But I want to test it with different LLMs. I hope it is okay, when this has to wait until I am back from vacation.
Maybe you could post the outputs of pydantic_model.model_json_schema()
and pydantic_model.model_json_schema().get('properties', {})
for your example?
And the next question: what LLM/s are you using? In most cases having your own system prompt for your specific use case and LLM, is giving you better results in the responses (my personal experience).
Hello @lennartpollvogt
You are welcome. No worries for the delay, for now I use my fork with the fix, that works good.
To answer your questions: This is the model I am using that illustrate the issue:
class Answer(BaseModel): observation: str recommendation: str
With your original code, it returns this format:
'Answer': {'a': 'A sentence', 'b': "Another sentence"}}
This then cannot pass the validation as the parent node is not returned from the LLM, and anyways I don't want it to return it.
With the fix I propose, it returns this:
'a': 'A sentence', 'b': "Another sentence"
I used it against mistral:7b, but I can quickly test it against llama3 or any other model.
Let me know if that helps, I am happy to test for you while you are away if needed.
Hi, sorry for the late response and thank you very much for the clarification. I think I will test it when I am back, since there is one thing I would like to know: what happens to the docstring of the pydantic model? I think this will not be covered. But when your approach works better for a greater number of LLMs, then it is worth it. In this case I will have to cover the docstring differently. 😊
Hi @Tamaya31 ,
I finally had time to investigate your proposal. Next to my guess of the missing "description" I found another issue which would lead to inaccurate responses. When only getting the "properties" out of the JSON schema the nested models would miss the references ('$defs').
Here is an example:
from pydantic import BaseModel, ConfigDict
from enum import Enum
from typing import List
import rich
class Gender(Enum):
'''
Gender of the person.
'''
MALE = 'male'
FEMALE = 'female'
class Person(BaseModel):
'''
This is a person.
'''
name: str
age: int
gender: Gender
friends: List[str] = []
model_config = ConfigDict(
extra='forbid'
)
rich.print(Person.model_json_schema())
print('###################################')
rich.print(Person.model_json_schema().get('properties'))
Output looks like this:
{
'$defs': {
'Gender': {
'description': 'Gender of the person.',
'enum': ['male', 'female'],
'title': 'Gender',
'type': 'string'
}
},
'additionalProperties': False,
'description': 'This is a person.',
'properties': {
'name': {'title': 'Name', 'type': 'string'},
'age': {'title': 'Age', 'type': 'integer'},
'gender': {'$ref': '#/$defs/Gender'},
'friends': {'default': [], 'items': {'type': 'string'}, 'title': 'Friends', 'type': 'array'}
},
'required': ['name', 'age', 'gender'],
'title': 'Person',
'type': 'object'
}
###################################
{
'name': {'title': 'Name', 'type': 'string'},
'age': {'title': 'Age', 'type': 'integer'},
'gender': {'$ref': '#/$defs/Gender'},
'friends': {'default': [], 'items': {'type': 'string'}, 'title': 'Friends', 'type': 'array'}
}
This leads me to stick to the current default system prompt.
But, next to your solution of using a fork, you could simply come with your own system prompt in the "messages" list. I mention this in the docs (see here).
I hope this helps
I am encountering a ValidationError that seems to be coming from the fact that at times the output also contains the parent class name, for example if this was my model:
class Answer(BaseModel): a: str b: str
I'd get this output:
'Answer': {'a': 'A sentence', 'b': "Another sentence"}}