Closed pmalacho-mit closed 7 months ago
The issue you're experiencing with validation errors when using Partial with nested models in your MeetingInfo example seems to stem from how the model's fields are being handled for partial validation, especially when changing from a list of models to a single model instance. When you modify the MeetingInfo model to use a single User instead of a list, it's crucial to ensure that the User model is correctly wrapped with Partial to allow for partial validation, and that the field is marked as optional.
Here's how you might adjust your model definition to address this issue:
from typing import Optional

from pydantic import BaseModel
from instructor.dsl.partial import Partial  # import path as used later in this thread

class User(BaseModel):
    name: str
    email: str

class MeetingInfo(BaseModel):
    topic: str
    participant: Optional[Partial[User]]  # Assuming Partial is correctly implemented
In this adjusted example, participant is a single User model that is made optional and is correctly wrapped with Partial, facilitating partial validation. This adjustment ensures that the single User field is treated correctly for partial streaming scenarios.
For further implementation details, consider reviewing the Partial class and the _make_field_optional function within the instructor codebase, as these are key to handling partial streaming with nested models effectively.
Sorry @dosubot, I should've mentioned in my original post that wrapping the field in Partial actually does not fix the problem -- only marking the property as optional fixes it.
I see here from this comment that it's actually an explicit decision to not mark those fields as optional though so that the AI will treat them as required (or at least so I assume, neat!): https://github.com/jxnl/instructor/blob/main/instructor/dsl/partial.py#L235
It appears the core of the issue you're facing with partial streaming and validation errors in nested Pydantic models is tied to the handling of optional fields and the Partial wrapper's behavior. Despite wrapping the nested model with Partial, marking the property as optional seems to be the only effective solution you've found. This aligns with the explicit decision in the instructor codebase to not automatically mark these fields as optional to ensure they are treated as required by the AI.
To address this, you might consider explicitly marking nested model fields as optional in your BaseModel definitions, even when using Partial. This approach allows you to maintain the required validation behavior while also accommodating the partial data structures that are characteristic of streaming scenarios.
Here's a revised approach based on your findings and the instructor's implementation strategy:
from typing import Optional

from pydantic import BaseModel
from instructor.dsl.partial import Partial

class User(BaseModel):
    name: str
    email: str

class MeetingInfo(BaseModel):
    users: Optional[Partial[User]]
    date: str
    location: str
    budget: int
    deadline: str
In this example, users is explicitly marked as Optional and wrapped with Partial[User], which should help mitigate the validation errors you're encountering during partial streaming.
This method leverages both the flexibility of Partial for handling incomplete data and the explicit control over field optionality, providing a more robust solution for streaming scenarios. If further adjustments or clarifications are needed, feel free to reach out.
can you try doing it with
client = instructor.from_openai(OpenAI())
client.chat.completions.create_with_partial
Thanks for the quick reply, @jxnl (and thanks for this awesome library).
Hm, that seems to result in the same error. One point of confusion: did you mean create_partial (so no _with_)?
If so, both of these still throw a validation error:
....
class MeetingInfo(BaseModel):
    users: User
    date: str
    location: str
    budget: int
    deadline: str

stream1 = client.chat.completions.create_partial(
    model="gpt-4",
    response_model=MeetingInfo,
    messages=[
        {
            "role": "user",
            "content": f"Get the information about the meeting and the users {text_block}",
        },
    ],
    stream=True,
)  # type: ignore

stream2 = client.chat.completions.create_partial(
    model="gpt-4",
    response_model=instructor.Partial[MeetingInfo],
    messages=[
        {
            "role": "user",
            "content": f"Get the information about the meeting and the users {text_block}",
        },
    ],
    stream=True,
)  # type: ignore
Yes sorry
I'll try to look at this but quite busy right now. I'd try to make everything optional for now.
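For anyone landing here later, one way to read "make everything optional" is sketched below. It uses plain pydantic v2 only; make_all_optional is a hypothetical helper written for this comment, not part of instructor and not how Partial is implemented internally. It is shallow: only the top-level fields become optional, the nested User still needs both name and email once it is present.

```python
from typing import Optional

from pydantic import BaseModel, create_model


class User(BaseModel):
    name: str
    email: str


class MeetingInfo(BaseModel):
    user: User
    date: str
    location: str
    budget: int
    deadline: str


def make_all_optional(model: type[BaseModel]) -> type[BaseModel]:
    """Hypothetical helper: copy `model` with every top-level field Optional, defaulting to None."""
    fields = {
        name: (Optional[field.annotation], None)
        for name, field in model.model_fields.items()
    }
    return create_model(f"Optional{model.__name__}", **fields)


OptionalMeetingInfo = make_all_optional(MeetingInfo)

# An empty payload validates: every field falls back to None.
empty = OptionalMeetingInfo.model_validate({})

# A half-filled payload validates too; missing fields stay None.
halfway = OptionalMeetingInfo.model_validate(
    {"user": {"name": "Jason Liu", "email": "jason@gmail.com"}, "date": "2024-01-01"}
)

print(empty.user, empty.budget)
print(halfway.user.name, halfway.location)
```

Writing Optional[...] = None by hand on each field gives the same result; the helper just avoids repeating it.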
from pydantic import BaseModel
from openai import OpenAI
import instructor

client = OpenAI()
client = instructor.from_openai(client)

class User(BaseModel):
    name: str
    email: str

class MeetingInfo(BaseModel):
    user: User
    date: str
    location: str
    budget: int
    deadline: str

data = """
Jason Liu jason@gmail.com
Meeting Date: 2024-01-01
Meeting Location: 1234 Main St
Meeting Budget: $1000
Meeting Deadline: 2024-01-31
"""

stream1 = client.chat.completions.create_partial(
    model="gpt-4",
    response_model=MeetingInfo,
    messages=[
        {
            "role": "user",
            "content": f"Get the information about the meeting and the users {data}",
        },
    ],
    stream=True,
)  # type: ignore

for message in stream1:
    print(message)
"""
user={} date=None location=None budget=None deadline=None
user={} date=None location=None budget=None deadline=None
user={} date=None location=None budget=None deadline=None
user={} date=None location=None budget=None deadline=None
user=PartialUser(name=None, email=None) date=None location=None budget=None deadline=None
user=PartialUser(name=None, email=None) date=None location=None budget=None deadline=None
user=PartialUser(name=None, email=None) date=None location=None budget=None deadline=None
user=PartialUser(name=None, email=None) date=None location=None budget=None deadline=None
user=PartialUser(name=None, email=None) date=None location=None budget=None deadline=None
user=PartialUser(name=None, email=None) date=None location=None budget=None deadline=None
user=PartialUser(name='Jason Liu', email=None) date=None location=None budget=None deadline=None
user=PartialUser(name='Jason Liu', email=None) date=None location=None budget=None deadline=None
user=PartialUser(name='Jason Liu', email=None) date=None location=None budget=None deadline=None
user=PartialUser(name='Jason Liu', email=None) date=None location=None budget=None deadline=None
user=PartialUser(name='Jason Liu', email=None) date=None location=None budget=None deadline=None
user=PartialUser(name='Jason Liu', email=None) date=None location=None budget=None deadline=None
user=PartialUser(name='Jason Liu', email=None) date=None location=None budget=None deadline=None
user=PartialUser(name='Jason Liu', email=None) date=None location=None budget=None deadline=None
user=PartialUser(name='Jason Liu', email=None) date=None location=None budget=None deadline=None
user=PartialUser(name='Jason Liu', email='jason@gmail.com') date=None location=None budget=None deadline=None
user=PartialUser(name='Jason Liu', email='jason@gmail.com') date=None location=None budget=None deadline=None
user=PartialUser(name='Jason Liu', email='jason@gmail.com') date=None location=None budget=None deadline=None
user=PartialUser(name='Jason Liu', email='jason@gmail.com') date=None location=None budget=None deadline=None
user=PartialUser(name='Jason Liu', email='jason@gmail.com') date=None location=None budget=None deadline=None
user=PartialUser(name='Jason Liu', email='jason@gmail.com') date=None location=None budget=None deadline=None
user=PartialUser(name='Jason Liu', email='jason@gmail.com') date=None location=None budget=None deadline=None
user=PartialUser(name='Jason Liu', email='jason@gmail.com') date=None location=None budget=None deadline=None
user=PartialUser(name='Jason Liu', email='jason@gmail.com') date=None location=None budget=None deadline=None
user=PartialUser(name='Jason Liu', email='jason@gmail.com') date=None location=None budget=None deadline=None
user=PartialUser(name='Jason Liu', email='jason@gmail.com') date=None location=None budget=None deadline=None
user=PartialUser(name='Jason Liu', email='jason@gmail.com') date=None location=None budget=None deadline=None
user=PartialUser(name='Jason Liu', email='jason@gmail.com') date='2024-01-01' location=None budget=None deadline=None
user=PartialUser(name='Jason Liu', email='jason@gmail.com') date='2024-01-01' location=None budget=None deadline=None
user=PartialUser(name='Jason Liu', email='jason@gmail.com') date='2024-01-01' location=None budget=None deadline=None
user=PartialUser(name='Jason Liu', email='jason@gmail.com') date='2024-01-01' location=None budget=None deadline=None
user=PartialUser(name='Jason Liu', email='jason@gmail.com') date='2024-01-01' location=None budget=None deadline=None
user=PartialUser(name='Jason Liu', email='jason@gmail.com') date='2024-01-01' location=None budget=None deadline=None
user=PartialUser(name='Jason Liu', email='jason@gmail.com') date='2024-01-01' location=None budget=None deadline=None
user=PartialUser(name='Jason Liu', email='jason@gmail.com') date='2024-01-01' location=None budget=None deadline=None
user=PartialUser(name='Jason Liu', email='jason@gmail.com') date='2024-01-01' location=None budget=None deadline=None
user=PartialUser(name='Jason Liu', email='jason@gmail.com') date='2024-01-01' location='1234 Main St' budget=None deadline=None
user=PartialUser(name='Jason Liu', email='jason@gmail.com') date='2024-01-01' location='1234 Main St' budget=None deadline=None
user=PartialUser(name='Jason Liu', email='jason@gmail.com') date='2024-01-01' location='1234 Main St' budget=None deadline=None
user=PartialUser(name='Jason Liu', email='jason@gmail.com') date='2024-01-01' location='1234 Main St' budget=None deadline=None
user=PartialUser(name='Jason Liu', email='jason@gmail.com') date='2024-01-01' location='1234 Main St' budget=None deadline=None
user=PartialUser(name='Jason Liu', email='jason@gmail.com') date='2024-01-01' location='1234 Main St' budget=100 deadline=None
user=PartialUser(name='Jason Liu', email='jason@gmail.com') date='2024-01-01' location='1234 Main St' budget=1000 deadline=None
user=PartialUser(name='Jason Liu', email='jason@gmail.com') date='2024-01-01' location='1234 Main St' budget=1000 deadline=None
user=PartialUser(name='Jason Liu', email='jason@gmail.com') date='2024-01-01' location='1234 Main St' budget=1000 deadline=None
user=PartialUser(name='Jason Liu', email='jason@gmail.com') date='2024-01-01' location='1234 Main St' budget=1000 deadline=None
user=PartialUser(name='Jason Liu', email='jason@gmail.com') date='2024-01-01' location='1234 Main St' budget=1000 deadline=None
user=PartialUser(name='Jason Liu', email='jason@gmail.com') date='2024-01-01' location='1234 Main St' budget=1000 deadline=None
user=PartialUser(name='Jason Liu', email='jason@gmail.com') date='2024-01-01' location='1234 Main St' budget=1000 deadline=None
user=PartialUser(name='Jason Liu', email='jason@gmail.com') date='2024-01-01' location='1234 Main St' budget=1000 deadline=None
user=PartialUser(name='Jason Liu', email='jason@gmail.com') date='2024-01-01' location='1234 Main St' budget=1000 deadline=None
user=PartialUser(name='Jason Liu', email='jason@gmail.com') date='2024-01-01' location='1234 Main St' budget=1000 deadline=None
user=PartialUser(name='Jason Liu', email='jason@gmail.com') date='2024-01-01' location='1234 Main St' budget=1000 deadline=None
user=PartialUser(name='Jason Liu', email='jason@gmail.com') date='2024-01-01' location='1234 Main St' budget=1000 deadline=None
user=PartialUser(name='Jason Liu', email='jason@gmail.com') date='2024-01-01' location='1234 Main St' budget=1000 deadline='2024-01-31'
user=PartialUser(name='Jason Liu', email='jason@gmail.com') date='2024-01-01' location='1234 Main St' budget=1000 deadline='2024-01-31'
"""
this works.
@jxnl thanks, I think I know what's actually going on here (and it's mostly 'user error') -- seems like keeping the property name as users combined with the prompt that has multiple users causes the LLM to only want to specify a list for the entry. Interestingly, your updated prompt with only a single user specified works even when the property is named users (I guess the LLM can intuit that property is just named poorly).
This is consistent with the original error, which, re-reading, makes it clear enough that the model was trying to stuff a [] where it didn't belong: Input should be a valid dictionary or instance of PartialUser [type=model_type, input_value=[], input_type=list].
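The same error type can be reproduced without an LLM or instructor at all, which helps confirm the diagnosis. The sketch below assumes pydantic v2 and feeds a list into a field typed as a single User; error_types is a hypothetical helper added only for this illustration, and the two-field MeetingInfo is a cut-down version of the model from this thread.

```python
from pydantic import BaseModel, ValidationError


class User(BaseModel):
    name: str
    email: str


class MeetingInfo(BaseModel):
    users: User  # a single model under a plural field name
    date: str


def error_types(payload: dict) -> list[str]:
    """Return the pydantic error types raised when validating `payload` (empty if valid)."""
    try:
        MeetingInfo.model_validate(payload)
    except ValidationError as exc:
        return [err["type"] for err in exc.errors()]
    return []


# The shape the LLM tended to emit for a field called "users": a list.
list_shaped = {"users": [], "date": "2024-01-01"}

# The shape the schema actually expects: a single object.
object_shaped = {
    "users": {"name": "Jason Liu", "email": "jason@gmail.com"},
    "date": "2024-01-01",
}

print(error_types(list_shaped))
print(error_types(object_shaped))
```

The list-shaped payload fails with the same model_type error quoted above, while the object-shaped one validates, so the mismatch is between the field name the LLM sees and the type the schema declares.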
I independently ran into this issue on my own code, so I assume I must've done something similar (named something as a singular when expecting a list, or vice versa).
Some good LLM learning! Thanks for working with me to debug.
I hate llms, probably true lol
its almost too smart, but agree it feels like a funny face of user error?? i deleted the s without thinking
What Model are you using?
Describe the bug
The use of Partial does not seem to prevent validation errors on nested models. If you have the following:
And then attempt partial streaming (with Parent as your response model), a validation error will be thrown on the child property.
To Reproduce
If you take the demo code offered in the Streaming Partial Responses documentation, and change the user property to be a single User instead of a list of Users, for example:
A validation error will be thrown on the users property:
Expected behavior
No validation error on use of nested models (apologies if that's an incorrect assumption).
Screenshots
N/A