coolmian closed this 1 week ago
My case:

from typing import Literal

import dspy
from pydantic import BaseModel, Field

# Assumes a language model has already been configured, e.g. via dspy.configure(lm=...).


class Narrative(BaseModel):
    type: Literal["dialogue", "narration", "voiceover"] = Field()
    content: str = Field(default=None)
    name: str | None = Field(default=None)
    reaction: str | None = Field(default=None)


class StoryToJSON(dspy.Signature):
    """
    Convert story text into structured JSON format with specific fields for narration, dialogue, and voiceover.
    Make the performance more like a script or animation script style, help the performer better understand the character's emotions and reactions, and make the content more expressive and situational.
    NOTE: Convert each paragraph based on the story_text without skipping or omitting any content.
    """

    story_text = dspy.InputField()
    json_output: list[Narrative] = dspy.OutputField(desc="list of narratives")


# Define the predictor.
predictor = dspy.Predict(StoryToJSON)

# One demo whose values are Chinese strings; this is what gets serialized into the prompt.
example = dspy.Example(
    story_text="小明走出家门,跟邻居打招呼:“你好呀”。邻居微笑朝他点头,内心奇怪这小子今天怎么对他这么有礼貌?",
    json_output=[
        {"type": "narration", "content": "小明走出家门,跟邻居打招呼"},
        {"type": "dialogue", "name": "小明", "reaction": "高兴", "content": "你好呀"},
        {"type": "narration", "content": "邻居微笑朝他点头"},
        {"type": "voiceover", "name": "邻居", "reaction": "内心奇怪", "content": "这小子今天怎么对我这么有礼貌"},
    ],
)
predictor.demos = [example]

# Read the story as UTF-8 so the Chinese text is decoded the same way on every platform.
with open("dataset/1.txt", "r", encoding="utf-8") as f:
    story_text = f.read()

# Call the predictor on a particular input.
pred = predictor(story_text=story_text)
print(f"Question: {story_text}")
for item in pred.json_output:
    print(item.model_dump())
If examples containing Chinese strings are serialized as \uXXXX Unicode escapes, the LLM tends to reply with escaped strings as well, which lowers reply quality and adds extra decoding work.
Thanks a lot, @coolmian!
Using ensure_ascii=False when serializing to JSON keeps Chinese characters as-is instead of escaping them, so they are supported directly.
before:
after:
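For reference, here is a minimal sketch of the difference, using only the standard library's json.dumps rather than DSPy's actual serialization code (which is not shown in this thread). The `demo` record is just one item copied from the example above.

```python
import json

# One record from the demo above; the values are Chinese strings.
demo = {"type": "dialogue", "name": "小明", "reaction": "高兴", "content": "你好呀"}

# Default behaviour (ensure_ascii=True): every non-ASCII character
# is written as a \uXXXX escape sequence.
escaped = json.dumps(demo)

# With ensure_ascii=False the characters are emitted unchanged.
readable = json.dumps(demo, ensure_ascii=False)

print(escaped)
print(readable)

# Both strings decode back to the same data; only the text the LLM
# would see in the prompt differs.
assert json.loads(escaped) == json.loads(readable) == demo
```

Both forms are valid JSON, so the change only affects what the model reads in the prompt, not how the output is parsed afterwards.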