ufal / factgenie

Lightweight self-hosted span annotation tool
https://quest.ms.mff.cuni.cz/nlg/d2t-llm/
MIT License

Structured output eval #152

Closed oplatek closed 1 week ago

oplatek commented 2 weeks ago

What?

Introduces generation with structured outputs, using the OpenAI client for both the OpenAI API and vLLM.

Defines the JSON schema for ErrorSpanAnnotations as:

from pydantic import BaseModel, Field

class Annotation(BaseModel):
    text: str = Field(description="The text which is annotated.")
    error_type: int = Field(description="Index to the list of categories defined for the annotation campaign.")
    reason: str = Field(description="The reason for the annotation.")

class OutputAnnotations(BaseModel):
    annotations: list[Annotation] = Field(description="The list of annotations.")
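
For illustration, a minimal sketch of how this schema could be plugged into the OpenAI client's structured-output parsing (openai >= 1.40); the model name and prompts are placeholders, and the actual factgenie integration may differ:

from openai import OpenAI

client = OpenAI()

completion = client.beta.chat.completions.parse(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system", "content": "Annotate error spans in the given text."},
        {"role": "user", "content": "<output to annotate>"},
    ],
    # the Pydantic model is translated into the JSON schema for the response
    response_format=OutputAnnotations,
)
parsed = completion.choices[0].message.parsed  # an OutputAnnotations instance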

Why?

Limitations

Note that Annotation now has the attribute error_type instead of type as before, because type is a reserved word in JSON schema. Right after parsing, this PR creates the dictionary with the key type instead of error_type, so error_type is used only for the LLM calls and the parsing of their responses.
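
A rough sketch of that post-processing step, assuming Pydantic v2 (the actual code in the PR may differ):

# example LLM response (illustrative only)
raw_response = '{"annotations": [{"text": "in 2020", "error_type": 1, "reason": "Incorrect year."}]}'

parsed = OutputAnnotations.model_validate_json(raw_response)
# map error_type back to the key "type" used elsewhere in factgenie
annotations = [
    {"text": a.text, "type": a.error_type, "reason": a.reason}
    for a in parsed.annotations
]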

Limited testing: I tested all three LLM-eval configs (OpenAI, Ollama, vLLM).

The wiki page does not cover vLLM installation, and vLLM requires a GPU.

kasnerz commented 1 week ago

@oplatek Thanks, looks useful!

Given that we allow annotating any span categories, not just errors, can we call it category or category_index instead of error_type?

And will you be able to write the wiki page for vLLM?

oplatek commented 1 week ago

Given that we allow annotating any span categories, not just errors, can we call it category or category_index instead of error_type?

Good point! Will fix it

And will you be able to write the wiki page for vLLM?

Yes, I will reference what I did to run vLLM on the UFAL GPUs. I hope it will work for most people (I have very limited experience with vLLM, but their documentation looks great).
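
For reference, a rough sketch of the kind of setup the wiki page might describe (UFAL-specific details aside; exact commands may differ between vLLM versions):

# vLLM is typically exposed through its OpenAI-compatible server, e.g.:
#   python -m vllm.entrypoints.openai.api_server --model <model-name> --port 8000
# The regular OpenAI client can then talk to it by overriding base_url:
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="<model-name>",  # must match the model served by vLLM
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)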