Open samulatitude opened 3 weeks ago
To do this, the schema must change.
Now, each evaluation will have 2 polymorphic relations: `metadataType` and `resultType`.
There will currently be 2 `EvaluationMetadataTypes`:

- `LlmAsJudgeAdvanced`: The current metadata. It will contain the `prompt` and `configuration` json.
- `LlmAsJudge`: This one will contain fields like `objective` and `additionalInstructions`.

And 3 `EvaluationResultConfigurations`, which will depend on a `ResultableType`:

- `Boolean`: It contains fields like `trueResultDescription` and `falseResultDescription`.
- `Numerical`: It contains fields like `minValue`, `minValueDescription`, `maxValue` and `maxValueDescription`.
- `Text`: It contains fields like `valueDescription`.

The evaluation will expect results depending on the `resultType`, and will have different behaviour depending on its `type`.
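As a rough illustration of the two polymorphic relations, the types could be modelled as discriminated unions. This is only a sketch: the type names, the `isValidResult` helper, and the choice of `string`/`number` for each field are assumptions; only the field names come from the description above.

```typescript
// Hypothetical type names; field names follow the description in this issue.
type LlmAsJudgeAdvancedMetadata = {
  metadataType: "LlmAsJudgeAdvanced";
  prompt: string;
  configuration: Record<string, unknown>; // free-form json (assumed shape)
};

type LlmAsJudgeMetadata = {
  metadataType: "LlmAsJudge";
  objective: string;
  additionalInstructions: string;
};

type EvaluationMetadata = LlmAsJudgeAdvancedMetadata | LlmAsJudgeMetadata;

type BooleanResultConfiguration = {
  resultType: "Boolean";
  trueResultDescription: string;
  falseResultDescription: string;
};

type NumericalResultConfiguration = {
  resultType: "Numerical";
  minValue: number;
  minValueDescription: string;
  maxValue: number;
  maxValueDescription: string;
};

type TextResultConfiguration = {
  resultType: "Text";
  valueDescription: string;
};

type EvaluationResultConfiguration =
  | BooleanResultConfiguration
  | NumericalResultConfiguration
  | TextResultConfiguration;

// Hypothetical helper: behaviour branches on resultType.
// Returns true when a raw value matches what the configuration expects.
function isValidResult(
  config: EvaluationResultConfiguration,
  value: unknown,
): boolean {
  switch (config.resultType) {
    case "Boolean":
      return typeof value === "boolean";
    case "Numerical":
      return (
        typeof value === "number" &&
        value >= config.minValue &&
        value <= config.maxValue
      );
    case "Text":
      return typeof value === "string";
  }
}
```

Because each union carries a literal discriminant, adding a new metadata or result type later only requires a new union member and a new `case`.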
This allows for many more types of evaluations in the future, whether `llmAsJudge` or any other type (like Human in the Loop), while maintaining the resultable types that we have now.
`EvaluationResults` will still be the same, as it still fits the use case.
EvaluationMetadataLlmAsJudgeAdvanced
In this first part, I'll focus on modifying and migrating to the new `EvaluationMetadataLlmAsJudgeAdvanced` schema.
This type does not require a `resultConfiguration` yet, since it is defined in the `configuration` json. I'll just move this json to the `EvaluationMetadataLlmAsJudgeAdvanced` table for advanced usage.
The migration is deployed at a different time from the code, so we cannot assume the code will run only before or only after the migration; it has to work in both states. To address this, this part is divided into 4 PRs:

- Add a `configuration` column in `evaluationMetadataLlmAsJudgeAdvanced`, and adapt the code to read the configuration through the fallback `(evaluation.configuration ?? evaluation.metadata.configuration)`.
- Update the `create` service to use the new schema.
- Move the `configuration` data from `evaluations` to `evaluationMetadataLlmAsJudgeAdvanced`.
- Remove the `configuration` field from the `evaluations` table.
- Remove the `(evaluation.configuration ?? evaluation.metadata.configuration)` safety net to only use `evaluation.metadata.configuration` from now on.

EvaluationMetadataLlmAsJudge and EvaluationConfiguration tables

Here, I'll create the `EvaluationMetadataLlmAsJudge` table, and one table for each `EvaluationConfiguration` result type. I'll also modify the `EvaluationDto` type and `EvaluationRepository` to return the new type.
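While the transition is in progress, whatever the repository returns still has to resolve `configuration` from either location. A minimal sketch of that read, assuming hypothetical `EvaluationRow` and `Configuration` shapes and a hypothetical `resolveConfiguration` helper (only the `??` expression itself is taken from this issue):

```typescript
// Hypothetical shapes for the transition period; not the real schema.
type Configuration = Record<string, unknown>;

type EvaluationRow = {
  id: number;
  // Legacy column on the evaluations table (removed in a later PR).
  configuration?: Configuration | null;
  // New location on the advanced metadata table.
  metadata: { configuration?: Configuration | null };
};

// Reads the legacy column first and falls back to the metadata column,
// so the same code works both before and after the data is moved.
// The trailing `?? null` is an addition that normalises a missing value.
function resolveConfiguration(evaluation: EvaluationRow): Configuration | null {
  return evaluation.configuration ?? evaluation.metadata.configuration ?? null;
}
```

Once the data has been moved and the legacy column dropped, the helper collapses to reading `evaluation.metadata.configuration` directly.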
Deployment is split into five steps:

- `configuration`
- `metadata.configuration`
- `configuration` field in the `metadata` table for the advanced evaluations.

Here I'll create the services and UI to create the new types of evaluations, although they won't be used in production yet.
Finally, swap the options to create evaluations to the new simple types.
What?
Right now, there is a config when creating an evaluation that sets the result (numeric between 1 and 5), but we don't pass this to the prompt, so the user can set a range between 9 and 20 in the prompt, and that range would be the one taken into account.
In summary, there are 2 sources of truth.
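One way to end up with a single source of truth is to derive the range instructions in the prompt from the stored result configuration. The sketch below is only an assumption about how that could look: `NumericalRange` and `describeNumericalRange` are hypothetical names, and the wording of the generated text is illustrative.

```typescript
// Hypothetical shape mirroring the Numerical result configuration fields.
type NumericalRange = {
  minValue: number;
  minValueDescription: string;
  maxValue: number;
  maxValueDescription: string;
};

// Builds the range instructions from the configuration, so the prompt
// cannot declare a different range than the one stored for the evaluation.
function describeNumericalRange(config: NumericalRange): string {
  return [
    `Respond with a number between ${config.minValue} and ${config.maxValue}.`,
    `${config.minValue} means: ${config.minValueDescription}`,
    `${config.maxValue} means: ${config.maxValueDescription}`,
  ].join("\n");
}
```

With this approach the 1 to 5 range configured on the evaluation is the only place the range is defined, and the prompt text is generated from it.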
https://www.figma.com/design/ODioXiqX8aeDMonsh0HBui/Latitude-Cloud?node-id=2738-34189&t=C31y3Hbykh3pzF2x-4