latitude-dev / latitude-llm

Latitude is the open-source prompt engineering platform to build, evaluate, and refine your prompts with AI
https://latitude.so
GNU Lesser General Public License v3.0

Evaluations – Remove conflicts in evaluation objective between configuration and prompt #420

Open · samulatitude opened this issue 3 weeks ago

samulatitude commented 3 weeks ago

What?

Right now, there is a config when creating an evaluation that sets the expected result (a number between 1 and 5), but we don't pass this to the prompt, so the user can set a range between 9 and 20 in the prompt itself, and that is the range that actually gets taken into account.

In summary, there are 2 sources of truth.
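For illustration, a minimal sketch of the mismatch (none of these names or shapes come from the actual schema; they are made up just to show the conflict):

```typescript
// Hypothetical illustration of the two sources of truth.
const evaluationConfiguration = {
  resultType: 'number',
  range: { from: 1, to: 5 }, // what the configuration says the result should be
}

// The configuration is never injected into the prompt, so whatever range the
// user writes here is the one the judge model actually follows.
const evaluationPrompt = `
  Evaluate the assistant response and return a score between 9 and 20.
`

console.log(evaluationConfiguration, evaluationPrompt)
```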

https://www.figma.com/design/ODioXiqX8aeDMonsh0HBui/Latitude-Cloud?node-id=2738-34189&t=C31y3Hbykh3pzF2x-4

csansoon commented 1 week ago

The plan

To do this, the schema must change.

Now, each evaluation will have 2 polymorphic relations: metadataType and resultType

For now, there will be 2 EvaluationMetadataTypes:

And 3 EvaluationResultConfigurations, which will depend on a ResultableType:

The evaluation will expect results depending on the resultType, and will have different behaviour depending on its type.

This allows for many more types of evaluations in the future, whether llmAsJudge or any other type (like Human in the Loop), while maintaining the resultable types we have now.

EvaluationResults will stay the same, as the current table still fits the use case.
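A rough sketch of how the two polymorphic relations could look on the evaluation record. These are plain TypeScript types just to illustrate the shape; the column and variant names are assumptions, and since the three resultable types are not listed in this thread, number/text/boolean below are assumptions too:

```typescript
// Sketch only: the two polymorphic relations described above.
type EvaluationMetadataType = 'llm_as_judge_advanced' | 'llm_as_judge'
type EvaluationResultableType = 'number' | 'text' | 'boolean'

interface Evaluation {
  id: number
  name: string
  // Polymorphic relation #1: which metadata table/row describes how the evaluation runs.
  metadataType: EvaluationMetadataType
  metadataId: number
  // Polymorphic relation #2: which configuration table/row describes the expected result.
  resultType: EvaluationResultableType
  resultConfigurationId: number
}

// One configuration table per resultable type, e.g. for numeric results:
interface EvaluationConfigurationNumerical {
  id: number
  minValue: number
  maxValue: number
}
```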

Development breakdown

Part 1 — EvaluationMetadataLlmAsJudgeAdvanced

In this first part, I'll focus on modifying and migrating to the new EvaluationMetadataLlmAsJudgeAdvanced schema.

This type does not require a resultConfiguration yet, since that is already defined inside the configuration JSON. I'll just move this JSON to the EvaluationMetadataLlmAsJudgeAdvanced table for advanced usage.

The migration is deployed at a separate time from the code, so we cannot assume the code will only ever run before the migration or only after it; it has to work in both states. To address this, this part is divided into 4 PRs:
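As an illustration of what a backward-compatible step in that sequence might look like, here is a sketch of a read path that works on both sides of the migration. The row shapes, column names and helper are assumptions, not the actual repository code:

```typescript
// Sketch: resolve the LLM-as-judge configuration whether or not the migration
// has run yet. `configuration` is the legacy JSON column; `metadata` is the
// new EvaluationMetadataLlmAsJudgeAdvanced row.
type LlmAsJudgeConfiguration = Record<string, unknown>

interface LegacyEvaluationRow {
  id: number
  configuration: LlmAsJudgeConfiguration | null // legacy column, dropped later
}

interface EvaluationMetadataLlmAsJudgeAdvancedRow {
  id: number
  configuration: LlmAsJudgeConfiguration
}

function resolveConfiguration(
  evaluation: LegacyEvaluationRow,
  metadata: EvaluationMetadataLlmAsJudgeAdvancedRow | null,
): LlmAsJudgeConfiguration {
  // After the migration the metadata row exists and wins; before it, fall back
  // to the legacy JSON column so the same code works during both deploy phases.
  return metadata?.configuration ?? evaluation.configuration ?? {}
}
```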

Part 2 — EvaluationMetadataLlmAsJudge and EvaluationConfiguration tables

Here, I'll create the EvaluationMetadataLlmAsJudge table and one table for each EvaluationConfiguration result type. I'll also modify the EvaluationDto type and the EvaluationRepository to return the new type.
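One way the resulting EvaluationDto could be shaped is as a discriminated union over the metadata type. This is a sketch, not the actual type in the repo; the variant and field names are assumptions based on the plan above:

```typescript
// Sketch: EvaluationDto as a discriminated union over metadataType.
interface EvaluationBaseDto {
  id: number
  name: string
  resultType: 'number' | 'text' | 'boolean' // assumed resultable types
  resultConfiguration: Record<string, unknown>
}

type EvaluationDto =
  | (EvaluationBaseDto & {
      metadataType: 'llm_as_judge_advanced'
      metadata: { prompt: string } // full judge prompt written by the user
    })
  | (EvaluationBaseDto & {
      metadataType: 'llm_as_judge'
      metadata: { objective: string } // simple objective; the prompt is generated
    })
```

The repository can then join the right metadata and configuration table depending on these discriminants.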

Deployment is split into five steps:

Part 3 — New UI

Here I'll create the services and UI to create the new types of evaluations, although they won't be used in production yet.
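For instance, the creation flow could go through a service that receives the metadata and result configuration together, so there is a single source of truth for the expected result. The service name, parameters and stub below are assumptions for illustration only:

```typescript
// Hypothetical shape of the create-evaluation service used by the new UI.
interface CreateEvaluationInput {
  name: string
  metadataType: 'llm_as_judge' | 'llm_as_judge_advanced'
  metadata: { objective: string } | { prompt: string }
  resultConfiguration: { type: 'number'; minValue: number; maxValue: number }
}

async function createEvaluation(input: CreateEvaluationInput): Promise<{ id: number }> {
  // Stub: the real service would insert the metadata row, the result
  // configuration row, and the evaluation row pointing at both.
  console.log('creating evaluation', input)
  return { id: 1 }
}

// Example call from the new creation flow: the expected result range now lives
// only in resultConfiguration, never in the prompt.
createEvaluation({
  name: 'Answer quality',
  metadataType: 'llm_as_judge',
  metadata: { objective: 'Rate how well the answer addresses the question' },
  resultConfiguration: { type: 'number', minValue: 1, maxValue: 5 },
}).then(({ id }) => console.log('created evaluation', id))
```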

Part 4 — Migration

Finally, swap the options for creating evaluations over to the new simple types.