Current state

For LLM evaluation, the user specifies an `LLMMetric` config in a YAML file, possibly implements/extends an existing LLM metric, and updates the `LLMMetric` factory in `factgenie/evaluate.py`. Note that all metrics defined by the `llm-eval/*.yaml` configs are loaded and offered. So most of the LLM annotation campaign is defined in code, with the single exception of `error_categories`, which must be entered via a web browser dialog and must match the categories specified in the YAML config for the LLM prompt.
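For illustration, a hypothetical `llm-eval/*.yaml` metric config might look roughly like the sketch below. The field names (`type`, `model`, `prompt_template`) are assumptions for this example, not the actual factgenie schema; the point is that the error categories currently live only inside the prompt text.

```yaml
# Hypothetical llm-eval/my_metric.yaml -- field names are illustrative assumptions,
# not the actual factgenie schema.
type: openai_metric        # which LLMMetric subclass the factory in factgenie/evaluate.py should build
model: gpt-4o
prompt_template: |
  Annotate factual errors in the following text.
  Use exactly these categories: Incorrect number, Incorrect entity, Contradiction, Other.
  Return the annotated spans as JSON.
# Note: the categories above exist only inside the prompt string, so the same list
# must be re-entered (and kept in sync) in the web browser dialog.
```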
Proposal
[ ] Rename `error_categories` to `annotation_span_categories`
[ ] Allow specifying the `annotation_span_categories` in the YAML metric configs in `llm-eval/your_metric.yaml` (see the sketch after this list)
[ ] Allow creating a human evaluation campaign based on an existing llm-eval campaign, loading the same annotation span categories from there
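As a sketch of the second item, the proposed `annotation_span_categories` could be declared directly in the metric config so that both the LLM prompt and the web UI read them from one place. The exact schema shown here (e.g. `name`/`color` pairs) is an assumption, not a settled design.

```yaml
# Hypothetical llm-eval/your_metric.yaml after the proposal -- the
# annotation_span_categories schema below is an assumption.
type: openai_metric
model: gpt-4o
annotation_span_categories:
  - name: Incorrect number
    color: "#e74c3c"
  - name: Incorrect entity
    color: "#8e44ad"
  - name: Other
    color: "#95a5a6"
prompt_template: |
  Annotate factual errors in the following text, using the categories listed above.
# With the categories defined once in the config, the web dialog would no longer
# need a manually duplicated list that must match the prompt.
```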