nclskfm / square-core

SQuARE: Software for question answering research.
https://square.ukp-lab.de

Trigger evaluation of skills #11

Closed nclskfm closed 1 year ago

nclskfm commented 1 year ago

Implement a new endpoint POST /evaluation/{skill-id}/{dataset-id}. It should trigger the evaluation of the specified skill on the specified dataset. First step: handle it as a synchronous request that uses the deployed skill and model, but sends only about 8 examples.
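A minimal sketch of that first step, written as a plain function so it does not depend on any web framework. The names `evaluate_skill`, `load_examples`, `query_skill`, and the record fields are hypothetical and not taken from the SQuARE codebase; the two callables stand in for however the dataset is loaded and the deployed skill is queried.

```python
from typing import Callable, Dict, List

# The issue asks for roughly 8 examples in the first, synchronous version.
MAX_EXAMPLES = 8


def evaluate_skill(
    skill_id: str,
    dataset_id: str,
    load_examples: Callable[[str], List[Dict[str, str]]],  # hypothetical dataset loader
    query_skill: Callable[[str, Dict[str, str]], str],     # hypothetical call to the deployed skill
    max_examples: int = MAX_EXAMPLES,
) -> List[Dict[str, str]]:
    """Run the skill synchronously on a small slice of the dataset."""
    examples = load_examples(dataset_id)[:max_examples]
    predictions = []
    for example in examples:
        answer = query_skill(skill_id, example)
        predictions.append(
            {
                "skill_id": skill_id,
                "dataset_id": dataset_id,
                "example_id": example["id"],
                "prediction": answer,
            }
        )
    return predictions
```

The handler for POST /evaluation/{skill-id}/{dataset-id} would call this function with the two path parameters and return or store the resulting list.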

The results of the evaluation (predictions) should be saved to the database (or maybe handle that in a separate task and only log to the console here?).
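As a rough illustration of the "save to the database" option, the sketch below writes prediction records to a table. It uses Python's built-in `sqlite3` purely as a stand-in, since the issue does not say which database is used; the table layout and the function name `save_predictions` are assumptions.

```python
import sqlite3
from typing import Dict, List


def save_predictions(conn: sqlite3.Connection, predictions: List[Dict[str, str]]) -> int:
    """Store one row per prediction and return the number of rows written."""
    # Assumed schema: one row per (skill, dataset, example) prediction.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS predictions ("
        "skill_id TEXT, dataset_id TEXT, example_id TEXT, prediction TEXT)"
    )
    conn.executemany(
        "INSERT INTO predictions VALUES "
        "(:skill_id, :dataset_id, :example_id, :prediction)",
        predictions,
    )
    conn.commit()
    return len(predictions)
```

For the "only log to console" alternative, the same records could simply be passed to a logger instead of this function.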

(Note: It would probably make sense not to compute the metric directly, but instead to implement the metric calculation based on the predictions saved to the DB => additional task for metric computation).
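To illustrate the separation suggested in the note, here is a sketch of a metric computed only from stored prediction records and a set of reference answers. Exact match is used as an example metric; the issue does not name one, and the record fields (`example_id`, `prediction`) and the function name are hypothetical.

```python
from typing import Dict, List


def exact_match(predictions: List[Dict[str, str]], references: Dict[str, str]) -> float:
    """Fraction of predictions that equal the reference answer.

    `predictions` are records as they might be read back from the database;
    `references` maps example_id to the gold answer. Comparison ignores
    case and surrounding whitespace.
    """
    if not predictions:
        return 0.0
    hits = sum(
        1
        for p in predictions
        if p["prediction"].strip().lower() == references[p["example_id"]].strip().lower()
    )
    return hits / len(predictions)
```

Because the function only reads saved predictions, it can run as a separate task after the evaluation request has finished, and further metrics can be added without querying the skill again.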