Implement a new endpoint POST /evaluation/{skill-id}/{dataset-id}. It should trigger the evaluation of the specified skill on the specified dataset.
First step: handle the request synchronously, using the deployed skill and model, but send only roughly 8 examples.
The results of the evaluation (the predictions) should be saved to the database (or possibly saved in a separate task, with this endpoint only logging them to the console).
(Note: It would probably make sense not to compute the metric directly here, but instead to implement the metric calculation based on the predictions saved to the DB => additional task for metric computation.)
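The flow above could be sketched roughly as follows. This is only an illustration of the intended split between prediction storage and metric computation; all names (`evaluate_skill`, `compute_metric`, `PREDICTIONS_DB`, the `load_examples`/`predict` callables) are hypothetical placeholders, and the in-memory list stands in for the real database and deployed-skill call:

```python
from typing import Callable, Dict, List

# Hypothetical in-memory stand-in for the predictions table in the DB.
PREDICTIONS_DB: List[Dict] = []

def evaluate_skill(skill_id: str,
                   dataset_id: str,
                   load_examples: Callable[[str], List[Dict]],
                   predict: Callable[[str, str], str],
                   sample_size: int = 8) -> List[Dict]:
    """Synchronously evaluate a skill on a small sample of a dataset.

    Sends only roughly `sample_size` examples, as described above.
    Saves the predictions; does NOT compute the metric here.
    """
    examples = load_examples(dataset_id)[:sample_size]
    rows = []
    for example in examples:
        prediction = predict(skill_id, example["question"])
        rows.append({
            "skill_id": skill_id,
            "dataset_id": dataset_id,
            "question": example["question"],
            "gold": example["answer"],
            "prediction": prediction,
        })
    PREDICTIONS_DB.extend(rows)  # stand-in for a real DB insert
    return rows

def compute_metric(skill_id: str, dataset_id: str) -> float:
    """Separate metric-computation step based on predictions read from the DB.

    Uses exact-match accuracy as a placeholder metric.
    """
    rows = [r for r in PREDICTIONS_DB
            if r["skill_id"] == skill_id and r["dataset_id"] == dataset_id]
    if not rows:
        return 0.0
    correct = sum(1 for r in rows if r["prediction"] == r["gold"])
    return correct / len(rows)
```

In a real implementation the endpoint handler for `POST /evaluation/{skill-id}/{dataset-id}` would call something like `evaluate_skill`, with `predict` wrapping the request to the deployed skill/model, and the metric computation would run later as its own task against the stored predictions.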