Implement a new endpoint POST /evaluation/{skill-id}/{dataset-id}. It should trigger the evaluation of the specified skill on the specified dataset.
First step: handle the request synchronously, using the deployed skill and model, but send only roughly 8 examples.
The results of the evaluation (the predictions) should be saved to the database (or possibly saved in a separate task, with this endpoint only logging them to the console).
(Note: It would probably make sense not to compute the metric directly here, but instead to implement the metric calculation based on the predictions saved to the DB => additional task for metric computation.)
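The flow above could be sketched roughly as follows. This is only an illustration of the intended split between prediction storage and metric computation; all names (`evaluate_skill`, `compute_metric`, `PREDICTIONS_DB`, the `load_examples`/`predict` callables) are hypothetical placeholders, and the in-memory list stands in for the real database and deployed-skill call:

```python
from typing import Callable, Dict, List

# Hypothetical in-memory stand-in for the predictions table in the DB.
PREDICTIONS_DB: List[Dict] = []

def evaluate_skill(skill_id: str,
                   dataset_id: str,
                   load_examples: Callable[[str], List[Dict]],
                   predict: Callable[[str, str], str],
                   sample_size: int = 8) -> List[Dict]:
    """Synchronously evaluate a skill on a small sample of a dataset.

    Sends only roughly `sample_size` examples, as described above.
    Saves the predictions; does NOT compute the metric here.
    """
    examples = load_examples(dataset_id)[:sample_size]
    rows = []
    for example in examples:
        prediction = predict(skill_id, example["question"])
        rows.append({
            "skill_id": skill_id,
            "dataset_id": dataset_id,
            "question": example["question"],
            "gold": example["answer"],
            "prediction": prediction,
        })
    PREDICTIONS_DB.extend(rows)  # stand-in for a real DB insert
    return rows

def compute_metric(skill_id: str, dataset_id: str) -> float:
    """Separate metric-computation step based on predictions read from the DB.

    Uses exact-match accuracy as a placeholder metric.
    """
    rows = [r for r in PREDICTIONS_DB
            if r["skill_id"] == skill_id and r["dataset_id"] == dataset_id]
    if not rows:
        return 0.0
    correct = sum(1 for r in rows if r["prediction"] == r["gold"])
    return correct / len(rows)
```

In a real implementation the endpoint handler for `POST /evaluation/{skill-id}/{dataset-id}` would call something like `evaluate_skill`, with `predict` wrapping the request to the deployed skill/model, and the metric computation would run later as its own task against the stored predictions.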