Feat/single task aggregation

What?

Added functionality for producing aggregated plots for a single task in a given experiment. Each plotted point for a given algorithm on a particular task is the mean over all independent experiment runs, and the error band is the 95% confidence interval.

Why?

To better match the guideline.

How?

Created plotting and aggregation functions.

Extra

The 95% confidence interval is computed with a normal approximation based on the central limit theorem, which is computationally cheap. Bootstrapping can be used instead.
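
For reviewers, here is a minimal sketch of the two interval estimates mentioned above. It is not the code in this PR: the function names (`clt_interval`, `bootstrap_interval`, `aggregate_single_task`), the input layout (one list of returns per run) and the sample data are all assumptions made for illustration, and it uses only the Python standard library.

```python
import math
import random
import statistics

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def clt_interval(values):
    """Return (mean, half_width) of a 95% CI using the normal approximation."""
    n = len(values)
    mean = statistics.fmean(values)
    if n < 2:
        # The sample standard deviation is undefined for a single run.
        return mean, 0.0
    sem = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    return mean, Z_95 * sem


def bootstrap_interval(values, n_resamples=2000, seed=0):
    """Return (mean, lower, upper) using a 95% percentile bootstrap."""
    rng = random.Random(seed)
    n = len(values)
    # Resample the runs with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lower = means[int(0.025 * n_resamples)]
    upper = means[int(0.975 * n_resamples) - 1]
    return statistics.fmean(values), lower, upper


def aggregate_single_task(returns_per_run):
    """Aggregate one algorithm on one task.

    returns_per_run is assumed to be a list of runs, each a list holding one
    return per evaluation step. The result has one (mean, half_width) pair
    per evaluation step.
    """
    return [clt_interval(list(step)) for step in zip(*returns_per_run)]


# Hypothetical data: 3 independent runs, 4 evaluation steps each.
runs = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 3.0, 4.0, 5.0],
    [3.0, 4.0, 5.0, 6.0],
]
curve = aggregate_single_task(runs)
boot_mean, boot_lower, boot_upper = bootstrap_interval([r[0] for r in runs])
```

The normal approximation needs a single pass over the runs, while the bootstrap repeats the mean computation `n_resamples` times per point, which is where the cost difference comes from. With only a handful of runs, the fixed 1.96 multiplier tends to understate the interval; a Student's t quantile or the bootstrap may be more appropriate in that case.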

instadeepai / marl-eval

Feat/single task aggregation #17
