Evaluate system API call #33

Blocked by https://github.com/neulab/explainaboard_client/pull/32.
Makes progress towards https://github.com/neulab/explainaboard_client/issues/31.

Adds the ability to submit a system for evaluation from an in-memory list of dictionaries, one dictionary per example. The API looks like this:

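# Read one prediction per line and wrap each in a dictionary.
with open(self._SYSTEM_OUTPUT, "r") as fin:
    system_output = [{"predicted_label": x.strip()} for x in fin]
# Pass the in-memory list directly to the client.
result: dict = self._client.evaluate_system(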
    system_output=system_output,
    task="text-classification",
    system_name="test_cli",
    metric_names=["Accuracy"],
    source_language="en",
    target_language="en",
    dataset="sst2",
    split="test",
)
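
Since the point of this change is that the predictions no longer have to come from a file, here is a minimal sketch of the same call starting from labels already held in memory. The helper names `build_system_output` and `evaluate_in_memory` are hypothetical and not part of the client; `client` is assumed to be an already-configured client object exposing the `evaluate_system` method shown above, and the keyword arguments are copied verbatim from that excerpt.

```python
from typing import Any, Dict, Iterable, List


def build_system_output(labels: Iterable[str]) -> List[Dict[str, str]]:
    """Wrap each predicted label in the one-key dictionary format used above.

    Hypothetical helper: it only mirrors the list comprehension in the excerpt.
    """
    return [{"predicted_label": label.strip()} for label in labels]


def evaluate_in_memory(client: Any, labels: Iterable[str]) -> dict:
    """Submit in-memory predictions with the keyword arguments from the excerpt.

    Assumption: `client` exposes `evaluate_system` as shown in this PR.
    How the client is constructed is outside the scope of this sketch.
    """
    return client.evaluate_system(
        system_output=build_system_output(labels),
        task="text-classification",
        system_name="test_cli",
        metric_names=["Accuracy"],
        source_language="en",
        target_language="en",
        dataset="sst2",
        split="test",
    )


# Predictions produced by a model in the same process; no file is involved.
predictions = ["positive\n", "negative\n", "positive"]
system_output = build_system_output(predictions)
```

Keeping the conversion in its own function makes the payload format easy to check without contacting the server; only `evaluate_in_memory` needs a real client.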
