Update evaluate benchmark CLI

Blocked by https://github.com/neulab/explainaboard_client/pull/33

This updates the evaluate benchmark CLI to use the new API, simplifying the script a bit.

@pfliu-nlp: I'm marking this as a draft because no integration test currently covers the evaluate_benchmark CLI. I think it'd be a good idea to add one. Would you mind adding a simple one?
