Features
Added the following benchmark CRUD functions in the client (a usage sketch follows the list):
get_benchmark()
upload_benchmark()
update_benchmark()
delete_benchmark()
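As a rough illustration, calls might look like the following. Only the four function names come from this change; the Client class, its constructor arguments, and the id/parameter shapes are assumptions for the sketch, not the actual API.

```python
# Hypothetical usage sketch. The four function names come from this change;
# the Client class, its constructor, and the id/parameter shapes are
# assumptions, not the actual API.
import json

from client import Client  # assumed import path

client = Client(api_key="...")  # assumed constructor

# Upload a benchmark definition; assumed to accept a parsed JSON config.
with open("example/benchmark/gaokao/config_gaokao.json") as f:
    config = json.load(f)
benchmark = client.upload_benchmark(config)

# Fetch it back, update it, and clean up.
fetched = client.get_benchmark(benchmark["id"])    # assumed id field
client.update_benchmark(benchmark["id"], config)   # assumed signature
client.delete_benchmark(benchmark["id"])
```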
Added tests for all functions.
Modified config_gaokao.json under example/benchmark/gaokao and used it as a test artifact (see the test sketch below). This somewhat defeats the purpose of having a tests/artifacts folder, but I think it improves consistency, since it forces us to update the example whenever the schema changes.
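A minimal sketch of what one of those tests could look like, assuming pytest and the same Client import as above; the fixture name and assertion are illustrative, and only the artifact path comes from this change:

```python
# Sketch of a test that loads the example config as its artifact instead of
# a copy under tests/artifacts. Only the artifact path comes from this
# change; everything else is an assumption.
import json
from pathlib import Path

import pytest

from client import Client  # assumed import path

ARTIFACT = Path("example/benchmark/gaokao/config_gaokao.json")

@pytest.fixture
def benchmark_config():
    # Loading the shared example keeps the test and the example in sync:
    # a schema change breaks this fixture until the example is updated too.
    with ARTIFACT.open() as f:
        return json.load(f)

def test_upload_benchmark(benchmark_config):
    client = Client(api_key="...")  # assumed constructor
    benchmark = client.upload_benchmark(benchmark_config)
    assert benchmark is not None
```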
Next steps
Documentation on the expected benchmark JSON schema.
Support the same set of functions in the CLI (a possible command layout is sketched after this list).
Update evaluate_benchmark.py in the CLI, and the README under example/benchmark/gaokao, to cover submitting a system to a benchmark.
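For the CLI item above, one possible shape that mirrors the client functions. None of this exists yet; the subcommand names and argparse wiring are purely illustrative assumptions.

```python
# Possible shape for the planned CLI support, mirroring the client
# functions. Subcommand names and wiring are illustrative only.
import argparse
import json

from client import Client  # assumed import path

def main() -> None:
    parser = argparse.ArgumentParser(prog="benchmark")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("get").add_argument("benchmark_id")
    upload = sub.add_parser("upload")
    upload.add_argument("config_path")
    update = sub.add_parser("update")
    update.add_argument("benchmark_id")
    update.add_argument("config_path")
    sub.add_parser("delete").add_argument("benchmark_id")

    args = parser.parse_args()
    client = Client()  # assumed constructor

    if args.command == "get":
        print(client.get_benchmark(args.benchmark_id))
    elif args.command == "upload":
        with open(args.config_path) as f:
            print(client.upload_benchmark(json.load(f)))
    elif args.command == "update":
        with open(args.config_path) as f:
            client.update_benchmark(args.benchmark_id, json.load(f))
    elif args.command == "delete":
        client.delete_benchmark(args.benchmark_id)

if __name__ == "__main__":
    main()
```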