zchen0420 / nn_papers

To record my paper reading in my native language, mimicking ooooohira-san.

Competition Platforms #13

Open zchen0420 opened 3 months ago


EvalAI: Towards Better Evaluation Systems for AI Agents

Limitations of existing platforms, as summarized in the paper:

  - Kaggle doesn’t support custom evaluation metrics or multiple challenge phases, both common practice in popular challenges such as the COCO Caption Challenge and VQA.
  - CodaLab is an open-source alternative to Kaggle and fixes several of its limitations, but doesn’t support evaluating interactive agents in dynamic environments.
  - AICrowd doesn’t support human-in-the-loop evaluation of prediction-based or code-upload-based challenges.
  - ParlAI is not a challenge-hosting platform and only supports evaluation of dialog models.
  - OpenAI Gym is not a dedicated evaluation platform and lacks support for prediction-based challenges, custom evaluation protocols, and human-in-the-loop evaluation.
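To make the "custom metrics and multiple phases" point concrete, here is a minimal sketch of a host-side evaluation script, roughly following the evaluation-script convention EvalAI documents (a host-provided `evaluate` function called per submission). The metric, phase names, and JSON file formats below are illustrative assumptions, not part of any specific challenge.

```python
# Sketch of a challenge-host evaluation script in the style EvalAI accepts.
# The toy metric and the JSON formats are illustrative assumptions.
import json


def toy_metric(prediction, reference):
    """Placeholder for a custom metric (e.g., CIDEr or VQA accuracy)."""
    return float(prediction.strip().lower() == reference.strip().lower())


def evaluate(test_annotation_file, user_submission_file, phase_codename, **kwargs):
    with open(test_annotation_file) as f:
        references = json.load(f)    # assumed format: {"id": "reference answer", ...}
    with open(user_submission_file) as f:
        predictions = json.load(f)   # assumed format: {"id": "predicted answer", ...}

    scores = [toy_metric(predictions.get(k, ""), ref) for k, ref in references.items()]
    accuracy = 100.0 * sum(scores) / max(len(scores), 1)

    # Different challenge phases (e.g., dev vs. test) can score different
    # splits -- the multi-phase setup Kaggle does not accommodate.
    split_name = "val_split" if phase_codename == "dev" else "test_split"
    return {"result": [{split_name: {"Accuracy": accuracy}}]}
```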

EvalAI features / differences:

  1. Human-in-the-loop evaluation of machine learning models (via integration with Amazon Mechanical Turk, AMT).
  2. The ability to run users’ code in a dynamic environment instead of against a static dataset, enabling the evaluation of interactive agents (a rough sketch follows below).
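What "evaluation in a dynamic environment" means in practice: the platform repeatedly queries the submitted agent for actions inside an environment and scores the resulting episodes, rather than comparing a prediction file against fixed ground truth. The agent and environment interfaces in this sketch are generic assumptions for illustration, not EvalAI's actual remote-evaluation API.

```python
# Generic sketch of interactive-agent evaluation: the "test set" is the
# environment dynamics, not a static file. All interfaces here are assumed.
import random


class ToyEnv:
    """A trivial episodic environment: guess a hidden digit within a few steps."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def reset(self):
        self.target = self.rng.randint(0, 9)
        self.steps = 0
        return {"hint": "guess a digit"}

    def step(self, action):
        self.steps += 1
        reward = 1.0 if action == self.target else 0.0
        done = reward > 0 or self.steps >= 5
        return {"hint": "try again"}, reward, done


class RandomAgent:
    """Stand-in for a participant's submitted agent."""

    def act(self, observation):
        return random.randint(0, 9)


def evaluate_agent(agent, env, episodes=100):
    """Average episodic return; a static prediction file cannot capture this."""
    total = 0.0
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:
            obs, reward, done = env.step(agent.act(obs))
            total += reward
    return total / episodes


if __name__ == "__main__":
    print("mean return:", evaluate_agent(RandomAgent(), ToyEnv()))
```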