Kaggle does not support custom evaluation metrics or multiple challenge phases, both of which are common practice in popular challenges such as the COCO Caption Challenge and VQA (a sketch of such a host-provided evaluation script follows this paragraph).
CodaLab provides an open-source alternative to Kaggle and addresses several of its limitations, but it does not support evaluating interactive agents in dynamic environments.
AICrowd does not support human-in-the-loop evaluation of prediction-based or code-upload-based challenges.
ParlAI is not a challenge-hosting platform and only supports the evaluation of dialog models.
OpenAI Gym is not a dedicated evaluation platform and lacks support for prediction-based challenges, custom evaluation protocols, and human-in-the-loop evaluation.
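To make custom metrics and challenge phases concrete, here is a minimal Python sketch of the kind of host-provided evaluation script such a platform could invoke; the evaluate() signature, file formats, and phase names are illustrative assumptions, not the API of any specific platform.

    import json

    def evaluate(test_annotation_file, user_submission_file, phase_codename, **kwargs):
        # Hypothetical entry point: the platform passes the ground-truth file,
        # the participant's submission, and the name of the current phase, so
        # the challenge host controls both the metric and per-phase behaviour.
        with open(test_annotation_file) as f:
            ground_truth = json.load(f)   # e.g. {"image_1": "cat", ...}
        with open(user_submission_file) as f:
            predictions = json.load(f)    # same keys, predicted labels

        # Custom metric: plain accuracy over the ground-truth keys.
        correct = sum(predictions.get(k) == v for k, v in ground_truth.items())
        accuracy = correct / len(ground_truth)

        # Multiple phases: a dev phase and a test phase can score different
        # splits, or withhold some metrics until the challenge closes.
        if phase_codename == "dev":
            return {"result": [{"dev_split": {"Accuracy": accuracy}}]}
        return {"result": [{"test_split": {"Accuracy": accuracy}}]}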
Features that differentiate EvalAI:
Human-in-the-loop evaluation of machine learning models (via integration with Amazon Mechanical Turk, AMT).
The ability to run a user's code in a dynamic environment rather than against a static dataset, enabling the evaluation of interactive agents (see the sketch after this list).
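Below is a minimal, self-contained Python sketch of what evaluating an interactive agent in a dynamic environment looks like, as opposed to scoring a static prediction file; the environment, agent, and method names are illustrative assumptions rather than any specific platform's API.

    import random

    class GuessingEnvironment:
        # Toy dynamic environment: the agent must locate a hidden target by
        # acting repeatedly and reacting to the feedback it receives.
        def __init__(self, size=10, seed=0):
            self.size = size
            self.rng = random.Random(seed)

        def reset(self):
            self.target = self.rng.randrange(self.size)
            self.steps = 0
            return {"size": self.size, "hint": None}

        def step(self, action):
            self.steps += 1
            done = action == self.target or self.steps >= self.size
            reward = 1.0 if action == self.target else 0.0
            hint = "higher" if action < self.target else "lower"
            return {"size": self.size, "hint": hint}, reward, done

    class RandomAgent:
        # Stand-in for user-submitted code that the platform runs against the
        # environment instead of reading a fixed predictions file.
        def act(self, observation):
            return random.randrange(observation["size"])

    def evaluate_agent(agent, env, episodes=100):
        # Roll the agent through many episodes and report the mean reward.
        total = 0.0
        for _ in range(episodes):
            observation = env.reset()
            done = False
            while not done:
                observation, reward, done = env.step(agent.act(observation))
                total += reward
        return total / episodes

    print(evaluate_agent(RandomAgent(), GuessingEnvironment()))

The key difference from prediction-based evaluation is that each observation the agent receives depends on its earlier actions, so the score cannot be computed from a fixed submission file.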