microsoft / malmo-challenge

Task and example code for the Malmo Collaborative AI Challenge
MIT License

Our agent will be evaluated by playing with all other teams' agents, or just the challenge agent. #35

Open batmanzmc opened 7 years ago

batmanzmc commented 7 years ago

Please clarify, thanks!

mfuntowicz commented 7 years ago

Hi @batmanzmc,

Your agent will be evaluated only against the challenge agent.

Hope it helps, Morgan

batmanzmc commented 7 years ago

@mfuntowicz ,thanks!

cornerfarmer commented 7 years ago

Hi @mfuntowicz,

But isn't the challenge to "train collaborative AI solutions that learn to achieve high scores across a range of partners"?

I mean, only playing against the challenge agent has the same complexity as any simple Atari game.

Thank you for your support :)

Dominik

PS: You should update the challenge paper if the evaluation is really only done by playing against the challenge agent ;)

mfuntowicz commented 7 years ago

Hi @domin1101

You're right, the challenge aims to provide an agent able to adapt itself to various behaviours.

The PigChaseChallengeAgent shipped with the challenge source code aims to do exactly this: it randomly selects its behaviour between a Random agent (taking actions uniformly sampled from the set of available actions) and a Focused agent, which uses an A* algorithm to move toward the pig as quickly as possible.
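The behaviour switching described above could be sketched roughly as follows. This is a hypothetical illustration only: the class and method names are invented and do not match the actual challenge API.

```python
import random

class RandomBehaviour:
    """Stand-in for the Random agent: uniform over available actions."""
    ACTIONS = ("move 1", "turn -1", "turn 1")

    def act(self, state):
        # Take an action uniformly sampled from the available actions.
        return random.choice(self.ACTIONS)

class FocusedBehaviour:
    """Stand-in for the Focused agent."""
    def act(self, state):
        # A real Focused agent would run A* toward the pig here; this
        # placeholder action just stands in for that policy.
        return "move 1"

class ChallengeAgentSketch:
    """Randomly adopts one of the two behaviours, as described above."""
    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._behaviour = None
        self.reset()

    def reset(self):
        # Re-draw the behaviour, e.g. at the start of an episode.
        self._behaviour = self._rng.choice(
            [RandomBehaviour(), FocusedBehaviour()])

    def act(self, state):
        return self._behaviour.act(state)

agent = ChallengeAgentSketch(seed=0)
action = agent.act(state=None)  # either a random or an A*-style action
```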

Ideally, your agent would be able to:

1) Identify the collaborative (or non-collaborative) behaviour of the agent it's playing with
2) Adapt its own behaviour according to the agent it's playing with (i.e. 1.)
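Step 1 could be sketched, for example, by tracking how often the partner's moves shrink its distance to the pig. Everything below is invented for illustration (class name, method names, the 0.7 threshold); it is not part of the challenge code.

```python
class PartnerClassifier:
    """Guess whether the partner behaves like the Focused (A*) agent,
    based on how often its moves reduce its distance to the pig."""

    def __init__(self):
        self.toward = 0   # steps that decreased the partner-to-pig distance
        self.total = 0    # steps observed so far

    def update(self, prev_dist, new_dist):
        self.total += 1
        if new_dist < prev_dist:
            self.toward += 1

    def is_focused(self, threshold=0.7):
        # A Focused (A*) partner should close in on the pig on most steps;
        # a Random partner should not.
        return self.total > 0 and self.toward / self.total >= threshold

clf = PartnerClassifier()
for prev_dist, new_dist in [(5, 4), (4, 3), (3, 3), (3, 2)]:
    clf.update(prev_dist, new_dist)
print(clf.is_focused())  # True: the partner closed in on 3 of 4 steps
```

An agent implementing step 2 would then switch between a collaborative pig-chasing policy and an exit policy depending on this estimate.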

The PigChaseChallengeAgent's helmet changes according to the way it behaves.

Hope it clarifies your question a little bit :)

Morgan

Haishion commented 7 years ago

@mfuntowicz Hi, in your evaluation.py you compute the mean score per step rather than per episode. May I know why?

mfuntowicz commented 7 years ago

Hi @Haishion,

We prefer the average per-step reward because it reflects the agent's efficiency at obtaining rewards, even in tasks where rewards can be zero. In this particular task, rewards are non-zero, and we expect the average per-step and per-episode rewards to correlate.

If it is helpful as a secondary metric / diagnostic, we’d be happy to add per-episode reward to the script.

Haishion commented 7 years ago

@mfuntowicz As far as I can see, the agent's efficiency is better evaluated by the per-episode reward. For example, if one agent exits with reward 4 and another catches the pig with reward 15, the per-step reward can be much higher for the first one, even though that is not collaborative. I guess most participants will interpret your metric in that way.
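The concern above can be checked with a quick calculation. The episode lengths below (2 and 10 steps) are invented for illustration and are not taken from evaluation.py; only the reward totals 4 and 15 come from the example in the comment.

```python
def per_step_mean(episodes):
    """Mean reward pooled over all steps of all episodes."""
    total_steps = sum(steps for steps, _ in episodes)
    total_reward = sum(reward for _, reward in episodes)
    return total_reward / total_steps

def per_episode_mean(episodes):
    """Mean total reward per episode."""
    return sum(reward for _, reward in episodes) / len(episodes)

# Each episode is (num_steps, total_reward); lengths are assumed.
exit_agent = [(2, 4)] * 10    # leaves through the exit quickly every time
pig_agent = [(10, 15)] * 10   # takes longer, but catches the pig

print(per_step_mean(exit_agent), per_episode_mean(exit_agent))  # 2.0 4.0
print(per_step_mean(pig_agent), per_episode_mean(pig_agent))    # 1.5 15.0
```

Under these assumed lengths, the quick-exit agent wins on per-step reward (2.0 vs 1.5) while the pig-catching agent wins on per-episode reward (15.0 vs 4.0), which is exactly the ranking disagreement being raised.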