batmanzmc opened this issue 7 years ago
Hi @batmanzmc,
Your agent will be evaluated only against the challenge agent.
Hope it helps, Morgan
@mfuntowicz, thanks!
Hi @mfuntowicz,
But isn't the challenge to "train collaborative AI solutions that learn to achieve high scores across a range of partners"?
I mean, playing only against the challenge agent is no more complex than any simple Atari game.
Thank you for your support :)
Dominik
PS: You should update the challenge paper if the evaluation really is done only by playing against the challenge agent ;)
Hi @domin1101
You're right: the challenge aims at agents that can adapt to various partner behaviours.
The PigChaseChallengeAgent shipped with the challenge source code is designed for exactly this: it randomly selects its behaviour between a Random agent (taking actions sampled uniformly from the set of available actions) and a Focused agent, which uses the A* algorithm to move to the pig as quickly as possible.
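To make the "randomly selects its behaviour" idea concrete, here is a minimal sketch of such a mixed partner. It is not the shipped PigChaseChallengeAgent: the class and function names, the action names, the `p_focused` probability, and the assumption that the behaviour is drawn once per episode are all hypothetical. The Focused behaviour is only stubbed; the real one plans a path to the pig with A*.

```python
import random

# Hypothetical action names; the real action set comes from the challenge code.
ACTIONS = ["move_forward", "turn_left", "turn_right"]


def random_policy(observation, rng: random.Random) -> str:
    """Random behaviour: sample an action uniformly from the available actions."""
    return rng.choice(ACTIONS)


def focused_policy(observation, rng: random.Random) -> str:
    """Stand-in for the Focused behaviour. The real agent runs A* towards the pig;
    here we simply assume the observation already carries the first step of that path."""
    return observation["first_step_to_pig"]


class MixedPartner:
    """Hypothetical partner that draws one behaviour at random per episode."""

    def __init__(self, p_focused: float = 0.5, seed: int = 0):
        self.p_focused = p_focused
        self.rng = random.Random(seed)
        self.behaviour = None

    def reset(self) -> str:
        # Assumption: the behaviour is chosen once, at the start of each episode.
        self.behaviour = "focused" if self.rng.random() < self.p_focused else "random"
        return self.behaviour

    def act(self, observation) -> str:
        policy = focused_policy if self.behaviour == "focused" else random_policy
        return policy(observation, self.rng)
```

From the point of view of your own agent, every episode's partner is one of two very different policies, which is why the partner cannot simply be treated as a fixed part of the environment.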
Ideally, your agent would be able to:
1) Identify whether the agent it's playing with behaves collaboratively or not
2) Adapt its own behaviour according to the agent it's playing with (i.e., based on 1)
The PigChase agent's helmet changes according to the way it behaves.
Hope this clarifies your question a little :)
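Besides the helmet cue, step 1 could also be approached from the partner's movements alone. The sketch below is one hypothetical heuristic, not something provided by the challenge: it assumes you can read the partner's and the pig's grid positions at each step, and it labels the partner "focused" if most of its moves bring it closer to the pig. The function name and the 0.6 threshold are illustrative choices.

```python
from typing import List, Tuple

Pos = Tuple[int, int]  # assumed (x, z) grid coordinates


def manhattan(a: Pos, b: Pos) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def estimate_partner_type(partner_positions: List[Pos],
                          pig_positions: List[Pos],
                          threshold: float = 0.6) -> str:
    """Return 'focused', 'random' or 'unknown' from observed trajectories.

    A move counts as 'towards the pig' if it reduces the Manhattan distance to
    where the pig was before the move, so the pig's own motion is ignored.
    """
    if len(partner_positions) != len(pig_positions):
        raise ValueError("trajectories must have the same length")
    moves = 0
    closer = 0
    for t in range(1, len(partner_positions)):
        prev, cur = partner_positions[t - 1], partner_positions[t]
        if cur == prev:
            continue  # turning or standing still gives no evidence either way
        moves += 1
        target = pig_positions[t - 1]
        if manhattan(cur, target) < manhattan(prev, target):
            closer += 1
    if moves == 0:
        return "unknown"
    return "focused" if closer / moves >= threshold else "random"
```

A uniformly random partner should approach the pig on only a fraction of its moves, while an A*-driven partner should approach it on nearly all of them, so even a crude ratio like this may separate the two after a few steps. Step 2 would then switch your own policy based on the returned label.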
Morgan
@mfuntowicz Hi, it looks like your evaluation.py computes the mean score per step rather than per episode. May I ask why?
Hi @Haishion,
We prefer average per-step reward because it reflects an agent's efficiency at obtaining rewards in tasks where rewards can be zero. In this particular task, rewards are non-zero and we expect average per-step and per-episode rewards to correlate.
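The difference between the two metrics can be sketched as follows. This is not the code in evaluation.py: it assumes each episode is stored as a list of per-step rewards and that "per-step" means total reward pooled over all steps of all episodes, which may differ from the exact formula used in the script.

```python
from typing import List

Episode = List[float]  # assumed: one reward per step


def mean_reward_per_step(episodes: List[Episode]) -> float:
    """Total reward divided by the total number of steps across all episodes."""
    total_reward = sum(sum(ep) for ep in episodes)
    total_steps = sum(len(ep) for ep in episodes)
    return total_reward / total_steps


def mean_reward_per_episode(episodes: List[Episode]) -> float:
    """Average of the summed reward of each episode."""
    return sum(sum(ep) for ep in episodes) / len(episodes)


# Two hypothetical agents that collect the same reward, one much faster.
fast = [[0, 0, 10]]        # reward 10 after 3 steps
slow = [[0] * 9 + [10]]    # reward 10 after 10 steps

print(mean_reward_per_episode(fast), mean_reward_per_episode(slow))
print(mean_reward_per_step(fast), mean_reward_per_step(slow))
```

Per episode, both agents score 10; per step, the fast agent scores about 3.33 and the slow one 1.0. This is the efficiency argument: the per-step metric separates agents that reach the same reward at different speeds.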
If it is helpful as a secondary metric / diagnostic, we’d be happy to add per-episode reward to the script.
@mfuntowicz As far as I can see, the agent's performance would be better evaluated by per-episode reward. For example, if one agent exits with a reward of 4 and another catches the pig with a reward of 15, the per-step reward will be much higher for the first one, even though its behaviour is not collaborative. I suspect most participants will interpret your criterion that way.
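To put numbers on this example, assume (hypothetically, the step counts are not given above) that agent A exits after 1 step with reward 4, and agent B catches the pig after 10 steps with reward 15:

```math
\begin{aligned}
\text{per step:}\quad & \bar r^{A}_{\text{step}} = \tfrac{4}{1} = 4 \;>\; \bar r^{B}_{\text{step}} = \tfrac{15}{10} = 1.5,\\
\text{per episode:}\quad & \bar r^{A}_{\text{ep}} = 4 \;<\; \bar r^{B}_{\text{ep}} = 15.
\end{aligned}
```

Under these assumptions the two metrics rank the agents in opposite orders, with the per-step metric favouring the non-collaborative exit.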
Please clarify, thanks!