Agents don't learn - Githubissues

MicheleMusacchio commented 4 months ago

Even after a bigger run, agents don't learn: according to the pressurplate we have a reward in [-0.9,0] if the agent is in the same room of the assigned plate and reward [-1,...,-N] otherwise. I tried to implement the rendering but something wrong was happening with the pressureplate repo, but we can understand where the agent is based on the rewards and I saw that for a big run the agents are stuck in the first room. It might be for an incorrect implementation or at theory level since the actions are discrete and the gumbel softmax is not enough to face the problem. We should investigate better

jonasbarth commented 4 months ago

I tried running for 1000 episodes with updating the agents every 100 steps and a lower embedding dimension (256), but no good results.

Rewards

There is no upward trends in the rewards, just some spikes here and there.

Actor Loss

We always converge very quickly and then don't improve.

Critic Loss

Same as the actor loss, quick convergence and then no improvement.

Some questions I have:

MADDPG uses a centralised critic but decentralised actor, whereas our paper uses a centralised critic and centralised actor if I understood correctly

Policy Network: The policy network has a similar structure as the observation-action encoder, which uses an attention module over the entities of each type in the observation oi to adapt to the changing population during training. The only difference in this network is that the action is not included in the input. Notably, we do not share parameters between the Q function and the policy.

MicheleMusacchio commented 3 months ago

I got my first ALL DONES (i.e. last agent arrived to his plate)

cc: @jonasbarth

rl-2023 / rl-2023-final-project

Agents don't learn #2

Rewards

Actor Loss

Critic Loss