rl-2023 / rl-2023-final-project


Agents don't learn #2

Open MicheleMusacchio opened 4 months ago

MicheleMusacchio commented 4 months ago

Even after a bigger run, the agents don't learn. According to the pressureplate environment, an agent gets a reward in [-0.9, 0] if it is in the same room as its assigned plate, and a reward in {-1, ..., -N} otherwise. I tried to implement rendering, but something went wrong with the pressureplate repo. Still, we can infer where an agent is from its rewards, and I saw that over a big run the agents stay stuck in the first room. The cause might be an incorrect implementation, or it might be at the theory level: the actions are discrete, and the Gumbel-softmax may not be enough to handle that. We should investigate further.
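As a rough way to read an agent's position off its rewards without rendering, here is a minimal sketch. It assumes the reward scheme exactly as described in this comment (a value in [-0.9, 0] in the plate's room, otherwise -k for k rooms away) and has not been checked against the pressureplate source. The function names `rooms_from_goal` and `furthest_progress` are hypothetical helpers, not part of the repo.

```python
from typing import Sequence


def rooms_from_goal(reward: float) -> int:
    """Estimate how many rooms separate an agent from its plate.

    Assumption (from the comment above, not from the environment code):
    a reward in [-0.9, 0] means the agent is in the plate's room;
    otherwise the reward is -k when the agent is k rooms away.
    """
    if -0.9 <= reward <= 0.0:
        return 0
    return round(-reward)


def furthest_progress(episode_rewards: Sequence[float]) -> int:
    """Smallest room distance the agent reached during one episode."""
    return min(rooms_from_goal(r) for r in episode_rewards)
```

If `furthest_progress` returns the same value for every episode of a run, the agent never left its starting room under this reading of the rewards.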

jonasbarth commented 4 months ago

I tried running for 1000 episodes, updating the agents every 100 steps and using a lower embedding dimension (256), but got no good results.

Rewards

There is no upward trend in the rewards, just some spikes here and there. [image]
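To separate a real trend from isolated spikes, one option is to smooth the per-episode rewards before plotting them. This is only a sketch: `moving_average` is a hypothetical helper, and the window size is an arbitrary choice, not something taken from the project.

```python
from typing import List, Sequence


def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Trailing moving average; returns len(values) - window + 1 points."""
    if window <= 0:
        raise ValueError("window must be positive")
    averages: List[float] = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the value that just left the window.
            running -= values[i - window]
        if i >= window - 1:
            averages.append(running / window)
    return averages
```

A single good episode only nudges the smoothed curve briefly, so a curve that stays flat after smoothing supports the "spikes but no learning" reading.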

Actor Loss

The actor loss always converges very quickly and then doesn't improve. [image]

Critic Loss

Same as the actor loss: quick convergence and then no improvement. [image]

Some questions I have:

MicheleMusacchio commented 3 months ago

I got my first ALL DONES (i.e. the last agent arrived at its plate).

[image]

cc: @jonasbarth