rl-2023 / rl-2023-final-project


Error while training #1

Open MicheleMusacchio opened 7 months ago

MicheleMusacchio commented 7 months ago

I got this error while running the training part. I find it a peculiar one because it was the first and only time I saw it, so it seems to depend on some specific configuration of the agents. Might it be something about attention and agent visibility?

Traceback (most recent call last):
  File "C:\Users\Michele\Desktop\Università\2nd year\RL\Final Project\rl-2023-final-project\train.py", line 114, in <module>
    loss = compute_loss(experiences, agents[i]['q_function'], [{'policy_network':agents[j]['policy_network']} for j in range(num_agents)], gamma)
  File "C:\Users\Michele\Desktop\Università\2nd year\RL\Final Project\rl-2023-final-project\train.py", line 33, in compute_loss
    current_q_values = q_function(observation_stack, actions) #.squeeze()
  File "C:\Users\Michele\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\Michele\Desktop\Università\2nd year\RL\Final Project\rl-2023-final-project\maddpg.py", line 76, in forward
    oa_embedding = self.observation_action_encoder.forward(agent, observation, action)
  File "C:\Users\Michele\Desktop\Università\2nd year\RL\Final Project\rl-2023-final-project\encoder.py", line 265, in forward
    visible_observations = get_visible_agent_observations(observations=observation, agent=agent,
  File "C:\Users\Michele\Desktop\Università\2nd year\RL\Final Project\rl-2023-final-project\observation.py", line 98, in get_visible_agent_observations
    close_observations = observations[close_observations].reshape(batch_size, -1, dim)
RuntimeError: shape '[32, -1, 102]' is invalid for input of size 9690
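
For what it's worth, the size in the error message is consistent with a varying number of visible agents across the batch: 9690 = 95 × 102, and 95 is not a multiple of the batch size 32. Below is a minimal sketch (hypothetical shapes, not the project's actual code) that reproduces the same failure mode when the visibility mask selects a different number of agents per batch element:

```python
import torch

batch_size, num_agents, dim = 32, 3, 102
observations = torch.randn(batch_size, num_agents, dim)

# Mask of which agents are "visible"; one batch element sees one agent fewer.
visible = torch.ones(batch_size, num_agents, dtype=torch.bool)
visible[0, 2] = False  # agent 2 is out of sensor range for the first sample

# Boolean indexing flattens the selected rows: (32*3 - 1, 102) = (95, 102)
selected = observations[visible]
print(selected.numel())  # 9690, matching the error message

try:
    selected.reshape(batch_size, -1, dim)
except RuntimeError as e:
    print(e)  # shape '[32, -1, 102]' is invalid for input of size 9690
```

If every batch element saw the same number of agents, the selection would always contain a multiple of batch_size rows and the reshape would succeed.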
MicheleMusacchio commented 6 months ago

Got the error again, and as you can see from the rewards, the agent managed to reach the second room, so it is pretty probable that it is a visibility issue @jonasbarth

(screenshot of the training rewards)

jonasbarth commented 6 months ago

Yes, it definitely seems like a visibility issue. One option would be to make all agents always visible to each other instead of restricting visibility by sensor range; that is probably easier to implement and makes learning easier as well. Feel free to modify stuff! But great to see that they were able to reach the second room :grin:

jonasbarth commented 6 months ago

I changed it so that all agents are always visible now, which I think makes a lot more sense. The sensor range of each agent just determines what it sees in its own observation, not which other agents' observations it can see.

https://github.com/rl-2023/rl-2023-final-project/commit/e6a97a557277789f9ecd2a835ef54847b1798067
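
For reference, a minimal sketch of the "all agents always visible" behaviour, assuming a hypothetical signature similar to get_visible_agent_observations (the real change is in the commit above):

```python
import torch

def get_visible_agent_observations(observations: torch.Tensor, agent: int) -> torch.Tensor:
    """Return the observations of all other agents for the given agent.

    Hypothetical sketch: every other agent's observation is returned
    regardless of sensor range, so the result always has shape
    (batch_size, num_agents - 1, dim).

    observations: tensor of shape (batch_size, num_agents, dim)
    agent: index of the agent whose view is being built
    """
    batch_size, num_agents, dim = observations.shape
    other_agents = [i for i in range(num_agents) if i != agent]
    return observations[:, other_agents, :]
```

Since every agent now contributes exactly one observation per batch element, the result has a fixed sequence length and the attention encoder never hits the invalid reshape.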