thu-ml / tianshou

An elegant PyTorch deep reinforcement learning library.
https://tianshou.org
MIT License
7.73k stars 1.11k forks

Use RNN in MARL #965

Open zhangwenjun1229 opened 10 months ago

zhangwenjun1229 commented 10 months ago

I found that the error may be caused by the input "state", which I have not defined. But I checked the given test_drqn.py and couldn't find how "state" is used. Actually, I just want to stack obs at each step, and I thought the default state was obs. Could you please give me some instructions on how to fix this error or achieve my goal? Thanks
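For the stated goal of stacking observations at each step, the idea can be sketched in plain Python: each agent keeps its own fixed-length history, so one agent's frames never leak into another's. `PerAgentObsStacker` is a hypothetical helper, not a tianshou API. In tianshou itself, stacking is typically configured through the replay buffer's `stack_num` argument rather than done by hand.

```python
from collections import deque
from typing import Any, Deque, Dict, List


class PerAgentObsStacker:
    """Hypothetical helper (not part of tianshou): keeps the last `stack_num`
    observations separately for each agent."""

    def __init__(self, stack_num: int) -> None:
        self.stack_num = stack_num
        self._frames: Dict[str, Deque[Any]] = {}

    def reset(self, agent_id: str, obs: Any) -> List[Any]:
        # Pad a fresh history with copies of the first observation.
        self._frames[agent_id] = deque([obs] * self.stack_num, maxlen=self.stack_num)
        return list(self._frames[agent_id])

    def step(self, agent_id: str, obs: Any) -> List[Any]:
        if agent_id not in self._frames:
            return self.reset(agent_id, obs)
        # deque(maxlen=...) discards the oldest frame automatically.
        self._frames[agent_id].append(obs)
        return list(self._frames[agent_id])  # oldest first, newest last


stacker = PerAgentObsStacker(stack_num=3)
print(stacker.step("agent_1", 0))   # [0, 0, 0]
print(stacker.step("agent_1", 1))   # [0, 0, 1]
print(stacker.step("agent_2", 10))  # [10, 10, 10]
```

Keying the history by agent is the design choice that matters in the multi-agent case: a single shared history would interleave frames from different agents.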

Trinkle23897 commented 10 months ago

Could you refer to the DRQN example? https://github.com/thu-ml/tianshou/blob/master/test/discrete/test_drqn.py

https://github.com/thu-ml/tianshou/blob/66b7fc542b496090e83d2df4a846fc02f3f3167b/test/discrete/test_drqn.py#L67-L69 This is the major change: you need a different network.
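What makes a recurrent network "different" is its calling contract: it takes the previous hidden state alongside the observation and returns an updated state, which the caller must feed back on the next step. Below is a minimal NumPy sketch of that contract, assuming a plain tanh RNN cell. tianshou's `Recurrent` uses an LSTM and carries both `hidden` and `cell`; `TinyRecurrentQNet` and its parameters are hypothetical.

```python
from typing import Dict, Optional, Tuple

import numpy as np


class TinyRecurrentQNet:
    """Hypothetical tanh-RNN Q-network illustrating (obs, state) -> (q, state)."""

    def __init__(self, obs_dim: int, hidden_dim: int, n_actions: int, seed: int = 0) -> None:
        rng = np.random.default_rng(seed)
        self.hidden_dim = hidden_dim
        self.w_in = rng.normal(scale=0.1, size=(obs_dim, hidden_dim))
        self.w_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
        self.w_out = rng.normal(scale=0.1, size=(hidden_dim, n_actions))

    def forward(
        self, obs: np.ndarray, state: Optional[Dict[str, np.ndarray]] = None
    ) -> Tuple[np.ndarray, Dict[str, np.ndarray]]:
        obs = np.asarray(obs, dtype=np.float64)  # shape [batch, obs_dim]
        if state is None:
            # First step of an episode: start from a zero hidden state.
            hidden = np.zeros((obs.shape[0], self.hidden_dim))
        else:
            hidden = state["hidden"]
        hidden = np.tanh(obs @ self.w_in + hidden @ self.w_h)
        q_values = hidden @ self.w_out
        # The caller is expected to pass this dict back on the next step.
        return q_values, {"hidden": hidden}


net = TinyRecurrentQNet(obs_dim=4, hidden_dim=8, n_actions=2)
obs = np.ones((1, 4))
q_first, state = net.forward(obs)          # state=None -> zero hidden state
q_second, state = net.forward(obs, state)  # feed the returned state back in
print(q_first.shape, state["hidden"].shape)  # (1, 2) (1, 8)
```

Because the output depends on the carried state, the same observation can yield different Q-values on consecutive steps. This is why the state has to be threaded through correctly for every agent.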

zhangwenjun1229 commented 10 months ago

Yes, I have referred to the DRQN example and used the Recurrent() net before. That is when I got the error I mentioned earlier.

zhangwenjun1229 commented 10 months ago

I think it may be related to MARL. When I checked the variables, I found that Recurrent() had defined the "hidden" and "cell" attributes for the first agent but not for the second agent. The error happens when the model accesses the "hidden" attribute of the second agent's state, which does not exist (in fact, that state is just an empty Batch()). Then I checked the code for this part: it seems these attributes are only defined when the state is None.
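The failure described above can be reproduced in a plain-Python sketch in which a dict stands in for tianshou's `Batch`. When only the first agent has been stepped, the second agent's entry is an empty container rather than `None`, so an `if state is None:` branch is skipped and the lookup of `hidden` fails. `init_or_read_hidden` is a hypothetical stand-in for the state handling in `Recurrent.forward`. A dict raises `KeyError` here; the exception raised by a real empty `Batch` may differ.

```python
from typing import Any, Dict, Optional


def init_or_read_hidden(state: Optional[Dict[str, Any]]) -> Any:
    """Hypothetical mirror of an `if state is None:` initialization branch."""
    if state is None:
        return "zeros"  # placeholder for a freshly initialized hidden state
    return state["hidden"]  # assumes the state came from a previous step


per_agent_state = {
    "agent_1": {"hidden": "h1", "cell": "c1"},  # this agent has already acted
    "agent_2": {},  # never stepped yet: an empty container, not None
}

print(init_or_read_hidden(per_agent_state["agent_1"]))  # h1
try:
    init_or_read_hidden(per_agent_state["agent_2"])
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'hidden'
```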

zhangwenjun1229 commented 10 months ago

Following this, I hacked the "state" handling in the Recurrent() class. In particular, I changed line 313 from "if state is None:" to "if state is None or isinstance(state, Batch) and state.is_empty():". I'm not entirely sure this is the root cause of my error, but now it works!
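The patched condition treats an empty state the same way as a missing one. Here is a plain-Python sketch of that predicate, with an empty mapping standing in for an empty `Batch`. `needs_fresh_state` and `hidden_or_zeros` are hypothetical names, and the zero list is a stand-in for the real zero-initialized LSTM tensors.

```python
import collections.abc
from typing import Any, List, Mapping, Optional


def needs_fresh_state(state: Optional[Mapping[str, Any]]) -> bool:
    """True for `None` or an empty mapping, analogous to
    `state is None or (isinstance(state, Batch) and state.is_empty())`."""
    return state is None or (
        isinstance(state, collections.abc.Mapping) and len(state) == 0
    )


def hidden_or_zeros(state: Optional[Mapping[str, Any]], hidden_dim: int) -> List[float]:
    if needs_fresh_state(state):
        # An agent that has not acted yet gets a fresh zero state.
        return [0.0] * hidden_dim
    return state["hidden"]


print(hidden_or_zeros(None, 3))  # [0.0, 0.0, 0.0]
print(hidden_or_zeros({}, 3))    # [0.0, 0.0, 0.0]
print(hidden_or_zeros({"hidden": [0.5, -0.1, 0.2]}, 3))  # [0.5, -0.1, 0.2]
```

Python's `and` binds more tightly than `or`, so the condition in the patch already parses as `state is None or (isinstance(state, Batch) and state.is_empty())`. The parentheses in the sketch only make that grouping explicit.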

lsylusiyao commented 10 months ago

I have a similar problem. Using stack_num breaks my action_mask for MARL, because I don't know which of the stacked action_masks I should choose. For example, I use DRQN, and the debug info looks like this:

[Screenshot of the debug info]
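On the `action_mask` question: when `stack_num > 1`, every field of the stacked observation gains a stack axis, and that includes the mask. One common convention is to mask with the newest frame only, since that frame describes the actions legal at the current step. The NumPy sketch below assumes the mask has shape `[batch, stack_num, n_actions]` with the newest frame last. Check the actual layout your buffer produces before relying on this. `masked_greedy_action` is a hypothetical helper.

```python
import numpy as np


def masked_greedy_action(q_values: np.ndarray, stacked_mask: np.ndarray) -> np.ndarray:
    """Greedy action using only the newest frame's action mask.

    q_values:     [batch, n_actions]
    stacked_mask: [batch, stack_num, n_actions], boolean; newest frame last (assumed)
    """
    current_mask = stacked_mask[:, -1, :]  # mask for the current step only
    masked_q = np.where(current_mask, q_values, -np.inf)  # illegal actions never win
    return masked_q.argmax(axis=-1)


q = np.array([[1.0, 5.0, 3.0]])
# The older frame allowed action 1, but the newest frame forbids it.
mask = np.array([[[True, True, True], [True, False, True]]])
print(masked_greedy_action(q, mask))  # [2]
```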