xuehy / pytorch-maddpg

A pytorch implementation of MADDPG (multi-agent deep deterministic policy gradient)

Train result #4

Closed: xuemei-ye closed this issue 5 years ago

xuemei-ye commented 6 years ago

Hello, I ran your program for 2000 episodes, but compared with your result, my reward did not improve in the same obvious way; the reward curve showed no upward trend. I did not change the code other than setting max_steps = 100. I don't know why; did I miss something? I ran the program on a virtual machine without a GPU, and 2000 episodes took approximately 50 hours, which is very slow. How much time did you spend on training to get your result?

[Image: 2000-episode_diff]

xuehy commented 6 years ago

Sorry for the late reply!

  1. The parameters are important for the algorithm. With max_steps = 100 the task is much more difficult than with a large max_steps, so you may need to tune many other parameters. I suggest that you first test it with a large max_steps (see the sketch after this list).

  2. Without a GPU, training is very slow. I only trained with a GPU (Titan X), and it took less than a day for more than 3000 episodes. If you do not have GPUs and cannot bear the long training time, you may be interested in another algorithm, MACE, which does not rely on GPUs and is also effective. Under similar environments, MACE only takes about 5 hours to converge.
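For illustration, max_steps simply caps the length of each episode in the training loop, roughly as in the sketch below (the environment and policy here are dummy stand-ins, not the actual classes in this repo). With a small cap such as 100, the agents have far fewer steps per episode to reach the prey, so the same parameters may no longer learn well.

```python
import numpy as np

# Hypothetical stand-ins for the real environment and the MADDPG policy.
class DummyEnv:
    def reset(self):
        return np.zeros(4)

    def step(self, actions):
        # observation, reward, done flag, info
        return np.zeros(4), 0.0, False, {}

class DummyPolicy:
    def select_actions(self, obs):
        return np.zeros(2)

env, policy = DummyEnv(), DummyPolicy()
max_steps = 1000  # a larger cap gives more chances per episode to collect reward

for i_episode in range(3):
    obs = env.reset()
    for t in range(max_steps):
        obs, reward, done, _ = env.step(policy.select_actions(obs))
        if done:
            break
```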

xuemei-ye commented 6 years ago

Thanks for your reply. Recently I ported your code to the environment that OpenAI provides (https://github.com/openai/multiagent-particle-envs), and it does not seem to work there either. I will try a larger max_steps. Your implementation of the algorithm is a great job!

[Image: 6000_episode_step 50]

xuemei-ye commented 6 years ago

Hello, I did not change any parameters and ran pytorch-maddpg on a GPU. The first 1000 episodes look fine, but then the reward goes down, which is very strange. Do you know why? Why can't I get the same result as you?

[Image: screenshot from 2017-11-23 10-57-48]

xuehy commented 6 years ago

@xmye Training in deep reinforcement learning can sometimes be unstable, and there are many possible causes of this phenomenon; you may be suffering from overfitting. One way to work around it is to apply early stopping, or simply to save the model periodically and pick the checkpoint with the best reward.
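A minimal sketch of that idea, assuming a `maddpg` object that exposes a list of actor networks (the attribute names here are placeholders; the actual class in this repo may be organized differently):

```python
import copy
import torch as th

def checkpoint(maddpg, episode, episode_reward, best):
    """Save periodically and keep the snapshot with the best reward so far."""
    # Periodic checkpoint every 100 episodes.
    if episode % 100 == 0:
        th.save([a.state_dict() for a in maddpg.actors],
                'actors_ep%d.pth' % episode)

    # Track the best-performing snapshot; stopping once this has not improved
    # for a while is a simple form of early stopping.
    if best is None or episode_reward > best[0]:
        best = (episode_reward,
                [copy.deepcopy(a.state_dict()) for a in maddpg.actors])
    return best
```

Inside the episode loop this would be called as `best = checkpoint(maddpg, i_episode, total_reward, best)` (with `best = None` initially), and the stored state dicts can later be restored with `load_state_dict`.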

djbitbyte commented 6 years ago

@xuehy Hi, I have trained with your implementation without changing any parameter settings, but I got the same training result as @xuemei-ye several times: the reward collapses every time around episode 1000. I also saved the model periodically; the best behavior I obtained was the two agents trying to stick together while hiding in a corner or lingering at the edges, rather than actively running around to capture food as in the scene in your README. How did you get the training result shown in your README, in terms of parameter settings or anything else that could influence training?

[Images: newplot 7, newplot 6]

xuehy commented 6 years ago

@djbitbyte Sorry for the late reply; I was on vacation. I have never encountered a similar phenomenon. I will try to rerun the code soon and see whether anything related has changed.

djbitbyte commented 6 years ago

@xuehy I think it might be caused by the fixed random seeds in main.py (`np.random.seed(1234)`, `th.manual_seed(1234)`, `world.seed(1234)`). I tried removing them and ran the training again; it still collapsed, but at a different episode.

I have now decreased the learning rate from 0.001 to 0.0001 for critic_optimizer, and from 0.0001 to 0.00001 for actor_optimizer. The training is still ongoing and has not collapsed so far. The total reward reached around 600 at about episode 1200, but then decreased gradually, so it is not as good as your training result.
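For reference, that change corresponds roughly to the following when the optimizers are built (a sketch only; the real networks and the exact Adam calls in main.py may differ):

```python
import torch.nn as nn
from torch.optim import Adam

# Placeholder networks; in the repo these are the MADDPG critic and actor modules.
critic = nn.Linear(16, 1)
actor = nn.Linear(16, 2)

critic_optimizer = Adam(critic.parameters(), lr=1e-4)  # was 1e-3
actor_optimizer = Adam(actor.parameters(), lr=1e-5)    # was 1e-4
```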

Did you use exactly the same parameter settings as in the code to get your training result? Is the training process stable for you, or did you run into the collapse frequently and only obtain the result shown in the README after many training runs?

[Image: newplot]