Closed by terryzhao127 4 years ago
Hi, @guikarist
How long did you train the algorithms?
Today, I trained FQF for less than three hours on the master branch on PongNoFrameskip-v4.
( Orange: naive FQF, Blue: FQF with {multi_step: 3}. )
It seems that they are learning well.
Actually, my implementation focuses on reproducing the paper and doesn't use techniques like n-step returns and Double Q-learning by default. So, if you want to train the algorithms faster, I recommend changing the config as below. (I think multi_step is the most effective.)
multi_step: 3
double_q_learning: True
dueling_net: True
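For readers unfamiliar with what multi_step changes, here is a minimal sketch of an n-step return in plain Python. It is not the repository's code: the function name `n_step_return` and its parameters are hypothetical, and FQF applies the same reward accumulation to quantile targets rather than to a single scalar value.

```python
def n_step_return(rewards, bootstrap_value, gamma=0.99, done=False):
    """Return r_0 + gamma*r_1 + ... + gamma^(n-1)*r_(n-1) + gamma^n * V.

    rewards         -- the n rewards collected after the current state
    bootstrap_value -- value estimate V of the state reached after n steps
    done            -- if the episode ended within the n steps, V is dropped
    """
    ret = 0.0 if done else bootstrap_value
    # Fold from the last reward backwards so each step applies one discount.
    for r in reversed(rewards):
        ret = r + gamma * ret
    return ret


# With multi_step: 3, three real rewards are used before bootstrapping.
print(n_step_return([1.0, 0.0, 1.0], bootstrap_value=2.0, gamma=0.5))
```

With n = 1 this reduces to the usual one-step target r + gamma * V. Larger n propagates reward information faster, which is presumably why multi_step helps most on a sparse-reward game like Pong.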
If you are interested in a more efficient algorithm (rather than good final performance), I recommend checking out policy-based algorithms like PPO.
Please let me know if you still have problems. Anyway, thank you for asking :)
Thanks for your reply!
Evidently the training time was my problem. I reran the FQF experiment last night. After more than 7 hours I got this result:
The curve looks just like the first part of yours. However, it's a bit too slow. How did you run 5M steps in 3 hours? Were any parameters modified?
My test environment has 40 CPU cores, 500 GB of memory and 8 Titan V GPUs.
Hi, @guikarist
Let me make sure that the GPU is enabled. Could you check it as below?
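import torch
# True if PyTorch can see a CUDA-capable GPU.
print(torch.cuda.is_available())
a = torch.zeros(4)
# Move the tensor to the GPU; this raises an error if CUDA is unavailable.
a = a.cuda()
# Should print a CUDA device such as cuda:0.
print(a.device)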
If it doesn't use GPUs, please check your CUDA setup. If you're using a CUDA version other than 10.2, you may need to reinstall the PyTorch build that matches your CUDA version. Please see the instructions for more details.
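If a version mismatch is suspected, a slightly fuller report can help. The sketch below is only an illustration: the helper name `cuda_report` is hypothetical, and it returns early if PyTorch is not installed at all. It lists the CUDA version the installed PyTorch was built with, which is `None` on CPU-only builds.

```python
import importlib.util


def cuda_report():
    """Collect basic facts about the local PyTorch/CUDA setup."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version PyTorch was compiled against; None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "devices": [],
    }
    if report["cuda_available"]:
        report["devices"] = [
            torch.cuda.get_device_name(i)
            for i in range(torch.cuda.device_count())
        ]
    return report


for key, value in cuda_report().items():
    print(f"{key}: {value}")
```

If `built_with_cuda` is `None`, or `cuda_available` is `False` on a machine that has GPUs, the installed PyTorch build most likely does not match the local CUDA setup.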
Could you report the result?
BTW, you have really good resources... I'm kinda jealous lol
These resources are shared among our lab, not belonging to me LOL.
Thanks for your advice, now I got similar results!
BTW, you have really good resources... I'm kinda jealous lol
Me too, what a lucky dog!
I use the following command to run three algorithms on Pong respectively (replacing <algo> with fqf and so on), but returns are always around -20. Is there anything wrong now at master branch (b4928f91d22c80eb7e42aa268da7f64de7491636)?