jcwleo opened this issue 6 years ago
https://github.com/jcwleo/random-network-distillation-pytorch/blob/master/config.conf Is this the latest config used to get results similar to the images above?
I see that the last pull request is about normalization. Would UseNorm = True improve reward_per_epi or the speed of convergence? And what about UseNoisyNet, when is it better to use it?
@kslazarev Hi, I used that config, except that NumEnv is 128 and MaxStepPerEpisode is 4500. The paper does not mention advantage normalization or NoisyNet, so I left those options disabled.
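For reference, here is a minimal sketch of those overrides, assuming config.conf uses Python's configparser (INI) format and the option names mentioned in this thread; the "DEFAULT" section name is an assumption, not necessarily the file's actual layout:

```python
# Sketch only: apply the overrides discussed above to config.conf.
# Assumes an INI-style file readable by configparser; section name is a guess.
import configparser

config = configparser.ConfigParser()
config.read('config.conf')

default = config['DEFAULT']
default['NumEnv'] = '128'             # overridden from the master config
default['MaxStepPerEpisode'] = '4500' # overridden from the master config
default['UseNorm'] = 'False'          # not mentioned in the paper, kept disabled
default['UseNoisyNet'] = 'False'      # not mentioned in the paper, kept disabled

with open('config.conf', 'w') as f:
    config.write(f)
```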
Result with the config in the master branch.
MontezumaRevengeNoFrameskip-v4
Right now I've set NumEnv = 128 and MaxStepPerEpisode = 4500. I'll attach the result once I reach 1200-2000 updates.
@jcwleo I see a difference in the x-axis scale between the reward_per_epi and reward_per_rollout plots. In your MontezumaRevengeNoFrameskip-v4 image they are 1.200k and 12.00k (a 10x difference), but in my in-progress image they are 200 and 600 (a 3x difference). Do I need to change an additional option in the config?
Or does the x-axis scale (global_update and sample_episode) depend on the agent's survival/experience, so that at later updates the scales will match?
@kslazarev per_rollout and per_epi are not on the same scale. per_rollout counts one global update (one call to agent.train_model()), while per_epi counts one finished episode of a single parallel env. If one episode's total step count is 1024 and NumStep (the rollout size) is 128, the two x-axis scales differ by a factor of 8.
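A minimal sketch of that counting (not the repo's actual logging code; num_step and episode_len are the hypothetical values from the example above):

```python
# Why the two x-axes diverge: global_update advances once per rollout,
# sample_episode advances once per finished episode of one env.
num_step = 128       # rollout size (NumStep in config.conf)
episode_len = 1024   # hypothetical length of one episode in one env

global_update = 0    # x-axis of reward_per_rollout
sample_episode = 0   # x-axis of reward_per_epi

for step in range(1, episode_len + 1):
    if step % num_step == 0:
        global_update += 1   # agent.train_model() would run here
sample_episode += 1          # the episode ends only once

print(global_update, sample_episode)  # 8 vs. 1 -> an 8x gap in x-axis scale
```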
@jcwleo Yes, that makes sense. I have a few other small questions about the code. What is the best way to ask them: a new issue for each question, or should I continue in this one?
@kslazarev Please create a separate issue for each question. :)
NumEnv = 128 and MaxStepPerEpisode = 4500
Looks similar to the README. With NumEnv = 128 I stopped the run because it started using swap.
Hello, can you tell me how many GPUs you used and how long it took to see this result?
Hello. It was not fast. I don't remember exactly; 1 or 2 NVIDIA 1080 Ti.
@kslazarev Excuse me, I'm using one 3090 with 2 envs. After running for more than 2 hours the reward is still 0; is this normal? I didn't load a pretrained model.
That was 3 years ago. I can't really help; I don't remember exactly what could cause that problem.
@kslazarev Ok, thanks
@kslazarev Thank you for answering on my behalf.