jcwleo / random-network-distillation-pytorch

Random Network Distillation pytorch
MIT License
242 stars, 43 forks

README asset #5

Open jcwleo opened 6 years ago

jcwleo commented 6 years ago

image

jcwleo commented 6 years ago

2018-11-20 11 13 35

jcwleo commented 6 years ago

image

jcwleo commented 5 years ago

image

kslazarev commented 5 years ago

https://github.com/jcwleo/random-network-distillation-pytorch/blob/master/config.conf Is this the latest config, the one that gives results similar to the images above?

I see the last pull request is about normalization. Would UseNorm = True improve reward_per_epi or the speed of convergence? And what about UseNoisyNet: when is it better to use it?

jcwleo commented 5 years ago

@kslazarev Hi, I used that config, except that NumEnv is 128 and MaxStepPerEpisode is 4500. The authors of the paper did not mention advantage normalization or NoisyNet, so I disabled those options.
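For readers who want to reproduce this setup, here is a minimal sketch of the settings described above, read with Python's standard `configparser`. Only the four option names (`NumEnv`, `MaxStepPerEpisode`, `UseNorm`, `UseNoisyNet`) come from this thread. The `[DEFAULT]` section name, the file layout, the `load_settings` helper, and the idea that `UseNorm` is the switch for advantage normalization are assumptions made for illustration, so check them against the actual `config.conf`.

```python
import configparser

# Hypothetical excerpt: only the four option names appear in this thread.
# The [DEFAULT] section and the overall layout are assumptions.
EXAMPLE_CONF = """
[DEFAULT]
NumEnv = 128
MaxStepPerEpisode = 4500
UseNorm = False
UseNoisyNet = False
"""


def load_settings(text):
    """Parse the example config text into a dict of typed values."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the CamelCase option names as written
    parser.read_string(text)
    section = parser["DEFAULT"]
    return {
        "NumEnv": section.getint("NumEnv"),
        "MaxStepPerEpisode": section.getint("MaxStepPerEpisode"),
        "UseNorm": section.getboolean("UseNorm"),
        "UseNoisyNet": section.getboolean("UseNoisyNet"),
    }


settings = load_settings(EXAMPLE_CONF)
print(settings)
```

Memory use grows with `NumEnv`, so a smaller value may be needed on machines with limited RAM.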

kslazarev commented 5 years ago

Result with the config from the master branch.

MontezumaRevengeNoFrameskip-v4

2019-01-11 15 05 01 gmt 03 00

For now, I have set NumEnv=128 and MaxStepPerEpisode=4500. I'll attach the result once it reaches 1200-2000 updates.

kslazarev commented 5 years ago

@jcwleo I see a difference in the x-axis scales of the reward_per_epi and reward_per_rollout plots. On your MontezumaRevengeNoFrameskip-v4 image they are 1.200k and 12.00k (a 10x ratio), but on my in-progress image they are 200 and 600 (a 3x ratio). Maybe I need to change an additional option in the config?

2019-01-11 21 58 49 gmt 03 00

kslazarev commented 5 years ago

Or does the x-axis scale (global_update and sample_episode) depend on how long the player survives and how much experience it has, so that after more updates the ratio will be the same?

jcwleo commented 5 years ago

@kslazarev per_rollout and per_epi are not on the same scale. per_rollout means one global update (one call to agent.train_model()), whereas per_epi means one episode of one of the parallel envs. If one episode's total step count is 1024 and Num_step (the rollout size) is 128, the two x-axis scales differ by a factor of 8.
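The arithmetic in this answer can be checked with a short sketch. The function name `updates_per_episode` is hypothetical and not part of the repository; the numbers 1024 and 128 are the ones used in the comment above.

```python
def updates_per_episode(episode_steps, rollout_steps):
    """Number of global updates that elapse while one env plays one episode.

    Each global update consumes `rollout_steps` steps from every parallel env,
    so an episode of `episode_steps` steps spans this many updates.
    """
    return episode_steps / rollout_steps


# Values from the comment above: a 1024-step episode, rollout size 128.
ratio = updates_per_episode(1024, 128)
print(ratio)  # 8.0
```

Because the ratio depends on episode length, it would be expected to change during training as the agent survives longer, which is one possible reason why two runs at different stages show different ratios between the two x-axes.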

kslazarev commented 5 years ago

@jcwleo Yes, that's right. I have a few more small questions about the code. What is the appropriate way to ask them? Should each question be a new issue, or should I keep asking in this issue?

jcwleo commented 5 years ago

@kslazarev Please create a separate issue for each question. :)

kslazarev commented 5 years ago

NumEnv=128 and MaxStepPerEpisode=4500

2019-01-13 7 40 16 gmt 03 00

Looks similar to the README. With NumEnv=128 I stopped the process because it started using swap.

xiaioding commented 1 year ago

Hello, can you tell me how many GPUs you used and how long it took to see this result?

kslazarev commented 1 year ago

Hello. It was not fast. I don't remember exactly, but I used 1 or 2 NVIDIA 1080 Ti cards.

xiaioding commented 1 year ago

@kslazarev Excuse me, I am using one 3090 with 2 envs. It has been running for more than 2 hours and the reward is still 0. Is this normal? I didn't load the pre-trained model.

kslazarev commented 1 year ago

That was 3 years ago. Sorry, I can't help; I don't remember what could cause this problem.

xiaioding commented 1 year ago

@kslazarev Ok, thanks

jcwleo commented 1 year ago

@kslazarev Thank you for answering for me.