openai / random-network-distillation

Code for the paper "Exploration by Random Network Distillation"
https://openai.com/blog/reinforcement-learning-with-prediction-based-rewards/

Wrong PPO Model architecture. #26

Open alirezakazemipour opened 4 years ago

alirezakazemipour commented 4 years ago

According to the Nature DQN paper and the PPO1 implementation, this line:

X = activ(conv(X, 'c3', nf=64, rf=4, stride=1, init_scale=np.sqrt(2), data_format=data_format))

should be changed to:

X = activ(conv(X, 'c3', nf=64, rf=3, stride=1, init_scale=np.sqrt(2), data_format=data_format))

In short, the kernel size is wrong!
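To see why the kernel size matters, here is a minimal sketch of the output-size arithmetic for the standard 84x84 Atari frame (assuming valid convolutions, as in the Nature DQN stack of 8x8/4, 4x4/2, 3x3/1; the helper name conv_out is illustrative):

def conv_out(size, kernel, stride):
    # Spatial output size of a valid convolution.
    return (size - kernel) // stride + 1

s = 84
for kernel, stride in [(8, 4), (4, 2), (3, 1)]:  # Nature DQN stack
    s = conv_out(s, kernel, stride)
print(s)  # 7 -> the canonical 7x7x64 = 3136-dim feature map

s = 84
for kernel, stride in [(8, 4), (4, 2), (4, 1)]:  # stack as written in this repo
    s = conv_out(s, kernel, stride)
print(s)  # 6 -> a 6x6x64 map instead

So with rf=4 in the third layer, the final feature map is 6x6x64 rather than the 7x7x64 produced by the Nature DQN architecture.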

xiaioding commented 1 year ago

What is the difference between these two lines?

alirezakazemipour commented 1 year ago

@xiaioding The difference is in the kernel sizes (the rf argument): 4 vs. 3.