yao62995 / A3C

Asynchronous Advantage Actor-Critic (A3C) and Progressive Neural Network, implemented in TensorFlow.

About the average reward #3

Open Tendimension opened 7 years ago

Tendimension commented 7 years ago

I have trained for 20 million frames (time steps) in the Breakout environment, but the average reward has not changed. In "Asynchronous Methods for Deep Reinforcement Learning", the average reward has changed after about 17 million steps. I do not know where the problem is.

kkjh0723 commented 7 years ago

@Tendimension Did you find a reason? I have the same problem. The average reward stays at 2.0 with a standard deviation of 0.0 up to 20 million frames. Does the reward go up after some period?
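
For reference, an average of 2.0 with a standard deviation of 0.0 means that every evaluated episode ended with exactly the same score. Below is a minimal sketch of that calculation. It assumes a hypothetical `summarize_rewards` helper and made-up reward lists; neither comes from this repository or from an actual run.

```python
import statistics


def summarize_rewards(episode_rewards):
    """Return (mean, population standard deviation) of per-episode rewards.

    Hypothetical helper for illustration; not part of the repository.
    """
    mean = statistics.fmean(episode_rewards)
    std = statistics.pstdev(episode_rewards)
    return mean, std


# Illustrative values only, not logged from a real training run.
stuck_rewards = [2.0, 2.0, 2.0, 2.0]   # every episode scores the same
varied_rewards = [1.0, 2.0, 3.0, 2.0]  # same mean, but episodes differ

stuck_mean, stuck_std = summarize_rewards(stuck_rewards)
varied_mean, varied_std = summarize_rewards(varied_rewards)

print(f"stuck:  mean={stuck_mean:.2f} std={stuck_std:.2f}")
print(f"varied: mean={varied_mean:.2f} std={varied_std:.2f}")
```

A standard deviation of exactly zero across many episodes suggests the agent plays every episode the same way, so logging the per-episode rewards alongside the average may help narrow down the problem.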

Tendimension commented 7 years ago

@kkjh0723 I do not know what the reason is.

yao62995 commented 7 years ago

@Tendimension @kkjh0723 I also found this bug. I will check it soon.

Tendimension commented 7 years ago

@yao62995 Thanks a million!

kkjh0723 commented 7 years ago

@yao62995 Do you have any updates on this problem?

andyxzq commented 6 years ago

I see the same issue. The average reward is still 0.0 after 1 million steps.