satoshi-kosugi / Unpaired-Image-Enhancement


Information regarding Reward value #6

Open trideeprath opened 3 years ago

trideeprath commented 3 years ago

Based on the paper, the reward is D(y') - MSE. This is confusing: the reward should reflect how well the generator fools the discriminator, i.e., how close D(y) and D(y') are, rather than the absolute value of D(y'). Since the discriminator outputs are not scaled, the value of D(y') can keep increasing. Shouldn't the reward be something like 1 / ||D(y') - D(y)||?

Can you elaborate on this point or provide some reference for this?

satoshi-kosugi commented 3 years ago

In A3C, the expected reward is subtracted from the obtained reward, as explained in Eq. (7) of the paper. So it is not a problem if the absolute value of the reward is large.
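To make the point about subtracting the expected reward concrete, here is a minimal Python sketch. It is not the repository's code: the function names, the toy reward sequences, and the use of a "typical" trajectory's return as a stand-in for the critic's value estimate are all assumptions made for illustration. It shows that adding a large constant to every reward leaves the advantage (obtained return minus expected return) unchanged, provided the value estimate tracks the same offset.

```python
def discounted_returns(rewards, gamma=0.5):
    """Return the discounted return R_t for every step of one episode."""
    returns = []
    running = 0.0
    # Accumulate from the last step backwards: R_t = r_t + gamma * R_{t+1}.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(rewards, values, gamma=0.5):
    """Advantage per step: obtained return minus the expected return."""
    returns = discounted_returns(rewards, gamma)
    return [R - v for R, v in zip(returns, values)]


# Hypothetical toy data: rewards actually obtained vs. rewards the critic expects.
actual = [1.5, 2.0, 2.0]
typical = [1.0, 2.0, 3.0]
offset = 100.0  # assumed large, unscaled shift in the discriminator output

# Stand-in for a learned critic: the return of the "typical" trajectory.
adv_small = advantages(actual, discounted_returns(typical))

# Same situation, but every reward is shifted by a large constant.
adv_large = advantages(
    [r + offset for r in actual],
    discounted_returns([r + offset for r in typical]),
)

print(adv_small)
print(adv_large)
```

Both lists come out the same in this toy setup, because the offset appears in the obtained return and in the baseline and cancels in the difference. In practice the critic is learned, so the cancellation is only as good as the value estimate.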