steveKapturowski / tensorflow-rl

Implementations of deep RL papers and random experimentation
Apache License 2.0

a question on the implementation of exploration bonus #12

Closed dhfromkorea closed 7 years ago

dhfromkorea commented 7 years ago

Hi, I'm having a hard time understanding line 64 of intrinsic_motivation.py, where the pseudocount is defined:

pseudocount = (1 - recoding_prob) / np.maximum(prob_ratio - 1, 1e-10)

According to the paper, shouldn't it be:

pseudocount = 1 / np.maximum(prob_ratio - 1, 1e-10)

or

pseudocount = prob * (1 - recoding_prob) / (recoding_prob - prob)

Thank you so much for writing up the reference implementations of latest RL papers. They are purely awesome!

steveKapturowski commented 7 years ago

The first and last equations end up being equal with a bit of algebra. Divide the numerator and denominator by prob:

pseudocount = prob * (1 - recoding_prob) / (recoding_prob - prob)
= (1 - recoding_prob) / (recoding_prob / prob - 1)
= (1 - recoding_prob) / np.maximum(prob_ratio - 1, 1e-10)

The np.maximum only guards against a zero or negative denominator.
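A quick numerical check of that identity, as a minimal sketch. The values of `prob` and `recoding_prob` are arbitrary illustrative numbers, not taken from the repository, and are chosen with `recoding_prob > prob` so that the `np.maximum` clamp is inactive.

```python
import numpy as np

# Arbitrary illustrative probabilities (assumption: recoding_prob > prob).
prob = np.array([0.01, 0.1, 0.3])
recoding_prob = np.array([0.02, 0.15, 0.5])

prob_ratio = recoding_prob / prob

# First form in the question (the line in intrinsic_motivation.py).
pseudocount_code = (1 - recoding_prob) / np.maximum(prob_ratio - 1, 1e-10)

# Last form in the question.
pseudocount_last = prob * (1 - recoding_prob) / (recoding_prob - prob)

# Second form in the question, which drops the (1 - recoding_prob) factor.
pseudocount_second = 1 / np.maximum(prob_ratio - 1, 1e-10)

# The first and last forms agree.
assert np.allclose(pseudocount_code, pseudocount_last)

# The second form differs from them by exactly the factor (1 - recoding_prob).
assert np.allclose(pseudocount_code, (1 - recoding_prob) * pseudocount_second)
```

The second form only matches the other two when `recoding_prob` is close to zero, so that `1 - recoding_prob` is approximately 1.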

dhfromkorea commented 7 years ago

now I understand where it came from :-) thanks.

prob_ratio = np.exp(log_recoding_prob - log_prob) = recoding_prob / prob
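To make the log-space step concrete, here is a small sketch. The helper name `pseudocount_from_log_probs` and its `eps` parameter are hypothetical and written for illustration only; they are not the repository's actual function.

```python
import numpy as np

def pseudocount_from_log_probs(log_prob, log_recoding_prob, eps=1e-10):
    """Hypothetical helper: pseudocount from log-probabilities."""
    recoding_prob = np.exp(log_recoding_prob)
    # Subtracting logs and exponentiating gives recoding_prob / prob.
    prob_ratio = np.exp(log_recoding_prob - log_prob)
    # Clamp the denominator so it is never zero or negative.
    return (1 - recoding_prob) / np.maximum(prob_ratio - 1, eps)

# Illustrative values: prob = 0.1, recoding_prob = 0.15, so prob_ratio = 1.5.
count = pseudocount_from_log_probs(np.log(0.1), np.log(0.15))
assert np.isclose(count, 0.85 / 0.5)

# If the recoding probability equals the probability, the clamp takes over
# and the pseudocount becomes very large rather than dividing by zero.
saturated = pseudocount_from_log_probs(np.log(0.1), np.log(0.1))
assert saturated > 1e9
```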

steveKapturowski commented 7 years ago

No problem!