dhfromkorea closed this issue 7 years ago
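The first and last equations end up being equal with a bit of algebra. Dividing the numerator and denominator by prob:
pseudocount = prob * (1 - recoding_prob) / (recoding_prob - prob)
= (1 - recoding_prob) / (recoding_prob * prob^-1 - 1)
= (1 - recoding_prob) / (prob_ratio - 1)
In the code the denominator is wrapped in np.maximum(prob_ratio - 1, 1e-10), which only guards against division by zero or a negative value:
pseudocount = (1 - recoding_prob) / np.maximum(prob_ratio - 1, 1e-10)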
now I understand where it came from :-) thanks.
prob_ratio = np.exp(log_recoding_prob - log_prob) = recoding_prob / prob
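For anyone who wants to convince themselves numerically, here is a minimal sketch that compares the two forms. The function names and the probability values are made up for illustration and are not taken from intrinsic_motivation.py; it also assumes recoding_prob > prob, so the 1e-10 clamp never kicks in.

```python
import numpy as np


def pseudocount_from_ratio(log_prob, log_recoding_prob):
    """Form quoted from the code: works from log-probabilities."""
    recoding_prob = np.exp(log_recoding_prob)
    # prob_ratio = recoding_prob / prob, computed in log space
    prob_ratio = np.exp(log_recoding_prob - log_prob)
    # The clamp avoids dividing by zero (or a negative number)
    return (1 - recoding_prob) / np.maximum(prob_ratio - 1, 1e-10)


def pseudocount_direct(prob, recoding_prob):
    """Form written directly in terms of prob and recoding_prob."""
    return prob * (1 - recoding_prob) / (recoding_prob - prob)


# Hypothetical probabilities, chosen so that recoding_prob > prob
prob = np.array([0.01, 0.1, 0.3])
recoding_prob = np.array([0.02, 0.15, 0.31])

from_ratio = pseudocount_from_ratio(np.log(prob), np.log(recoding_prob))
direct = pseudocount_direct(prob, recoding_prob)

# The two forms should agree up to floating-point error
print(np.allclose(from_ratio, direct))
```

If recoding_prob were less than or equal to prob, the clamp would take over and the two forms would no longer agree, so the check only holds in the regime described above.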
No problem!
Hi, I have a hard time understanding line 64 of intrinsic_motivation.py, where the pseudocount is defined:
pseudocount = (1 - recoding_prob) / np.maximum(prob_ratio - 1, 1e-10)
According to the paper, shouldn't it be:
pseudocount = 1 / np.maximum(prob_ratio - 1, 1e-10)
or
pseudocount = prob * (1 - recoding_prob) / (recoding_prob - prob)
Thank you so much for writing up the reference implementations of the latest RL papers. They are purely awesome!