Open FaisalAhmed0 opened 2 years ago
I had exactly the same question. Checking the original APT paper, the text seems to indicate that only s' is taken as the particle, whereas the equation indicates that s is taken as the particle. But agreed that the CIC paper indicates that tau (the projection of (s, s')) should be taken.
I tried both setups, and the performance with (obs, next_obs) is slightly better than (next_obs, next_obs).
@FaisalAhmed0 thanks for the empirical study. Have you tried passing in (z, z), as written in the CIC paper, by passing in `source` and `target` into `pred_net` to compute `compute_apt_reward(z, z, args)`?
Also, related to the reward question: what is the purpose of https://github.com/rll-research/cic/blob/b523c3884256346cb585bf06e52a7aadc127dcfc/agent/cic.py#L186? What should it compute, and for what reason?
I don't agree with that. They just wanted to estimate the novelty of the next states with kNN. The `compute_apt_reward` function does not compute what you describe (tau novelty). If you want to do that, the simplest way is to concatenate the two.
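To make the suggestion concrete, here is a minimal NumPy sketch of a particle-based kNN novelty estimate, and of "concatenate the two" to score tau = (s, s') instead of s' alone. All names here (`knn_entropy_reward`, `tau_reward`) are illustrative, not functions from the cic repo, and the reward form (mean distance to the k nearest neighbors in the batch) is one common APT-style choice, not necessarily the exact one used in the paper.

```python
import numpy as np

def knn_entropy_reward(rep, k=12):
    """APT-style particle reward: mean distance to the k nearest
    neighbors within the batch. rep: (batch, dim) array."""
    diffs = rep[:, None, :] - rep[None, :, :]        # (batch, batch, dim)
    dists = np.linalg.norm(diffs, axis=-1)           # pairwise L2 distances
    dists_sorted = np.sort(dists, axis=1)            # column 0 is self (0.0)
    return dists_sorted[:, 1:k + 1].mean(axis=1)     # (batch,) novelty reward

def tau_reward(z_s, z_next, k=12):
    # Estimate novelty of tau = (s, s') by concatenating the two
    # embeddings before the kNN lookup, as suggested above.
    tau = np.concatenate([z_s, z_next], axis=-1)     # (batch, 2 * dim)
    return knn_entropy_reward(tau, k=k)
```

Scoring the concatenation rewards novel *transitions* rather than novel next states, which is the distinction the (obs, next_obs) vs (next_obs, next_obs) question is about.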
A follow-up question about line 199 and line 200: shouldn't the source be using `state_net` and the target be using `next_state_net`?
Why do we pass (next_obs, next_obs)? It should be (obs, next_obs), right? Because you are optimizing for the entropy of $\tau=(s, s^{'})$: https://github.com/rll-research/cic/blob/b523c3884256346cb585bf06e52a7aadc127dcfc/agent/cic.py#L224