rll-research / cic

CIC: Contrastive Intrinsic Control for Unsupervised Skill Discovery

APT reward #4

Open FaisalAhmed0 opened 2 years ago

FaisalAhmed0 commented 2 years ago

Why do we pass (next_obs, next_obs)? Shouldn't it be (obs, next_obs)? Because you are optimizing the entropy of $\tau=(s, s^{'})$ https://github.com/rll-research/cic/blob/b523c3884256346cb585bf06e52a7aadc127dcfc/agent/cic.py#L224

Kaixhin commented 2 years ago

I had exactly the same question. Checking the original APT paper, the text seems to indicate they only take s' as the particle, whereas the equation indicates that s is taken as the particle. But agreed that the CIC paper indicates that tau (the projection of (s, s')) should be taken.

(Screenshot attached: the relevant passage and equation from the APT paper.)

FaisalAhmed0 commented 2 years ago

I tried both setups, and the performance with (obs, next_obs) is slightly better than (next_obs, next_obs).

Kaixhin commented 2 years ago

@FaisalAhmed0 thanks for the empirical study. Have you tried passing in (z, z), as written in the CIC paper, i.e., passing source and target through pred_net and then computing compute_apt_reward(z, z, args)?

https://github.com/rll-research/cic/blob/b523c3884256346cb585bf06e52a7aadc127dcfc/agent/cic.py#L199-L201

Howuhh commented 2 years ago

Also, related to the reward question: what is the purpose of https://github.com/rll-research/cic/blob/b523c3884256346cb585bf06e52a7aadc127dcfc/agent/cic.py#L186

What is it supposed to compute, and why?

seolhokim commented 2 years ago

I don't agree with that. They just wanted to estimate the novelty of the next states with kNN. The compute_apt_reward function does not compute what you describe (the novelty of tau). If you want that, the simplest way is to concatenate the two.
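To make the concatenation suggestion concrete, here is a minimal sketch (not the repo's code) of a kNN particle-entropy reward computed over tau = concat(s, s'). The function name `knn_particle_reward` and the parameter `knn_k` are assumptions for illustration only:

```python
import torch

def knn_particle_reward(source, target, knn_k=12, eps=1e-6):
    # Illustrative sketch, not the repo's implementation.
    # Pairwise L2 distances between source and target particles: (b1, b2)
    dists = torch.norm(source[:, None, :] - target[None, :, :], p=2, dim=-1)
    # Distance to each source particle's k nearest neighbours
    knn_dists, _ = dists.topk(knn_k, dim=1, largest=False)
    # Non-parametric entropy estimate: log of the k-th NN distance.
    # Note: when source is target, the nearest neighbour is the particle
    # itself (distance 0); real implementations typically exclude it.
    return torch.log(knn_dists[:, -1:] + eps)

# To estimate the entropy of tau = (s, s'), concatenate the two embeddings
obs_emb = torch.randn(64, 32)       # stand-in for an embedding of obs
next_obs_emb = torch.randn(64, 32)  # stand-in for an embedding of next_obs
tau = torch.cat([obs_emb, next_obs_emb], dim=-1)
reward = knn_particle_reward(tau, tau)  # shape (64, 1)
```

Concatenating before the kNN step means novelty is measured over transitions rather than individual states, which matches the tau formulation discussed above.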

Sou0602 commented 1 year ago

A follow-up question about lines 199 and 200: shouldn't the source use the state_net and the target use the next_state_net?