Open xf-zhao opened 2 years ago
Hi, I have the same confusion. May I ask whether your question has been solved? I think the parameters updated by the contrastive learning step are never actually used.
@pickxiguapi Hi, sorry, it has not been solved. I think this is a mistake the authors have not noticed, since the work still seems to be in progress / not fully finished.
Why should g1 and g2 be used after the single update? I don't think there is any reason to call them from anywhere else before finetuning.
I would like to ask a simple question. During pre-training, I found that `neg` in `compute_cpc_loss` is approximately 1200, while `pos` is around 6. Is this a normal phenomenon?
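For context, here is a minimal sketch of an InfoNCE-style CPC loss of the kind used for skill discovery here (variable names, temperature, and clamping details are assumptions, not necessarily the repository's exact code). The point it illustrates is that `neg` sums exponentiated similarities over the whole batch before the log, while `pos` is a single-pair term:

```python
import torch
import torch.nn.functional as F

def cpc_loss_sketch(query, key, temperature=0.5, eps=1e-6):
    """Hypothetical InfoNCE/CPC loss. `query` and `key` are (B, D) embeddings
    of matched (state-transition, skill) pairs; names are illustrative only."""
    query = F.normalize(query, dim=1)
    key = F.normalize(key, dim=1)

    # Positive term: similarity of each pair with itself, shape (B,).
    pos = torch.exp((query * key).sum(dim=-1) / temperature)

    # Negative term: similarity of each query with every key in the batch,
    # summed over the batch, shape (B,). As a sum of B exponentials it is
    # typically orders of magnitude larger than `pos`.
    sim = torch.exp(query @ key.T / temperature)
    neg = sim.sum(dim=-1)

    loss = -torch.log(pos / (neg + eps))
    return loss.mean(), pos.mean(), neg.mean()
```

Under this form of the loss, `pos` is bounded by exp(1/temperature) (about 7.4 for temperature 0.5), whereas `neg` is a sum over the batch, so values such as pos ≈ 6 and neg ≈ 1200 with a batch of around a thousand transitions would at least be consistent with it rather than an obvious sign of a bug.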
Hi, thank you very much for sharing the code for the paper. Integrating contrastive learning into skill discovery is very attractive.

However, I found that in this implementation, the state encoder and skill encoder in the `cic` module ($g_{\psi_1}$ and $g_{\psi_2}$ in the paper) are never used before observations and skills are fed into the policy networks. In `cic/agent/cic.py` line 222, the parameters of `cic` are updated once but never called afterwards to encode `obs` and `skill`.

Another question: how can the agent guarantee that the policy is "indeed conditioned on z" when the intrinsic reward has nothing to do with z? In other words, $\tau$ can be arbitrarily diverse, which is good for exploration, but there is no mechanism to ensure the agent knows what the influence of z is.
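To make the first concern concrete, here is a minimal sketch (all class and function names are hypothetical, not the repository's actual code) contrasting the pattern described in the issue, where the encoders are trained only by the contrastive loss, with the pattern one might have expected, where they also embed `obs` and `skill` before the actor:

```python
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Stand-in for the state/skill encoders g_psi1, g_psi2 (sizes are made up)."""
    def __init__(self, in_dim, out_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU(),
                                 nn.Linear(out_dim, out_dim))

    def forward(self, x):
        return self.net(x)

# Pattern the issue describes: the encoders are updated by the contrastive
# (CPC) loss, but the actor consumes the raw observation and skill directly,
# so the learned representations never reach the policy.
def act_without_encoders(actor, obs, skill):
    return actor(torch.cat([obs, skill], dim=-1))

# Pattern one might expect instead: encode obs and skill with the
# contrastively trained encoders before conditioning the policy on them.
def act_with_encoders(actor, state_encoder, skill_encoder, obs, skill):
    return actor(torch.cat([state_encoder(obs), skill_encoder(skill)], dim=-1))
```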
I really like your work, but these issues confuse me a lot. Please correct me if I am wrong or have missed something. Thank you again for sharing.