tkipf / c-swm

Contrastive Learning of Structured World Models
MIT License

Implementation of hinge loss does not match the paper #7

Open Yingdong-Hu opened 3 years ago

Yingdong-Hu commented 3 years ago

Hello Kipf, I have found a discrepancy between the hinge loss described in the paper and its implementation in the code.

According to Eq. (5) in the paper, the negative term computes the Euclidean distance between the negative state sample at timestep t and the state at timestep t+1.

However, in the code below, state and neg_state are both at timestep t.

self.neg_loss = torch.max(
    zeros, self.hinge - self.energy(
        state, action, neg_state, no_trans=True)).mean()
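To make the difference concrete, here is a small numerical sketch of the two negative terms. It uses NumPy rather than the repo's PyTorch, and a plain squared-Euclidean distance as a stand-in for the model's energy function; the `energy` helper and the tensor shapes are illustrative, not the repo's API.

```python
import numpy as np

def energy(z_a, z_b):
    # Illustrative energy: squared Euclidean distance, averaged over latent dims.
    return ((z_a - z_b) ** 2).mean(axis=-1)

rng = np.random.default_rng(0)
state = rng.normal(size=(4, 8))       # z_t
next_state = rng.normal(size=(4, 8))  # z_{t+1}
neg_state = rng.normal(size=(4, 8))   # negative sample z~_t (shuffled batch)

hinge = 1.0
# Paper, Eq. (5): compare the negative sample to the *next* state.
neg_loss_paper = np.maximum(0.0, hinge - energy(neg_state, next_state)).mean()
# Code: compare the negative sample to the *current* state (no_trans=True).
neg_loss_code = np.maximum(0.0, hinge - energy(neg_state, state)).mean()
```

Both variants are valid hinge losses; the only difference is which positive state the negative sample is pushed away from.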

I noticed that the same question was also asked in https://github.com/tkipf/c-swm/issues/3.

I would like to know: is this a bug? Does the discrepancy affect the final performance?

tkipf commented 3 years ago

Thanks for your comment. This is indeed a (small) difference, and it should not affect performance in any significant way. I expect that one could show that both formulations are equivalent for a system where the probability of visiting any state in the MDP is the same in expectation, which should be the case for the randomly initialized block pushing tasks and the physics simulation. For the Atari games this assumption is violated, however, and there could be (minor) differences in performance between the two implementations, but I would expect them to be smaller than the variance you get from the parameter initialization. If you happen to test this experimentally, please share your results here :)
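The equivalence argument can be checked with a quick Monte Carlo sketch. Under the stated assumption, z_t, z_{t+1}, and the negative sample are identically distributed, so both hinge terms have the same expectation; the snippet below approximates uniform state visitation with i.i.d. Gaussian latents, which is an assumption of this sketch, not something the codebase does.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, hinge = 100_000, 4, 1.0

# Under (approximately) uniform state visitation, z_t, z_{t+1}, and the
# negative sample z~ are identically distributed, so pairing z~ with either
# z_t or z_{t+1} yields the same expected hinge term.
z_t = rng.normal(size=(n, d))
z_next = rng.normal(size=(n, d))
z_neg = rng.normal(size=(n, d))

def neg_term(a, b):
    # Hinge on a squared-Euclidean energy (illustrative stand-in).
    return np.maximum(0.0, hinge - ((a - b) ** 2).mean(axis=-1)).mean()

paper = neg_term(z_neg, z_next)  # d(z~_t, z_{t+1}), as in Eq. (5)
code = neg_term(z_neg, z_t)      # d(z~_t, z_t), as in the repo
```

With enough samples the two estimates agree to Monte Carlo error, which is the sense in which the formulations are equivalent under uniform visitation.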


Yingdong-Hu commented 3 years ago

Hi, thanks for your answer. I understand that both formulations are equivalent. But I have another question: for the Atari games, even if I set the same random seed, multiple executions of the program behave differently. According to my understanding, with the same seed, multiple runs of the program should produce the same result. Are there other nondeterministic operations in the code that cause the randomness?

tkipf commented 3 years ago

Yes, I believe the frame_skip between individual actions (i.e., for how many frames an action is applied) is not fully deterministic in the gym environment that our codebase uses. If you happen to have more insight into this, please feel free to share it here.
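For what it's worth, a minimal sketch of the seeding side of this, assuming the usual Python/NumPy RNGs (in the actual codebase you would additionally seed PyTorch with `torch.manual_seed` and the gym environment; the helper name below is hypothetical):

```python
import random
import numpy as np

def seed_everything(seed):
    # Seeds Python's and NumPy's global RNGs. In a PyTorch + gym setup you
    # would also call torch.manual_seed(seed) and seed the environment.
    random.seed(seed)
    np.random.seed(seed)

seed_everything(42)
run_a = np.random.rand(3)
seed_everything(42)
run_b = np.random.rand(3)

# Identical seeds give identical draws from these RNGs; any remaining
# run-to-run variation must therefore come from an unseeded source, such as
# ALE's stochastic frame skip inside the Atari environment.
assert np.allclose(run_a, run_b)
```

This is why seeding the Python-side RNGs is necessary but not sufficient for full reproducibility on Atari.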
