h3lio5 / linguistic-style-transfer-pytorch

Implementation of "Disentangled Representation Learning for Non-Parallel Text Style Transfer" (ACL 2019) in PyTorch

A mismatch between the implementation and the paper #9

Open lijierui opened 4 years ago

lijierui commented 4 years ago

In the decoder implementation in linguistic-style-transfer-pytorch/linguistic_style_transfer_pytorch/model.py, in the function `def generate_sentences(self, input_sentences, latent_emb, inference=False):` starting at Line 428, it seems that you are concatenating the content & style embedding to each word embedding of the sentence. Based on my understanding of the paper, however, the concatenation is between the content representation and the style representation, and this concatenated vector forms the initial hidden state of the decoder (last paragraph of Section 3.1 in the paper). In this implementation, by contrast, the decoder's hidden states are simply zeros, as in Lines 442 and 463 (both training and generation modes), which was also pointed out in #5.

    input_sentences = torch.cat(
        (sos_token_tensor, input_sentences), dim=1)
    sentence_embs = self.dropout(self.embedding(input_sentences))
    # Make the latent embedding compatible for concatenation
    # by repeating it max_seq_len + 1 times (one extra because an
    # <sos> token was prepended)
    latent_emb = latent_emb.unsqueeze(1).repeat(
        1, mconfig.max_seq_len+1, 1)
    gen_sent_embs = torch.cat(
        (sentence_embs, latent_emb), dim=2)

Another problem is a shape error when generating sentences in inference mode: latent_emb isn't concatenated to the word embedding there the way it is in training mode. That said, I am starting to wonder whether it is right to concatenate word embeddings and latent embeddings in the first place.
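For concreteness, here is a rough sketch of what a consistent inference path could look like if the concatenation scheme is kept: the latent vector is concatenated to the previous token's embedding at every decoding step, just as in training. This assumes a single-layer GRU decoder with greedy decoding; all names and dimensions below are made up for illustration and are not the repo's actual identifiers.

    import torch
    import torch.nn as nn

    # Illustrative dimensions, not the repo's configuration
    emb_dim, latent_dim, hidden_dim, vocab_size = 32, 16, 64, 100
    embedding = nn.Embedding(vocab_size, emb_dim)
    gru = nn.GRU(emb_dim + latent_dim, hidden_dim, batch_first=True)
    out_proj = nn.Linear(hidden_dim, vocab_size)

    def greedy_decode(latent_emb, hidden, sos_id=1, max_len=15):
        # latent_emb: (batch, latent_dim); hidden: (1, batch, hidden_dim)
        token = torch.full((latent_emb.size(0),), sos_id, dtype=torch.long)
        generated = []
        for _ in range(max_len):
            # Same concatenation as in training, one step at a time
            step_in = torch.cat((embedding(token), latent_emb), dim=1)
            output, hidden = gru(step_in.unsqueeze(1), hidden)
            token = out_proj(output.squeeze(1)).argmax(dim=-1)
            generated.append(token)
        return torch.stack(generated, dim=1)  # (batch, max_len) token ids

    tokens = greedy_decode(torch.randn(4, latent_dim),
                           torch.zeros(1, 4, hidden_dim))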

Since I haven't looked through the code fully and carefully, I might be getting this wrong.

sharan21 commented 3 years ago

Hey everyone, I also ran into many trivial device/shape mismatch errors and have corrected almost all of them. I am trying to run generate.py, but again there is a shape mismatch in generate_sentences() in model.py. The issue you raised about the incorrect concatenation when running the decoder is exactly the problem I saw. There should be one more linear layer to convert the final Z vector into the decoder's initial hidden state; I have no idea why the authors did it this way. Can you please report back if any of you finally managed to get good results from this code? I just hope all my efforts aren't in vain.
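Roughly what I have in mind, as a sketch: add a linear layer that projects the concatenated style/content vector (the final Z) into the decoder's initial hidden state instead of starting from zeros. The `latent_to_hidden` name and the dimensions here are made up, not from this repo.

    import torch
    import torch.nn as nn

    # Illustrative dimensions, not the repo's configuration
    style_dim, content_dim, hidden_dim = 8, 128, 256
    latent_to_hidden = nn.Linear(style_dim + content_dim, hidden_dim)

    def init_decoder_hidden(style_emb, content_emb):
        # style_emb: (batch, style_dim); content_emb: (batch, content_dim)
        latent = torch.cat((style_emb, content_emb), dim=1)
        # Shape (1, batch, hidden_dim), as a single-layer GRU expects
        return torch.tanh(latent_to_hidden(latent)).unsqueeze(0)

    h0 = init_decoder_hidden(torch.randn(4, style_dim),
                             torch.randn(4, content_dim))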

naveen-kinnal commented 3 years ago

@sharan21 I kind of fixed that problem. Would you like to know more? Connect with me on iamkinnal@gmail.com

sharan21 commented 3 years ago

@anthammas thanks, I emailed you :)