xiadingZ / video-caption.pytorch

pytorch implementation of video captioning
MIT License

Most likely an error in S2VTModel #12

Closed ParitoshParmar closed 6 years ago

ParitoshParmar commented 6 years ago

Hi Ding,

Thanks a ton for this project! This might be an issue, I am just not sure. With

self.rnn1.flatten_parameters()
self.rnn2.flatten_parameters()

in 'train' mode in S2VTModel, I am getting the following error: RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation. To suppress this error, I am commenting out the snippet that flattens the parameters. The model seems to be converging, with loss values: model_0, loss: 22.717190; model_50, loss: 15.616700; model_100, loss: 12.238667; model_150, loss: 11.222753.

I just wanted your opinion on whether I am making a mistake by commenting out self.rnn1.flatten_parameters(); self.rnn2.flatten_parameters()? (I am using one GPU.)
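For context, flatten_parameters() only compacts RNN weights into one contiguous buffer for cuDNN; calling it inside forward() during training can rewrite weight storage in place and trip autograd's in-place-modification check. A minimal sketch (not the repo's S2VTModel, just two stacked LSTMs standing in for self.rnn1/self.rnn2) showing the safer pattern of calling it once after the model is built:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for S2VTModel's two RNNs.
rnn1 = nn.LSTM(input_size=16, hidden_size=8, batch_first=True)
rnn2 = nn.LSTM(input_size=8, hidden_size=8, batch_first=True)

# Calling flatten_parameters() once here (after construction, or after
# moving the model to the GPU) avoids repeatedly rewriting the weight
# storage inside forward(). On a single GPU it is usually safe to skip
# these calls entirely, as done in this thread.
rnn1.flatten_parameters()
rnn2.flatten_parameters()

x = torch.randn(4, 10, 16)   # (batch, seq_len, features)
out1, _ = rnn1(x)
out2, _ = rnn2(out1)
print(out2.shape)            # torch.Size([4, 10, 8])
```

On CPU the call is essentially a no-op; its effect (and the in-place hazard) shows up with cuDNN on GPU.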

xiadingZ commented 6 years ago

I just comment them out, thanks. I often use s2vtattmodel, hasn't tested s2vtmodel

ParitoshParmar commented 6 years ago

Thank you. Also, while we are at it, I wanted to share one more thing, and it would be great if you could shed some light on it. I noticed something while trying to solve the previous error: when I use the following code in misc/utils.py:

        batch_size = logits.shape[0]
        target1 = target[:, :logits.shape[1]]
        mask1 = mask[:, :logits.shape[1]]
        logits2 = logits.contiguous().view(-1,logits.shape[2])
        target2 = target1.contiguous().view(-1)
        mask2 = mask1.contiguous().view(-1)
        loss = self.loss_fn(logits2, target2)
        output = torch.sum(loss * mask2) / batch_size
        return output

instead of the original code, which is as follows:

        batch_size = logits.shape[0]
        target = target[:, :logits.shape[1]]
        mask = mask[:, :logits.shape[1]]
        logits = logits.contiguous().view(-1, logits.shape[2])
        target = target.contiguous().view(-1)
        mask = mask.contiguous().view(-1)
        loss = self.loss_fn(logits, target)
        output = torch.sum(loss * mask) / batch_size
        return output

, then I am getting different losses. With the modified code, the loss values are: model_0, loss: 22.717190; model_50, loss: 15.616700. With the original code, the loss values are: model_0, loss: 64.293571; model_50, loss: 33.291115.

In the modified code, I am just using different variable names instead of reusing them. So, logically, we should get the same loss values (with some minor variation due to different random values), right? What are your thoughts?

xiadingZ commented 6 years ago

Strange, I can't see a difference. Maybe you can set a seed with torch/np.random.seed to test it.

Or you can fix logits and mask to test this block of code.

ParitoshParmar commented 6 years ago

I apologize, it was my mistake. I had reduce=True instead of reduce=False in self.loss_fn = nn.NLLLoss(reduce=False). That's why the loss was just a single value instead of a vector. I changed that, and now I am getting the same results. Thanks a lot!
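For anyone hitting the same confusion: with per-element losses (reduce=False, spelled reduction='none' in newer PyTorch), the two snippets above compute the same thing; renaming the variables doesn't change the math. A quick sanity check on fixed tensors (a sketch with made-up shapes, not the repo's actual data):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

batch, steps, vocab = 2, 5, 7
logits = torch.log_softmax(torch.randn(batch, steps, vocab), dim=2)
target = torch.randint(0, vocab, (batch, steps + 2))  # longer than logits, as in the repo
mask = torch.ones(batch, steps + 2)

# reduction='none' is the modern spelling of reduce=False: one loss per element.
loss_fn = nn.NLLLoss(reduction='none')

def masked_loss_renamed(logits, target, mask):
    # variant with fresh variable names
    batch_size = logits.shape[0]
    target1 = target[:, :logits.shape[1]]
    mask1 = mask[:, :logits.shape[1]]
    logits2 = logits.contiguous().view(-1, logits.shape[2])
    loss = loss_fn(logits2, target1.contiguous().view(-1))
    return torch.sum(loss * mask1.contiguous().view(-1)) / batch_size

def masked_loss_reused(logits, target, mask):
    # variant that reuses the names, as in the original misc/utils.py
    batch_size = logits.shape[0]
    target = target[:, :logits.shape[1]]
    mask = mask[:, :logits.shape[1]]
    logits = logits.contiguous().view(-1, logits.shape[2])
    loss = loss_fn(logits, target.contiguous().view(-1))
    return torch.sum(loss * mask.contiguous().view(-1)) / batch_size

print(torch.allclose(masked_loss_renamed(logits, target, mask),
                     masked_loss_reused(logits, target, mask)))  # True
```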

ivy94419 commented 6 years ago

@ParitoshParmar Excuse me, I am new to this. I have also encountered this problem, and you said you just commented out two lines:

self.rnn1.flatten_parameters()
self.rnn2.flatten_parameters()

I checked the S2VTModel.py code and found that the author has already commented them out in forward() on lines 39 and 40, but the error still happened. I commented out lines 51 and 52, and now it trains without errors.

Did I comment out the right lines?

Furthermore, I have trained for 450 epochs, but the performance is not as good as what you reported:

[image attachment: training loss values]

How can I get results closer to yours?

Thank you very much!

ParitoshParmar commented 6 years ago

Hi @ivy94419, I think you have commented out the right lines. As I clarified above, I was getting a lower loss because I had reduce=True instead of reduce=False in self.loss_fn = nn.NLLLoss(reduce=False). That's why the loss was just a single value instead of a vector; it was a mistake on my part. Your loss values look correct to me.

ivy94419 commented 6 years ago

@ParitoshParmar Oh, I get it now. I just didn't understand the difference between a loss value and a loss vector.

Moreover, I split the training videos into train : val : test = 8000 : 1000 : 1000 in sequence, ran eval.py with model-250, and got the following scores:

'Bleu_1': 0.7791051902354527, 'Bleu_2': 0.6264020683380294, 'Bleu_3': 0.48633616297163446, 'Bleu_4': 0.36414797419483796, 'METEOR': 0.2708591015728904, 'ROUGE_L': 0.5861664892844087, 'CIDEr': 0.42026128172472876

Compared with the README, these scores seem too high... Have you tested your model? Is this normal?

ParitoshParmar commented 6 years ago

@ivy94419, if reduce=True, the losses are averaged, so you get just one value; if it is False, you get the whole vector of individual loss values.
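A tiny illustration of that difference (using reduction='mean'/'none', the newer spelling of reduce=True/False; the shapes here are made up):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
log_probs = torch.log_softmax(torch.randn(6, 4), dim=1)  # 6 tokens, vocab of 4
target = torch.tensor([0, 1, 2, 3, 0, 1])

# reduction='mean' (the old reduce=True) averages to a single scalar,
# so multiplying by a mask afterwards no longer zeroes out padded tokens.
scalar_loss = nn.NLLLoss(reduction='mean')(log_probs, target)

# reduction='none' (the old reduce=False) keeps one loss per token,
# which is what the masking in misc/utils.py needs.
vector_loss = nn.NLLLoss(reduction='none')(log_probs, target)

print(scalar_loss.shape)  # torch.Size([])  -- a single value
print(vector_loss.shape)  # torch.Size([6]) -- one loss per token
```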

I haven't tested my model yet. Yes, the scores are pretty high. I am not sure, but maybe the author Ding tested on all 2000 samples while you tested on only 1000, so that might explain the difference in results.