salesforce / awd-lstm-lm

LSTM and QRNN Language Model Toolkit for PyTorch
BSD 3-Clause "New" or "Revised" License

Finetune issue #55

Open giancds opened 6 years ago

giancds commented 6 years ago

Hello guys, first thanks for sharing your code with us.

I have noticed a problem with the fine-tuning process. When I run it, I get the following error:

RuntimeError: invalid argument 2: size '[-1 x 10000]' is invalid for input with 227500 elements at /pytorch/aten/src/TH/THStorage.c:37

and it happens when

output_flat = output.view(-1, ntokens)

is called in the evaluate function of finetune.py.
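The numbers in the error message already hint at the cause: `view(-1, 10000)` needs an element count that is a multiple of 10000, and 227500 is not. A quick check in plain Python, where the helper name `can_view_as` is made up for illustration:

```python
def can_view_as(num_elements: int, last_dim: int) -> bool:
    """Return True if num_elements can be reshaped to (-1, last_dim)."""
    return num_elements % last_dim == 0


ntokens = 10000        # vocabulary size taken from the error message
num_elements = 227500  # element count taken from the error message

# The count must divide evenly by the requested last dimension.
print(can_view_as(num_elements, ntokens))  # False
print(divmod(num_elements, ntokens))       # (22, 7500)
```

Since the count is not a multiple of the vocabulary size, `output` presumably does not hold one score per vocabulary token at that point.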

After some investigation, I found that the call to

decoded = self.decoder(output.view(output.size(0)*output.size(1), output.size(2)))

has been dropped from model.py.

I understand that SplitCrossEntropyLoss performs this step for us during training, but since fine-tuning uses regular cross entropy, shouldn't we put this line back in the code?

My apologies if I'm missing something!

adityamogadala commented 6 years ago

I also faced the same issue with fine-tuning. Is there any solution to this problem?

giancds commented 6 years ago

Well, I changed the code to add a new parameter called decode, set to False by default. If it is passed as True in the fine-tuning script, the missing line is called.
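A minimal sketch of that kind of change, assuming the flag is added to the model's forward method (the thread does not say exactly where it went). `TinyRNNModel` and its sizes are hypothetical stand-ins, not the repository's model:

```python
import torch
import torch.nn as nn


class TinyRNNModel(nn.Module):
    """Toy stand-in for the repository's model; names and sizes are illustrative."""

    def __init__(self, ntoken: int, ninp: int, nhid: int):
        super().__init__()
        self.encoder = nn.Embedding(ntoken, ninp)
        self.rnn = nn.LSTM(ninp, nhid)
        self.decoder = nn.Linear(nhid, ntoken)

    def forward(self, input, hidden=None, decode=False):
        emb = self.encoder(input)               # (seq_len, batch, ninp)
        output, hidden = self.rnn(emb, hidden)  # (seq_len, batch, nhid)
        if decode:
            # Project hidden states to vocabulary scores, as in the dropped line.
            flat = output.view(output.size(0) * output.size(1), output.size(2))
            output = self.decoder(flat)         # (seq_len * batch, ntoken)
        return output, hidden


ntokens = 50
model = TinyRNNModel(ntoken=ntokens, ninp=8, nhid=16)
data = torch.randint(0, ntokens, (5, 3))        # (seq_len=5, batch=3)

raw, _ = model(data)                    # default: hidden states only
scores, _ = model(data, decode=True)    # opt-in: vocabulary scores
output_flat = scores.view(-1, ntokens)  # now a valid reshape
```

Keeping `decode=False` as the default leaves the training path untouched, so only callers that use a plain cross-entropy criterion need to opt in.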

adityamogadala commented 6 years ago

Thank you. Works for me :+1: