pytorch / examples

A set of examples around pytorch in Vision, Text, Reinforcement Learning, etc.
https://pytorch.org/examples
BSD 3-Clause "New" or "Revised" License

language model generator question #357

Open evanthebouncy opened 6 years ago

evanthebouncy commented 6 years ago

In this file:

https://github.com/pytorch/examples/blob/master/word_language_model/generate.py

What does this input mean in the generation?

input = torch.randint(ntokens, (1, 1), dtype=torch.long).to(device)

As I understand it, in an RNN-based language model the previous output is fed in as the current input and the sequence is unrolled. What is the meaning of this random input? Does it enforce that the last output is fed in as the current input during the unrolling?

Thanks!

(I am building a sequence generator that needs to consume its last output as its current input, and I am wondering how to do it. Are you suggesting that just feeding in random input would also work? Any hints would be helpful!)

nzmora commented 6 years ago

This input tensor is used to sample from the dictionary, i.e. to randomly choose the first word of the input sequence. By the next time input is used, it has already been set to the word sampled from the RNN's output:

# Run one step of the model and sample the next word from its output.
output, hidden = model(input, hidden)
word_weights = output.squeeze().div(args.temperature).exp().cpu()
word_idx = torch.multinomial(word_weights, 1)[0]
# Feed the sampled word back in as the next input.
input.fill_(word_idx)
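
For context, the surrounding generation loop in generate.py looks roughly like this (a simplified sketch; model, corpus, ntokens, device, and args come from the script's setup):

import torch

hidden = model.init_hidden(1)
# Uniformly sample a random word id from the vocabulary as the first input.
input = torch.randint(ntokens, (1, 1), dtype=torch.long).to(device)

with torch.no_grad():
    for i in range(args.words):
        output, hidden = model(input, hidden)
        # Temperature-scaled sampling weights: a higher temperature flattens
        # the distribution, a lower one sharpens it.
        word_weights = output.squeeze().div(args.temperature).exp().cpu()
        # Sample the next word id and feed it back in as the next input,
        # so each step consumes the previous step's output.
        word_idx = torch.multinomial(word_weights, 1)[0]
        input.fill_(word_idx)
        print(corpus.dictionary.idx2word[word_idx])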

I hope that helps,
Neta

evanthebouncy commented 6 years ago

I see. But this assumes a uniform distribution over the first word, which isn't what a language model does, right? Shouldn't the first input always be a start-of-sequence token (e.g. <eos>), with the first word sampled from the distribution of the corpus?

For instance, it is very unlikely that any sentence would start with the word "unfortunate".

I'm perfectly okay with the answer "yeah, but who cares, it's easier this way", which is what I would've done too: technically a bit incorrect, but who cares. Is that the case?
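
In case it's useful, here is a rough sketch of what I mean: build the empirical distribution of sentence-initial words from the training data and sample the first input from it. train_ids is hypothetical here; it stands for the flat 1-D LongTensor of word ids the corpus produces.

import torch

# Hypothetical: train_ids is the tokenized training corpus as a 1-D
# LongTensor of word ids; corpus.dictionary maps words to ids as in the
# word_language_model example.
eos_id = corpus.dictionary.word2idx['<eos>']

# Count how often each word id appears right after <eos>, i.e. starts a
# sentence. torch.multinomial normalizes the weights for us.
start_counts = torch.zeros(ntokens)
for prev, cur in zip(train_ids[:-1].tolist(), train_ids[1:].tolist()):
    if prev == eos_id:
        start_counts[cur] += 1

# Sample the first word from the corpus' start-of-sentence distribution
# instead of uniformly at random.
word_idx = torch.multinomial(start_counts, 1)[0]
input = torch.full((1, 1), word_idx.item(), dtype=torch.long).to(device)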