DarrenCook opened this issue 7 years ago
Just to add: I can see that eval() is almost the sample() function. But how do I adapt it to take a prefix string and ask for the next N words?
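For concreteness, something like the following is what I'm after. This is only a rough sketch, assuming the RNNModel from rnns-gluon.ipynb (embedding encoder → RNN → Dense decoder over the vocabulary) and the notebook's Dictionary with word2idx/idx2word lookups; those attribute names, and `model.begin_state`, are my assumptions about the notebook's code:

```python
import mxnet as mx
from mxnet import nd

def sample(model, vocab, prefix, n_words, ctx=mx.cpu()):
    # Assumes model(inputs, hidden) returns (decoded_logits, hidden), as in the notebook.
    hidden = model.begin_state(func=nd.zeros, batch_size=1, ctx=ctx)
    words = prefix.split()
    # Feed the prefix through the network to warm up the hidden state.
    for w in words:
        x = nd.array([[vocab.word2idx[w]]], ctx=ctx)       # shape (seq_len=1, batch=1)
        output, hidden = model(x, hidden)
    # Then generate n_words, feeding each prediction back in.
    for _ in range(n_words):
        # Greedy choice here; sampling from softmax(output) would give more variety.
        next_idx = int(output[-1].argmax(axis=0).asscalar())
        words.append(vocab.idx2word[next_idx])
        x = nd.array([[next_idx]], ctx=ctx)
        output, hidden = model(x, hidden)
    return ' '.join(words)
```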
And, if the input/output is a 100-dimensional word embedding, can I confirm that the cross-entropy measure is the distance from the perfect 100-dimensional value? I.e. not the distance from a 1-hot encoding, as it was in the earlier char-RNN examples?
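To show what I mean, here are the two interpretations in (hypothetical, not the notebook's) code, assuming a vocab of size V and a 100-dimensional embedding:

```python
import mxnet as mx
from mxnet import nd, gluon

V, emb_dim = 10000, 100
hidden = nd.random.normal(shape=(1, emb_dim))        # one RNN output step

# (a) loss against a 1-hot / integer target over the vocab, as in the char-RNN notebooks:
decoder = gluon.nn.Dense(V)
decoder.initialize()
logits = decoder(hidden)                              # shape (1, V)
target_idx = nd.array([42])
loss_a = gluon.loss.SoftmaxCrossEntropyLoss()(logits, target_idx)

# (b) "distance from the perfect 100-dim value", i.e. comparing the output
# directly to the target word's embedding vector:
target_vec = nd.random.normal(shape=(1, emb_dim))     # stand-in for word 42's embedding
loss_b = gluon.loss.L2Loss()(hidden, target_vec)
```

Which of these is the notebook actually doing?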
This was much harder to follow than the other three notebooks in this chapter. I think it would have been helpful to first have a Gluon port of what the other three do, so we can compare speed, readability, etc., and in particular see how to write the sample() function, which is the big thing missing from rnns-gluon.ipynb. (Or, rephrased: a big appeal of the other three notebooks in this chapter was that we could watch the learning, and could easily substitute in our own starter sentences.)
There was no mention of temperature here, though this was a very interesting way to control the output. It is also not clear what tie_weights is doing. It defaults to True, so why, and in what situations would we set it to False?
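By temperature I mean something like this sketch (plain numpy, not the notebook's code; `logits` is assumed to be the decoder output for the next word):

```python
import numpy as np

def sample_next_word(logits, temperature=1.0):
    # Dividing the logits by the temperature before the softmax: a low
    # temperature sharpens the distribution (near-greedy output), a high
    # temperature flattens it (more surprising output).
    scaled = np.asarray(logits) / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))
```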
How is the encoder working? Does it learn a word2vec-style encoding for the entire training data, convert the whole corpus, and then divide that into batches? If so, I guess a sample() function would have to take each 100-dimensional word-vector output and find the closest word?
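That "find the closest word" step is something like this sketch, assuming `embedding` is the (V, 100) weight matrix of the encoder and `idx2word` maps row indices back to words (both names are my assumptions):

```python
import numpy as np

def closest_word(vec, embedding, idx2word):
    # Cosine similarity between the 100-dim output and every word embedding;
    # return the word whose embedding is most similar.
    norms = np.linalg.norm(embedding, axis=1) * np.linalg.norm(vec)
    sims = embedding.dot(vec) / (norms + 1e-8)
    return idx2word[int(sims.argmax())]
```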
That would make sense to me, but https://github.com/apache/incubator-mxnet/blob/master/example/rnn/lstm_bucketing.py appears to create an embedding for each batch, in isolation, before training on it. (I could be wrong about that, as it seems a silly thing to do.) BTW, that example has the same problem: it fails to show how to use the model to generate text. (See also https://stackoverflow.com/q/42671658/841830 )