zackchase / mxnet-the-straight-dope

An interactive book on deep learning. Much easy, so MXNet. Wow. [Straight Dope is growing up] Much of this content has been incorporated into the new Dive into Deep Learning book, available at https://d2l.ai/.

ch5, rnns-gluon.ipynb; various feedback #183

Open DarrenCook opened 7 years ago

DarrenCook commented 7 years ago

This notebook was much harder to follow than the other three in this chapter. I think it would have been helpful to first see a gluon port of what the other three notebooks do, so we could compare speed, readability, etc., and in particular see how to write the sample() function, which is the big thing missing from rnns-gluon.ipynb. (Rephrased: a big appeal of the other three notebooks was that we could watch the learning happen and easily substitute our own starter sentences.) A sketch of the kind of sample() I had in mind is just below.
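Something along these lines is what I was expecting. This is only a minimal sketch, assuming the model's forward(data, hidden) returns (vocab_logits, hidden) and that there is a begin_state() like the one used in the notebook's training loop; word_to_idx / idx_to_word stand for whatever vocabulary mapping the notebook builds:

```python
import mxnet as mx
from mxnet import nd

def sample(model, prefix, num_words, word_to_idx, idx_to_word, ctx=mx.cpu()):
    """Generate num_words more words, continuing on from the prefix string."""
    hidden = model.begin_state(func=nd.zeros, batch_size=1, ctx=ctx)
    words = prefix.split()  # every prefix word must be in the vocabulary
    # Warm the hidden state up on all but the last prefix word.
    # Each input has shape (1, 1) = (seq_len, batch) in the 'TNC' layout.
    for word in words[:-1]:
        step = nd.array([[word_to_idx[word]]], ctx=ctx)
        _, hidden = model(step, hidden)
    # Now generate, feeding each prediction back in as the next input.
    step = nd.array([[word_to_idx[words[-1]]]], ctx=ctx)
    output = list(words)
    for _ in range(num_words):
        logits, hidden = model(step, hidden)   # logits: (1, vocab_size)
        probs = nd.softmax(logits)
        idx = int(nd.sample_multinomial(probs).asscalar())
        output.append(idx_to_word[idx])
        step = nd.array([[idx]], ctx=ctx)
    return ' '.join(output)
```

Then something like sample(model, "the meaning of life", 20, ...) would give output we can eyeball as training progresses, just like in the char-RNN notebooks.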

There was also no mention of temperature here, even though it was a very interesting way to control the output in the earlier notebooks.
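If the sampling step in the sketch above is roughly right, temperature would just be a rescaling of the logits before the softmax. A hypothetical choose_next() helper (my name, not the notebook's) that could replace the two sampling lines:

```python
from mxnet import nd

def choose_next(logits, temperature=1.0):
    # T < 1 sharpens the distribution (safer output), T > 1 flattens it
    # (more adventurous output); T = 1.0 reduces to plain sampling.
    probs = nd.softmax(logits / temperature)
    return int(nd.sample_multinomial(probs).asscalar())
```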

It is not clear what tie_weights is doing. It defaults to True; why is that, and in what situations would we set it to False?
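For what it's worth, my guess (by analogy with mxnet's word_language_model example) is that tie_weights shares the embedding matrix with the output layer, which is only possible when the embedding size equals the hidden size. A standalone sketch of that sharing pattern in gluon:

```python
from mxnet.gluon import nn

vocab_size, num_embed, num_hidden = 10000, 200, 200  # tying needs num_embed == num_hidden

encoder = nn.Embedding(vocab_size, num_embed)
# Passing the encoder's params makes the Dense layer reuse the same
# (vocab_size, num_hidden) weight matrix instead of allocating its own,
# so the model has fewer parameters to learn.
decoder_tied = nn.Dense(vocab_size, in_units=num_hidden, params=encoder.params)
decoder_untied = nn.Dense(vocab_size, in_units=num_hidden)

print(decoder_tied.weight is encoder.weight)    # True: one shared matrix
print(decoder_untied.weight is encoder.weight)  # False: its own matrix
```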

How does the encoder work? Does it learn a word2vec-style encoding from the entire training data, then convert the whole corpus to vectors, and is that what gets divided into batches? If so, I guess a sample() function would have to take each 100-dimensional word-vector output and find the closest word?
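My current (possibly wrong) understanding of nn.Embedding is that it is just a trainable lookup table applied inside forward() and trained jointly with the LSTM, rather than a separate word2vec pass over the corpus, so the corpus only needs converting to integer ids before batching. A toy sketch of that reading:

```python
from mxnet import nd
from mxnet.gluon import nn

vocab_size, num_embed = 10, 4
encoder = nn.Embedding(vocab_size, num_embed)   # one trainable (10, 4) table
encoder.initialize()

# The corpus is turned into integer word ids once, and those ids are what
# get batched; the vector lookup happens inside forward(), and the table's
# weights are updated by the same backprop that trains the LSTM.
token_ids = nd.array([[1, 5, 3]])               # (batch=1, seq_len=3) word ids
vectors = encoder(token_ids)
print(vectors.shape)                            # (1, 3, 4)
```

If that is right, the decoder's Dense layer already maps the hidden state back to vocab-sized scores, so a sample() would pick a word from those scores rather than doing a nearest-word search in embedding space; but that is exactly the kind of thing I'd like the notebook to spell out.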

That would make sense to me, but https://github.com/apache/incubator-mxnet/blob/master/example/rnn/lstm_bucketing.py appears to create an embedding for each batch in isolation, before training on it. (I could be wrong about that, as it seems a silly thing to do.) Incidentally, that example has the same problem of not showing how to use the model to generate text (see also https://stackoverflow.com/q/42671658/841830).

DarrenCook commented 7 years ago

Just to add: I can see that eval() is almost the sample() function, but how would I adapt it to take a prefix string and ask for the next N words? (Presumably something along the lines of the sample() sketch above, feeding the model's own predictions back in instead of ground-truth data.)

And, if the input/output is a 100-dim word embedding, can I confirm that the cross-entropy measure is the distance from the perfect 100-dimensional value, i.e. not the distance from a 1-hot encoding, as it was in the earlier char-RNN examples?
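For reference, here is what gluon's SoftmaxCrossEntropyLoss expects, which is part of why I'm unsure: it takes vocab-sized scores plus integer class labels (1-hot semantics), so if the decoder really emits (num_steps, vocab_size) scores then the loss would not be a distance in embedding space. It's this reading I'd like confirmed or corrected:

```python
from mxnet import nd
from mxnet.gluon import loss

ce = loss.SoftmaxCrossEntropyLoss()        # applies log-softmax internally
scores = nd.random.uniform(shape=(2, 5))   # 2 time steps, vocabulary of 5 words
labels = nd.array([3, 1])                  # target word ids (1-hot semantics)
print(ce(scores, labels))                  # one loss value per time step
```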