Closed: rhaps0dy closed this issue 8 years ago.
Hi @rhaps0dy,
Both approaches are equally correct; they are just different implementations of the same model. This code uses `embedding_lookup`, which is essentially indexing the embedding matrix with integer indices. That is algorithmically the same as multiplying by a one-hot vector, just more efficient.

The values of `embeddings` change because we are learning the embeddings. The values of the weight matrix used in your attached code will also change. Neither code uses pre-trained embeddings, but you could do that with some small changes.
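A minimal sketch of that equivalence, assuming hypothetical sizes and written in TF 2.x eager style (the code under discussion predates this API, but the idea is the same):

```python
import tensorflow as tf

vocab_size, rnn_size = 65, 128  # hypothetical sizes
embeddings = tf.Variable(tf.random.uniform([vocab_size, rnn_size]))
ids = tf.constant([2, 7, 0])  # integer-encoded characters

# Direct lookup: select rows of the embedding matrix by index.
looked_up = tf.nn.embedding_lookup(embeddings, ids)

# One-hot formulation: multiply one-hot row vectors by the same matrix.
one_hot = tf.one_hot(ids, depth=vocab_size)
multiplied = tf.matmul(one_hot, embeddings)

# Both are the same [3, rnn_size] tensor; the lookup just skips
# multiplying by all the zero entries.
```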
Hello @sherjilozair,
Ah, this makes a lot of sense. Also, you only have to learn `n_chars * rnn_size` parameters for the embeddings, instead of `n_chars * rnn_size + n_chars` for the one-hot's weights and biases, so there are fewer parameters to learn. (Did I understand that correctly?)
Many thanks!
In principle, yes, although the biases can always be removed, or the embedding matrix multiply layer can be merged with the RNN layer, as has been done in the gist you attached.
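To see why the merge works, a hedged NumPy sketch (hypothetical sizes; `Wxh` follows the shape used in karpathy's gist): multiplying the input weight matrix by a one-hot vector just selects one of its columns, so that weight matrix is itself an embedding matrix and no separate one-hot multiply is needed.

```python
import numpy as np

n_chars, rnn_size = 65, 128  # hypothetical sizes
Wxh = np.random.randn(rnn_size, n_chars)  # input-to-hidden weights

char_id = 7
x = np.zeros(n_chars)
x[char_id] = 1.0  # one-hot encoding of the character

# Multiplying by a one-hot vector selects column `char_id` of Wxh,
# so Wxh doubles as an embedding matrix with one column per character.
assert np.allclose(Wxh @ x, Wxh[:, char_id])

# Either way there are n_chars * rnn_size parameters to learn.
```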
Hello,
If I understand correctly, the three-dimensional `inputs` tensor is built by looking up the n-th row of `embeddings` for each number in the two-dimensional `self.input_data` tensor. The rows of `embeddings` have the same size as the RNN's internal layers. This seems to be the way the different characters are fed into the network.

The TensorFlow variable `embeddings` has nothing assigned to it explicitly, so it is drawn from a uniform distribution each time `train.py` is run. Why is that? I would have expected `embeddings` to be a matrix of one-hot row vectors encoding the different characters, with that mapped to the internal layer by weights, as in https://gist.github.com/karpathy/d4dee566867f8291f086. Also, printing `embeddings` at the end of every run, I notice that its value changes every time.

I would be very grateful if someone could explain what is going on here.
Yours truly, rhaps0dy
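A minimal sketch of the lookup described in the question above, with hypothetical sizes and in TF 2.x eager style:

```python
import tensorflow as tf

batch_size, seq_length, rnn_size, vocab_size = 4, 10, 128, 65  # hypothetical

# input_data holds integer character ids, shape [batch_size, seq_length].
input_data = tf.random.uniform(
    [batch_size, seq_length], maxval=vocab_size, dtype=tf.int32)

# embeddings starts from a random uniform initialization and is then
# updated during training, which is why its printed values differ every run.
embeddings = tf.Variable(tf.random.uniform([vocab_size, rnn_size]))

# The lookup replaces each id with its embedding row, producing the
# three-dimensional tensor described in the question.
inputs = tf.nn.embedding_lookup(embeddings, input_data)
assert inputs.shape == (batch_size, seq_length, rnn_size)
```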