Hello. (picture from the bengio03a paper)
Did you see the Bengio (2003) paper?!
Do you mean a point like using nn.Embedding as the C matrix in the paper?
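(nn.Embedding(n_class, m) stores a trainable n_class x m weight table indexed by word id, which matches the shape of the |V| x m matrix C.)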
@graykode, it seems that you dropped the Wx term in your implementation, and your x is a one-hot encoding of every word. But in Bengio's paper there should be a C matrix that learns a feature vector for every word.
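For reference, the model in Bengio et al. (2003) computes y = b + Wx + U tanh(d + Hx), where x = (C(w_{t-1}), ..., C(w_{t-n+1})) is the concatenation of the feature vectors of the n-1 context words, looked up in the |V| x m matrix C.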
@thedogb I found what I was missing, thanks. I confused the capital W (the direct-connection weight matrix) with the lowercase w (a word) in the paper.
def forward(self, X):
    # here will be a lookup table like the C matrix in the paper
    input = X.view(-1, n_step * n_class)  # [batch_size, n_step * n_class]
    tanh = torch.tanh(self.d + torch.mm(input, self.H))  # [batch_size, n_hidden]
    output = self.b + torch.mm(input, self.W) + torch.mm(tanh, self.U)  # [batch_size, n_class]
    return output
Is it right? @thedogb
@thedogb Hello again.
There are two values that set the number of dimensions, m and h (image reference is here), so I will change my code:
# NNLM Parameters
n_step = 2    # n-1 in paper
n_hidden = 2  # h in paper
m = 2         # m in paper

# Model
class NNLM(nn.Module):
    def __init__(self):
        super(NNLM, self).__init__()
        self.C = nn.Embedding(n_class, m)
        self.H = nn.Parameter(torch.randn(n_step * m, n_hidden).type(dtype))
        self.W = nn.Parameter(torch.randn(n_step * m, n_class).type(dtype))
        self.d = nn.Parameter(torch.randn(n_hidden).type(dtype))
        self.U = nn.Parameter(torch.randn(n_hidden, n_class).type(dtype))
        self.b = nn.Parameter(torch.randn(n_class).type(dtype))

    def forward(self, X):
        X = self.C(X)               # [batch_size, n_step, m], look up feature vectors
        X = X.view(-1, n_step * m)  # [batch_size, n_step * m]
        tanh = torch.tanh(self.d + torch.mm(X, self.H))  # [batch_size, n_hidden]
        output = self.b + torch.mm(X, self.W) + torch.mm(tanh, self.U)  # [batch_size, n_class]
        return output
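For what it's worth, here is a minimal sketch of how this model could be trained end to end. The toy corpus and variable names are only illustrative, and it assumes dtype = torch.FloatTensor is defined as in the rest of the repo:

import torch
import torch.nn as nn

sentences = ["i like dog", "i love coffee", "i hate milk"]  # hypothetical toy corpus
word_list = list(set(" ".join(sentences).split()))
word_dict = {w: i for i, w in enumerate(word_list)}
n_class = len(word_dict)

model = NNLM()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# inputs are the first n_step word indices, target is the last word
input_batch = torch.LongTensor([[word_dict[w] for w in s.split()[:-1]] for s in sentences])
target_batch = torch.LongTensor([word_dict[s.split()[-1]] for s in sentences])

for epoch in range(5000):
    optimizer.zero_grad()
    output = model(input_batch)  # [batch_size, n_class]
    loss = criterion(output, target_batch)
    loss.backward()
    optimizer.step()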
@graykode Hello.
Sorry, I don't know PyTorch very well. I read the code of NNLM-Tensor.py. In this implementation there is no Wx term and no C matrix. I modified it a little to add the C matrix and kept Wx dropped, because W can be 0 "when no direct connections from word features to outputs are desired". As follows:
m = 5
C = tf.Variable(tf.random_normal([1, n_class, m]))
C_shared = tf.tile(C, [tf.shape(X)[0], 1, 1])  # share parameters across the batch
vecs = tf.matmul(X, C_shared)  # [batch_size, n_step, m], get feature vectors for every word
input = tf.reshape(vecs, shape=[-1, n_step * m])  # [batch_size, n_step * m]
H = tf.Variable(tf.random_normal([n_step * m, n_hidden]))
d = tf.Variable(tf.random_normal([n_hidden]))
U = tf.Variable(tf.random_normal([n_hidden, n_class]))
b = tf.Variable(tf.random_normal([n_class]))
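(Side note: since X is one-hot, tf.matmul(X, C_shared) is just a table lookup, so the tile can be avoided entirely with tf.nn.embedding_lookup. A sketch, assuming a hypothetical X_idx placeholder of shape [batch_size, n_step] holding word indices instead of one-hot vectors:

C = tf.Variable(tf.random_normal([n_class, m]))  # no leading batch dim needed
vecs = tf.nn.embedding_lookup(C, X_idx)          # [batch_size, n_step, m]
input = tf.reshape(vecs, [-1, n_step * m])       # [batch_size, n_step * m]
)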
Is it right? I'm new to TensorFlow, so if I made any mistakes, please tell me. Thanks.
@thedogb m=2 in my code. I think we make the C matrix with shape (n_class, m).
I will edit the NNLM with TensorFlow soon.
@shawnthu see my commit! https://github.com/graykode/nlp-tutorial/commit/52c4514bb56eebf63c5ce4f5f3dc323782278f77
I cannot find the embedding in NNLM!