yandex / faster-rnnlm

Faster Recurrent Neural Network Language Modeling Toolkit with Noise Contrastive Estimation and Hierarchical Softmax

Training with several hidden layers #2

Open VeliBaba opened 9 years ago

VeliBaba commented 9 years ago

Hi! I have some questions about faster-rnnlm. It is possible to use several hidden layers during training. My questions are:

  1. Which of them is used for the recurrent part?
  2. Does it use those hidden layers during decoding or when computing entropy? Thanks!
akhti commented 9 years ago

Hi!

  1. All of them. The output of one layer is the input for the next one. For instance, if you have two tanh layers, the network looks like this (see the sketch after this list):

     h1_t = tanh(x_t      + W1 * h1_{t-1})
     h2_t = tanh(U * h1_t + W2 * h2_{t-1})
  2. Yes, it does.
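
For concreteness, here is a minimal sketch of that composition at a single time step. This is not faster-rnnlm's actual code; the dense vector/matrix types and the names W1, W2, U are assumptions made purely for illustration.

```cpp
// Minimal sketch of one time step of a two-layer tanh RNN:
//   h1_t = tanh(x_t      + W1 * h1_{t-1})
//   h2_t = tanh(U * h1_t + W2 * h2_{t-1})
// W1, W2 are the per-layer recurrent matrices; U connects layer 1 to layer 2.
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // dense row-major matrix

// Returns M * x.
Vec MatVec(const Mat& M, const Vec& x) {
  Vec y(M.size(), 0.0);
  for (std::size_t i = 0; i < M.size(); ++i)
    for (std::size_t j = 0; j < x.size(); ++j)
      y[i] += M[i][j] * x[j];
  return y;
}

// Returns tanh(a + b), element-wise.
Vec TanhSum(const Vec& a, const Vec& b) {
  Vec y(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) y[i] = std::tanh(a[i] + b[i]);
  return y;
}

// Advances both hidden states by one word. h1 and h2 hold the states from
// the previous time step and are overwritten with the new ones.
void Step(const Vec& x_t, const Mat& W1, const Mat& U, const Mat& W2,
          Vec* h1, Vec* h2) {
  *h1 = TanhSum(x_t, MatVec(W1, *h1));             // layer 1 sees the word input
  *h2 = TanhSum(MatVec(U, *h1), MatVec(W2, *h2));  // layer 2 sees layer 1's new output
}
```

The top layer's output (h2 here) is what the output layer (hierarchical softmax or NCE) sees, while each layer carries its own state forward to the next word.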
VeliBaba commented 9 years ago

Ok. Is the output of the last hidden layer used as the input of the next neural network?

akhti commented 9 years ago

What do you mean by 'next neural network'? If you mean the next time step (the next word), then the answer is yes.

VeliBaba commented 9 years ago

Yes, that's what I meant. Ok, thanks.

VeliBaba commented 9 years ago

Does using several hidden layers instead of a single hidden layer improve performance? Which is better: a single hidden layer of size 400, or 4 hidden layers of size 100?

akhti commented 9 years ago

First, when you increase the layer size by a factor of 4, training/evaluation time (in theory) increases by a factor of 16 (4 squared). So it's more reasonable to compare 1 layer of size 400 with 4 layers of size 200. However, I would recommend training a shallow network with a single layer first.
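
To make that arithmetic concrete, here is a back-of-the-envelope count of the dominant per-word cost, the hidden-to-hidden multiplies, which scale roughly as (number of layers) × (layer size)². This is an illustration, not faster-rnnlm's actual cost model; embedding, layer-to-layer, and output-layer costs are ignored.

```cpp
// Rough per-word cost model: the recurrent (hidden-to-hidden) part of each
// layer costs about size^2 multiplications, so the total is layers * size^2.
// Going from one layer of size 100 to one layer of size 400 is the
// 16x (4 squared) jump mentioned above.
#include <cstdio>

long long RecurrentMults(int layers, int size) {
  return 1LL * layers * size * size;
}

int main() {
  std::printf("1 layer  of size 400: %lld\n", RecurrentMults(1, 400));  // 160000
  std::printf("4 layers of size 200: %lld\n", RecurrentMults(4, 200));  // 160000 -> comparable
  std::printf("4 layers of size 100: %lld\n", RecurrentMults(4, 100));  // 40000  -> 4x cheaper
  return 0;
}
```

Under this rough model, 1×400 and 4×200 cost about the same per word, which is why that is the fairer comparison.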

VeliBaba commented 9 years ago

Hi! I have two different toolkits for training an rnnlm: the first one is rnnlm-hs-0.1b (Ilya-multithreading), and the second one is faster-rnnlm. With the same options, faster-rnnlm is about 3 times faster than rnnlm-hs-0.1b. Is it expected that the validation entropy at the end of training may be worse with faster-rnnlm than with rnnlm-hs-0.1b?

akhti commented 9 years ago

It's expected that the entropy will be more or less the same.