macournoyer / neuralconvo

Neural conversational model in Torch

multilayer support, dataset improvements and more #38

Open chenb67 opened 8 years ago

chenb67 commented 8 years ago

Hi,

The PR is pretty big, and I ran into some conflicts when I rebased. I decided to comment out the LR decay code since Adam is supposed to handle it, so take that into consideration when merging.
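(A side note on the LR-decay removal: Adam already rescales each parameter's step by running estimates of the gradient's first and second moments, which is why an explicit decay schedule is often redundant. A minimal scalar sketch of that update, illustrative only and not code from this PR:)

```python
import math

def adam_step(x, grad, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter."""
    m = b1 * m + (1 - b1) * grad            # first-moment estimate
    v = b2 * v + (1 - b2) * grad * grad     # second-moment estimate
    m_hat = m / (1 - b1 ** t)               # bias correction
    v_hat = v / (1 - b2 ** t)
    x = x - lr * m_hat / (math.sqrt(v_hat) + eps)
    return x, m, v

# Minimize f(x) = (x - 3)^2; gradient is 2 * (x - 3).
x, m, v = 0.0, 0.0, 0.0
for t in range(1, 5001):
    x, m, v = adam_step(x, 2 * (x - 3), m, v, t)
print(round(x, 3))  # converges toward 3 without any explicit LR decay
```

(In practice a mild decay can still help late in training, so keeping the commented-out code around for reference seems reasonable.)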

some of the new features:

I also fixed some major bugs with perplexity calculation (with the help of @vikram-gupta), and bugs with memory efficiency.
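(For reference, the perplexity in question is just the exponential of the mean per-token negative log-likelihood; the classic bugs are dividing by the number of sequences instead of tokens, or counting padding positions. A hedged sketch; the function name is mine, not the repo's:)

```python
import math

def perplexity(total_nll, total_tokens):
    """Perplexity = exp(mean per-token negative log-likelihood).
    Common bugs: dividing by the number of sequences rather than the
    number of real tokens, or including padding tokens in the count."""
    return math.exp(total_nll / total_tokens)

# Toy batch: per-token NLLs with padding positions already masked out.
nlls = [2.1, 1.7, 3.0, 0.9]   # real tokens only
print(round(perplexity(sum(nlls), len(nlls)), 4))

# Sanity check against the numbers posted later in this thread:
# exp(val loss 5.8125658944054) gives the reported val ppl of ~334.476.
```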

macournoyer commented 8 years ago

Awesome work as always! I'm currently running th train.lua --cuda --hiddenSize 1000 --numLayers 2 --dataset 30000 --vocabSize 10000 and will update with results.

macournoyer commented 8 years ago

Results after th train.lua --cuda --hiddenSize 1000 --numLayers 2 --dataset 30000 --vocabSize 10000 for 50 epochs:

Epoch stats:
  Errors: min= 1.298625054104
          max= 3.8777817894112
       median= 2.3050528590272
         mean= 2.3216041151624
          std= 0.32521225826956
          ppl= 10.192010358467
     val loss= 5.8125658944054
      val ppl= 334.47625630575

The val ppl increased after each epoch (it started from 121).

Eval:

you> hi
neuralconvo> I'm not sure you're not going to be a little.
you> what's your name?
neuralconvo> I'm not sure you're not going to be a little.
you> how old are you?
neuralconvo> I'm not sure you're not going to be a little.

I'm not sure if it's the eval code that is broken or the model. I had similar issues when I switched to SeqLSTM (in the seqlstm branch).

Will try re-training w/ a single layer.

chenb67 commented 8 years ago

Hi, I think the problem is the small dataset you are using, only 50k examples. Try the full set; I get to ppl 30 on val this way. The answers will tend to be generic when early stopping on the validation set. You can try to overfit the training data like before with the flag --earlyStopOnTrain.
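(The early stopping here is presumably the usual patience scheme: stop once the monitored loss hasn't improved for a few epochs, with --earlyStopOnTrain switching the monitored quantity from validation loss to training loss. A generic sketch, not the repo's implementation:)

```python
def early_stop(losses, patience=3):
    """Return the epoch (0-based) at which training stops: the first
    epoch where the monitored loss has gone `patience` epochs without
    improving on its best value."""
    best, since = float("inf"), 0
    for epoch, loss in enumerate(losses):
        if loss < best:
            best, since = loss, 0
        else:
            since += 1
            if since >= patience:
                return epoch
    return len(losses) - 1

val_losses = [5.1, 4.7, 4.9, 5.0, 5.2, 5.3]
print(early_stop(val_losses))  # stops at epoch 4: no improvement since epoch 1
```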

macournoyer commented 8 years ago

Even if it overfits the data, don't you find it suspect that the eval always returns the same exact output?

On master, when evaluating, I get a different output for every input even w/ small datasets. But w/ this one change I got similar behaviour (always same output). So I'm suspecting it's SeqLSTM.
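(Worth noting that LSTM and SeqLSTM compute the same recurrence; SeqLSTM just consumes a single seqLen x batchSize x inputSize tensor instead of a table of per-timestep tensors, so with identical weights the two should be numerically interchangeable. A numpy sketch of that equivalence, illustrative only and not the rnn package's code:)

```python
import numpy as np

rng = np.random.default_rng(0)
I, H, T, B = 4, 5, 6, 2   # input size, hidden size, seq length, batch size

# One shared set of LSTM weights (gates stacked: input, forget, cell, output).
Wx = rng.normal(size=(I, 4 * H))
Wh = rng.normal(size=(H, 4 * H))
b = np.zeros(4 * H)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c):
    z = x @ Wx + h @ Wh + b
    i, f, g, o = np.split(z, 4, axis=-1)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

seq = rng.normal(size=(T, B, I))

# "LSTM" style: a table (list) of per-timestep tensors.
h = c = np.zeros((B, H))
outs_table = []
for x in list(seq):                       # list of T arrays, each (B, I)
    h, c = lstm_step(x, h, c)
    outs_table.append(h)

# "SeqLSTM" style: one (T, B, I) tensor, outputs written into one tensor.
h = c = np.zeros((B, H))
outs_tensor = np.empty((T, B, H))
for t in range(T):
    h, c = lstm_step(seq[t], h, c)
    outs_tensor[t] = h

print(np.allclose(np.stack(outs_table), outs_tensor))  # True
```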

I'm re-running the training w/ the full dataset and 15k vocab. I'll post results as soon as I've got a couple of epochs done.

chenb67 commented 8 years ago

You are right; it seems like even when the model should memorize the dataset, it still gives the same response every time. I'll investigate further and update you soon.

vikram-gupta commented 8 years ago

I am also getting the same responses when training with the following params:

th train.lua --cuda --hiddenSize 1000 --numLayers 2 --dataset 0 --batchSize 5

Ran one more experiment with only one layer (50 epochs) and am getting the same response :(

th train.lua --cuda --hiddenSize 1000 --numLayers 1 --dataset 0 --batchSize 5

chenb67 commented 8 years ago

Using these settings (takes less than an hour to start seeing results): th train.lua --batchSize 128 --hiddenSize 512 --cuda --numLayers 1 --vocabSize 10000 --dropout 0 --weightDecay 0 --earlyStopOnTrain --dataset 100000 I managed to overfit a model that responds differently to different inputs.

It does, however, seem like it takes more time to establish communication between the encoder and the decoder, and the model works mostly as a language model in the first epochs.

chenb67 commented 8 years ago

Hi @macournoyer, @vikram-gupta, I added a commit that turns off SeqLSTM by default (using LSTM instead) and allows switching it back on with the flag --seqLstm. My experiments show similar results using LSTM/SeqLSTM with the same number of units. I think the lack of variety in the answers originates from the regularisation we introduced (dropout + weight decay). Also, some of the papers acknowledge this issue with these kinds of models; check http://arxiv.org/abs/1510.03055
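(The linked paper's remedy for generic responses is Maximum Mutual Information reranking: score each candidate reply T for a source S by log p(T|S) - lambda * log p(T), penalizing replies that are probable under any context. A toy sketch with made-up log-probabilities, not code from the paper or this repo:)

```python
def mmi_rerank(candidates, lam=0.5):
    """Rank responses by log p(T|S) - lam * log p(T) (the MMI-antiLM
    objective of Li et al. 2015). Each candidate is a tuple of
    (text, logp_t_given_s, logp_t)."""
    return sorted(candidates,
                  key=lambda c: c[1] - lam * c[2],
                  reverse=True)

# Made-up numbers: the generic reply scores well under p(T|S) mostly
# because it is likely under any context, i.e. p(T) is high.
candidates = [
    ("I'm not sure you're not going to be a little.", -4.0, -3.0),
    ("My name is Neuralconvo.",                       -5.0, -9.0),
]
best = mmi_rerank(candidates, lam=0.5)[0][0]
print(best)  # the specific reply wins once p(T) is penalized
```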

macournoyer commented 8 years ago

@chenb67 thx for the fix and the paper! Will check it out.

I'm re-training w/ this and will see.

vikram-gupta commented 8 years ago

Thanks @chenb67

I trained the models with the following params. Note that I used the --seqLstm flag because the code was crashing during evaluation, as we are converting the input to a table.

th train.lua --batchSize 64 --hiddenSize 1000 --cuda --numLayers 1 --vocabSize 10000 --dropout 0 --weightDecay 0 --earlyStopOnTrain --dataset 100000 --seqLstm

The results have improved, but we still have something more to do before they are as good as @macournoyer reported initially. It's surprising that even after nullifying almost all of the changes, the results are still not the same as before. @macournoyer any clues?

you> how are you?
neuralconvo> Oh, you met him...
you> where are you?
neuralconvo> In your place?
you> what is your name?
neuralconvo> You're talking about the precogs...
you> how old are you?
neuralconvo> You're talking about the precogs...
you> where do you live?
neuralconvo> I'm just an eye out.
you> are you intelligent?
neuralconvo> Yes, sir.
you> are you a bot?
neuralconvo> But don't you remember?
you> are you hungry?
neuralconvo> Oh, you met him...
you> hello
neuralconvo> You're talking about the precogs...

After 50 epochs, these were the stats:

Epoch stats:
  Errors: min= 0.17394069818649
          max= 0.61486148644254
       median= 0.37594955411701
         mean= 0.37832337311441
          std= 0.07127508704293
          ppl= 1.4598349379268
     val loss= 7.2912249430419
      val ppl= 1467.3670378147

The error on training kept on going down with each epoch.
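(As a sanity check, the reported numbers are internally consistent with ppl = exp(loss), which makes the train/val gap read as plain overfitting: roughly 1.46 on train against roughly 1467 on val.)

```python
import math

train_ppl = math.exp(0.37832337311441)   # mean train loss from the stats above
val_ppl = math.exp(7.2912249430419)      # val loss from the stats above
print(round(train_ppl, 4), round(val_ppl, 2))  # ~1.4598 vs ~1467.37
```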

macournoyer commented 8 years ago

Something definitely happened in this branch or recently on master that decreased the (subjective) quality of the responses in eval.th.

It might be in the recent changes I pushed on master, I'm looking into it...