macournoyer / neuralconvo

Neural conversational model in Torch

adam optimizer and more.. #35

Closed chenb67 closed 8 years ago

chenb67 commented 8 years ago

Hi,

Included in this PR:

- Adam optimizer support, with optimizers now pluggable (plug-and-play)
- a corrected PPL (perplexity) calculation

Next I'm working on a better dataset class, a multilayer model, and testing seqLSTM instead of plain LSTM.

Chen

macournoyer commented 8 years ago

Awesome work! Will give this a try soon.

Did you test it out w/ a few sample conversations? Did it improve or degrade results (objectively)?

chenb67 commented 8 years ago

I couldn't reproduce the results in the README with any of the models I've tested so far (512 hidden units). However, the Adam version seems to converge faster than the SGD version, and any optimizer can now be used in a plug-and-play fashion. Conversation-wise, I get quite similar results.
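(For readers unfamiliar with the pattern: Torch's optim package makes optimizers interchangeable by having the training loop pass a closure that returns the loss and the gradient, so switching from SGD to Adam is just switching the update function. A minimal pure-Python sketch of the idea on a toy objective — hand-rolled updates and hypothetical names, not the repo's actual code:)

```python
import math

def feval(x):
    """Toy objective f(x) = (x - 3)^2; returns loss and gradient."""
    return (x - 3.0) ** 2, 2.0 * (x - 3.0)

def sgd_step(feval, x, state):
    # Plain gradient descent update.
    loss, grad = feval(x)
    return x - state["lr"] * grad, loss

def adam_step(feval, x, state, b1=0.9, b2=0.999, eps=1e-8):
    # Adam update: bias-corrected running moments of the gradient.
    loss, grad = feval(x)
    state["t"] = state.get("t", 0) + 1
    state["m"] = b1 * state.get("m", 0.0) + (1 - b1) * grad
    state["v"] = b2 * state.get("v", 0.0) + (1 - b2) * grad * grad
    m_hat = state["m"] / (1 - b1 ** state["t"])
    v_hat = state["v"] / (1 - b2 ** state["t"])
    return x - state["lr"] * m_hat / (math.sqrt(v_hat) + eps), loss

def train(step_fn, steps=200, lr=0.1):
    # The training loop never changes; the optimizer is just a function.
    x, state = 0.0, {"lr": lr}
    loss = None
    for _ in range(steps):
        x, loss = step_fn(feval, x, state)
    return loss
```

With this shape, `train(sgd_step)` and `train(adam_step)` differ only in which update function is plugged in, which is the "plug-n-play" property being described.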

macournoyer commented 8 years ago

Damnit! I let it run over the weekend but forgot to pull your latest changes ...

Good news is it appears to be running a bit faster: 1h30m w/ --dataset 50000 --hiddenSize 1000, 15 min less than before on my machine.

Will update w/ results as soon as I have some.

macournoyer commented 8 years ago

Got some results after 50 epochs w/: $ th train.lua --cuda --dataset 30000 --hiddenSize 1000

  Errors: min= 0.12092084043205
          max= 12.942585066745
       median= 1.2809126438544
         mean= 1.3072453725232
          std= 0.34914281377921
          ppl= 3.6959786325385
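(Side note: assuming the reported errors are average negative log-likelihoods, the ppl line is just the exponential of the mean error — perplexity = exp(mean NLL) — which you can check directly:)

```python
import math

# Perplexity is exp(mean negative log-likelihood), so the reported
# ppl can be recomputed from the reported mean error.
mean_nll = 1.3072453725232
ppl = math.exp(mean_nll)  # ≈ 3.6959786, matching the ppl line above
```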
th> say "Hi"
>> Alone tonight?
                                                                      [0.2976s]
th> say "What is your name?"
>> Victor frankenstein, sir.
                                                                      [0.2085s]
th> say "How old are you?"
>> Twenty six years.
                                                                      [0.1879s]
th> say "What color is the sky?"
>> It's a problem.
                                                                      [0.2599s]
th> say "Are you human?"
>> I'm fine.
                                                                      [0.1666s]
th> say "Are you a machine?"
>> I've had worse.
                                                                      [0.2127s]
th> say "Where are you from?"
>> Helsinki,, which is the capital of finland.
                                                                      [0.3404s]
th> say "How are you doing?"
>> Very well...
                                                                      [0.1381s]
th> say "Do you like to cook?"
>> Sure.
                                                                      [0.0989s]
th> say "What is the purpose of life?"
>> It's a game.
                                                                      [0.2717s]
th> say "What is the purpose of living?"
>> Nothing not.
                                                                      [0.1161s]
th> say "Hello"
>> You ready?
                                                                      [0.1244s]
th> say "Hello?"
>> Hello, is this elaine?
                                                                      [0.2425s]
th> say "No"
>> You were screaming you weren't nick!
                                                                      [0.3195s]
th> say "No"
>> You were screaming you weren't nick!
                                                                      [0.3653s]
th> say "I'm not Nick"
>> And don't blame me...
                                                                      [0.2993s]
th> say "I won't"
>> He's not quite...
                                                                      [0.2065s]
th> say "what?"
>> I'm trying to stop the
                                                                      [0.2919s]
th> say "the what?"
>> The little brenner girl.

The training took more memory, so I had to drop from 50k to 30k examples. I think this impacted the quality of some of the answers above (e.g. "What color is the sky?"). But other answers were pretty good!

However, I had to remove the "conversion to CPU before saving" because it took too much memory. Could you make this an option? Then I'll merge.

i55code commented 8 years ago

Correct PPL calculation is so important:) Thank you!

i55code commented 8 years ago

Btw, that is really the coolest name I have ever heard; the machine is claiming it's a little monster :) Haha!

i55code commented 8 years ago

But somehow I am not impressed with this answer:

th> say "What is the purpose of life?"
>> It's a game.

th> say "What is the purpose of living?"
>> Nothing not.

I hope he can do better...

i55code commented 8 years ago

Can you help push these changes to the main branch? Thank you!

chenb67 commented 8 years ago

@macournoyer Yes, there is an issue in this version: we keep a reference to the params, which caused memory to increase after the first epoch. I fixed it in my dev branch some time ago. Do you want me to just fix it, or add the option anyway?
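(For context, this is a common bug class when checkpointing: if the saved table keeps a live reference to the training-time parameter buffer, that memory can never be collected. A tiny Python illustration of the pattern — hypothetical names, not the repo's code:)

```python
import gc
import weakref

class ParamBuffer:
    """Stand-in for a large flattened parameter tensor."""
    def __init__(self, n):
        self.data = [0.0] * n

def leaky_checkpoint(params):
    # Bug pattern: the checkpoint holds the live training buffer itself,
    # so the buffer stays reachable for as long as the checkpoint does.
    return {"params": params}

def fixed_checkpoint(params):
    # Fix pattern: copy the values out and drop the reference.
    return {"params": list(params.data)}

params = ParamBuffer(1000)
probe = weakref.ref(params)   # lets us observe whether the buffer is freed
ckpt = fixed_checkpoint(params)
del params
gc.collect()
assert probe() is None        # buffer freed; checkpoint only kept a copy
```

The leaky variant would keep `probe()` alive after `del params`, which is the kind of per-epoch memory growth described above.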

Chen

macournoyer commented 8 years ago

@chenb67 ah! If you have a fix, that'd be way better than an option :)

macournoyer commented 8 years ago

Thanks again for the amazing work @chenb67 !

chenb67 commented 8 years ago

Thanks for the cool project, @macournoyer! I have more features in my dev branch. Would you rather have small PRs for each feature, or another pretty big PR?

macournoyer commented 8 years ago

@chenb67 a big PR like this one is fine with me. Whatever is simpler.