macournoyer / neuralconvo

Neural conversational model in Torch

adam optimizer and more.. #35

Closed chenb67 closed 8 years ago

chenb67 commented 8 years ago

Hi,

Included in this PR:

- Adam optimizer support, with optimizers now pluggable (plug-and-play)
- a corrected PPL (perplexity) calculation

Next I'm working on a better dataset class, a multilayer model, and testing seqLSTM instead of plain LSTM.

Chen

macournoyer commented 8 years ago

Awesome work! Will give this a try soon.

Did you test it out w/ a few sample conversations? Did it improve or degrade results (objectively)?

chenb67 commented 8 years ago

I couldn't reproduce the results in the README with any of the models I've tested so far (512 hidden units). However, the Adam version seems to converge faster than the SGD version, and any optimizer can now be used in a plug-and-play fashion. Conversation-wise, I get quite similar results.
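(For readers unfamiliar with the pattern: Torch's optim package makes optimizers interchangeable by having the training loop pass a closure that returns the loss and the gradient, so switching from SGD to Adam is just switching the update function. A minimal pure-Python sketch of the idea on a toy objective — hand-rolled updates and hypothetical names, not the repo's actual code:)

```python
import math

def feval(x):
    """Toy objective f(x) = (x - 3)^2; returns loss and gradient."""
    return (x - 3.0) ** 2, 2.0 * (x - 3.0)

def sgd_step(feval, x, state):
    # Plain gradient descent update.
    loss, grad = feval(x)
    return x - state["lr"] * grad, loss

def adam_step(feval, x, state, b1=0.9, b2=0.999, eps=1e-8):
    # Adam update: bias-corrected running moments of the gradient.
    loss, grad = feval(x)
    state["t"] = state.get("t", 0) + 1
    state["m"] = b1 * state.get("m", 0.0) + (1 - b1) * grad
    state["v"] = b2 * state.get("v", 0.0) + (1 - b2) * grad * grad
    m_hat = state["m"] / (1 - b1 ** state["t"])
    v_hat = state["v"] / (1 - b2 ** state["t"])
    return x - state["lr"] * m_hat / (math.sqrt(v_hat) + eps), loss

def train(step_fn, steps=200, lr=0.1):
    # The training loop never changes; the optimizer is just a function.
    x, state = 0.0, {"lr": lr}
    loss = None
    for _ in range(steps):
        x, loss = step_fn(feval, x, state)
    return loss
```

With this shape, `train(sgd_step)` and `train(adam_step)` differ only in which update function is plugged in, which is the "plug-n-play" property being described.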

macournoyer commented 8 years ago

Damnit! I let it run over the weekend but forgot to pull your latest changes ...

Good news is it appears to be running a bit faster: 1h30m w/ --dataset 50000 --hiddenSize 1000, 15 min less than before on my machine.

Will update w/ results as soon as I have some.

macournoyer commented 8 years ago

Got some results after 50 epochs w/: $ th train.lua --cuda --dataset 30000 --hiddenSize 1000

  Errors: min= 0.12092084043205
          max= 12.942585066745
       median= 1.2809126438544
         mean= 1.3072453725232
          std= 0.34914281377921
          ppl= 3.6959786325385
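(Side note: assuming the reported errors are average negative log-likelihoods, the ppl line is just the exponential of the mean error — perplexity = exp(mean NLL) — which you can check directly:)

```python
import math

# Perplexity is exp(mean negative log-likelihood), so the reported
# ppl can be recomputed from the reported mean error.
mean_nll = 1.3072453725232
ppl = math.exp(mean_nll)  # ≈ 3.6959786, matching the ppl line above
```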
th> say "Hi"
>> Alone tonight?
                                                                      [0.2976s]
th> say "What is your name?"
>> Victor frankenstein, sir.
                                                                      [0.2085s]
th> say "How old are you?"
>> Twenty six years.
                                                                      [0.1879s]
th> say "What color is the sky?"
>> It's a problem.
                                                                      [0.2599s]
th> say "Are you human?"
>> I'm fine.
                                                                      [0.1666s]
th> say "Are you a machine?"
>> I've had worse.
                                                                      [0.2127s]
th> say "Where are you from?"
>> Helsinki,, which is the capital of finland.
                                                                      [0.3404s]
th> say "How are you doing?"
>> Very well...
                                                                      [0.1381s]
th> say "Do you like to cook?"
>> Sure.
                                                                      [0.0989s]
th> say "What is the purpose of life?"
>> It's a game.
                                                                      [0.2717s]
th> say "What is the purpose of living?"
>> Nothing not.
                                                                      [0.1161s]
th> say "Hello"
>> You ready?
                                                                      [0.1244s]
th> say "Hello?"
>> Hello, is this elaine?
                                                                      [0.2425s]
th> say "No"
>> You were screaming you weren't nick!
                                                                      [0.3195s]
th> say "No"
>> You were screaming you weren't nick!
                                                                      [0.3653s]
th> say "I'm not Nick"
>> And don't blame me...
                                                                      [0.2993s]
th> say "I won't"
>> He's not quite...
                                                                      [0.2065s]
th> say "what?"
>> I'm trying to stop the
                                                                      [0.2919s]
th> say "the what?"
>> The little brenner girl.

The training took more memory, so I had to drop from 50k to 30k examples. I think this impacted the quality of some of the answers above (e.g. "What color is the sky?"). But other answers were pretty good!

However, I had to remove the "conversion to CPU before saving" because it took too much memory. Could you make this an option? Then I'll merge.

i55code commented 8 years ago

Correct PPL calculation is so important:) Thank you!

i55code commented 8 years ago

Btw, that is really the coolest name I have ever heard; the machine is claiming it's a little monster :) Haha!

i55code commented 8 years ago

But somehow I am not impressed with this answer:

th> say "What is the purpose of life?"
>> It's a game.

th> say "What is the purpose of living?"
>> Nothing not.

I hope he can do better...

i55code commented 8 years ago

Can you help push these changes to the main branch? Thank you!

chenb67 commented 8 years ago

@macournoyer Yes, there is an issue in this version: we keep a reference to the params, which caused memory to increase after the first epoch. I fixed it in my dev branch some time ago. Do you want me to just fix it, or add the option anyway?
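(For context, this is a common bug class when checkpointing: if the saved table keeps a live reference to the training-time parameter buffer, that memory can never be collected. A tiny Python illustration of the pattern — hypothetical names, not the repo's code:)

```python
import gc
import weakref

class ParamBuffer:
    """Stand-in for a large flattened parameter tensor."""
    def __init__(self, n):
        self.data = [0.0] * n

def leaky_checkpoint(params):
    # Bug pattern: the checkpoint holds the live training buffer itself,
    # so the buffer stays reachable for as long as the checkpoint does.
    return {"params": params}

def fixed_checkpoint(params):
    # Fix pattern: copy the values out and drop the reference.
    return {"params": list(params.data)}

params = ParamBuffer(1000)
probe = weakref.ref(params)   # lets us observe whether the buffer is freed
ckpt = fixed_checkpoint(params)
del params
gc.collect()
assert probe() is None        # buffer freed; checkpoint only kept a copy
```

The leaky variant would keep `probe()` alive after `del params`, which is the kind of per-epoch memory growth described above.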

Chen

macournoyer commented 8 years ago

@chenb67 ah! If you have a fix, that'd be way better than an option :)

macournoyer commented 8 years ago

Thanks again for the amazing work @chenb67 !

chenb67 commented 8 years ago

Thanks for the cool project, @macournoyer! I have more features in my dev branch. Would you rather have small PRs for each feature, or another pretty big PR?

macournoyer commented 8 years ago

@chenb67 a big PR like this one is fine with me. Whatever is simpler.