oswaldoludwig / Seq2seq-Chatbot-for-Keras

This repository contains a new generative model of chatbot based on seq2seq modeling.
Apache License 2.0

EOS position in bot training #5

Closed jld23 closed 7 years ago

jld23 commented 7 years ago

@oswaldoludwig I've been adapting your code for my conversation file, and I can't figure out the meaning of this line. I got a variety of errors, but if I set l = np.where(sent==0) the code runs. I don't know whether that is correct. The same code also appears at line 171.

Can you explain what the EOS is doing?

Thanks!

oswaldoludwig commented 7 years ago

This is because you generated a new vocabulary. BOS and EOS mean beginning and end of sentence. You have to look up the indices of these tokens in your new dictionary and replace them in the code (they can change with the token frequencies in your dataset; the current indices are 2 and 3, respectively). It would be nice to have a function that checks these indices automatically, for users who want a new vocabulary. If you write one, please create a pull request.
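
A minimal sketch of such a lookup function might look like the following. It assumes the vocabulary is an ordered list whose position gives the token index, with entries that are either plain token strings or (token, count) pairs, and that the markers are stored literally as "BOS" and "EOS". None of this is confirmed in the thread; the function name and the sample vocabulary are hypothetical.

```python
def find_special_token_indices(vocabulary, bos_token="BOS", eos_token="EOS"):
    """Return (bos_index, eos_index) for an ordered vocabulary.

    Assumption: a token's index is its position in `vocabulary`, and each
    entry is either a token string or a (token, count) pair.
    """
    tokens = [entry[0] if isinstance(entry, (tuple, list)) else entry
              for entry in vocabulary]
    missing = [t for t in (bos_token, eos_token) if t not in tokens]
    if missing:
        raise ValueError("special tokens not found in vocabulary: %s" % missing)
    return tokens.index(bos_token), tokens.index(eos_token)


# Hypothetical vocabulary ordered by frequency, as (token, count) pairs.
sample_vocabulary = [("the", 120), ("you", 95), ("BOS", 80), ("EOS", 80), ("hello", 12)]
bos_index, eos_index = find_special_token_indices(sample_vocabulary)
print(bos_index, eos_index)
```

With the returned values, the hard-coded 2 and 3 in the scripts could be replaced by bos_index and eos_index, so that a regenerated vocabulary does not silently break the EOS lookup.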

jld23 commented 7 years ago

Thank you. I only see a reference to EOS. Does the BOS play a role here?

oswaldoludwig commented 7 years ago

Yes, please see line 49 of train_bot.py, for instance. Index 2 is BOS. Also check conversation.py.

jld23 commented 7 years ago

Thank you for your help. I'll do a PR once I get things cleaned up.

iuria21 commented 5 years ago

Hi, is there any other restriction? Padding is done with the value 0, but couldn't the token at index 0 of the vocabulary change as well?
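
A small sketch of why this could matter, assuming sentences are arrays of token indices post-padded with 0 and that a token's index comes from its position in the vocabulary (both are assumptions, not confirmed in this thread). If a real token were assigned index 0, it could not be told apart from padding. One common convention, not necessarily the one used in this repository, is to shift every vocabulary index by one so that 0 stays reserved. The encode function and the sample vocabulary are hypothetical.

```python
import numpy as np

PAD_INDEX = 0  # padding value mentioned in the thread


def encode(tokens, vocabulary, maxlen, offset=1):
    """Map tokens to indices shifted by `offset`, then post-pad with PAD_INDEX.

    Shifting by one keeps index 0 free for padding (an assumed convention).
    """
    ids = [vocabulary.index(t) + offset for t in tokens][:maxlen]
    return np.array(ids + [PAD_INDEX] * (maxlen - len(ids)))


# Hypothetical vocabulary; without the offset, "the" would collide with padding.
vocab = ["the", "you", "BOS", "EOS", "hello"]
eos_index = vocab.index("EOS") + 1

sent = encode(["BOS", "hello", "the", "EOS"], vocab, maxlen=6)
eos_positions = np.where(sent == eos_index)[0]  # where the sentence ends
pad_positions = np.where(sent == PAD_INDEX)[0]  # padding only, never a real token

print(sent)
print(eos_positions)
print(pad_positions)
```

Under these assumptions, np.where(sent == 0) finds the padding positions rather than the EOS position, so it is not a drop-in replacement for looking up the EOS index.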