raff7 / HRED-Chatbot

Hierarchical recurrent encoder-decoder for a conversational agent

encoder, decoder and context all compiled #1

Open lytum opened 4 years ago

lytum commented 4 years ago

Hi, thanks for your code and for sharing it. May I ask you one question?

Why did you compile the encoder, decoder, and context each separately, as shown in the following code?

```python
encoder_model = hred.model_compile(encoder_model)
decoder_model = hred.model_compile(decoder_model)
context = hred.model_compile(context)
```

Because, as I understand it, we should only need to compile the entire model, like this:

```python
final_model = hred.build_final_model(encoder_model, decoder_model, context)
final_model = hred.model_compile(final_model)
```

Could you help me understand this?

Thanks in advance!

raff7 commented 4 years ago

Hi, I'm afraid I can't answer with much detail, as I simply don't remember my reasoning there. It might have something to do with the fact that the model could not simply be trained end-to-end because of the context module, but most likely I made a mistake.
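For reference, a minimal sketch (standard tf.keras seq2seq wiring, not the repo's actual code; all names and dimensions are illustrative) of why only the end-to-end model strictly needs compile(): compile() only configures training (optimizer and loss), while predict() also works on uncompiled models, so inference-only sub-models can stay uncompiled.

```python
import numpy as np
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

latent_dim, vocab = 32, 100

# Encoder sub-model: maps an input sequence to its final LSTM states.
enc_in = Input(shape=(None, vocab))
_, h, c = LSTM(latent_dim, return_state=True)(enc_in)
encoder = Model(enc_in, [h, c])  # never compiled

# Decoder sub-model: consumes the encoder states, predicts the next token.
dec_in = Input(shape=(None, vocab))
s_h, s_c = Input(shape=(latent_dim,)), Input(shape=(latent_dim,))
dec_out = LSTM(latent_dim)(dec_in, initial_state=[s_h, s_c])
probs = Dense(vocab, activation="softmax")(dec_out)
decoder = Model([dec_in, s_h, s_c], probs)  # never compiled

# End-to-end model: the only graph that is trained, hence the only one
# that needs compile().
full = Model([enc_in, dec_in], decoder([dec_in] + encoder(enc_in)))
full.compile(optimizer="adam", loss="categorical_crossentropy")

x = np.random.rand(4, 5, vocab)
y = np.random.rand(4, vocab)
full.fit([x, x], y, epochs=1, verbose=0)

# predict() works fine on the uncompiled sub-models.
h_val, c_val = encoder.predict(x, verbose=0)
```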

lytum commented 4 years ago

Thanks anyway, and thanks for your feedback.


lytum commented 4 years ago

> Hi, I'm afraid I can't answer with much detail, as I simply don't remember my reasoning there. It might have something to do with the fact that the model could not simply be trained end-to-end because of the context module, but most likely I made a mistake.

Hi, may I ask you another question? Could you explain why return_sequences=False and return_state=False in the following decoder code?

```python
decoder_lstm = LSTM(self.latent_dim, return_sequences=False, return_state=False)
decoder_outputs = decoder_lstm(LSTM_input, initial_state=encoder_states)
```

raff7 commented 4 years ago

With return_sequences=True the layer would return the whole output sequence of the RNN, while there I just need the last output. return_state=True returns the internal states of the LSTM, which also aren't needed here; you just need the output to get a probability distribution over the dictionary and generate a word.
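To make the shapes concrete, a small sketch (dimensions are illustrative, not the repo's code) of what each flag changes on a Keras LSTM:

```python
import numpy as np
from tensorflow.keras.layers import Input, LSTM
from tensorflow.keras.models import Model

x = np.random.rand(2, 7, 16)  # (batch, timesteps, features)
inp = Input(shape=(7, 16))

# Defaults (return_sequences=False, return_state=False): last output only.
last = LSTM(32)(inp)
print(Model(inp, last).predict(x, verbose=0).shape)  # (2, 32)

# return_sequences=True: the output at every timestep.
seq = LSTM(32, return_sequences=True)(inp)
print(Model(inp, seq).predict(x, verbose=0).shape)  # (2, 7, 32)

# return_state=True: the last output plus the final hidden and cell states.
out, h, c = LSTM(32, return_state=True)(inp)
shapes = [t.shape for t in Model(inp, [out, h, c]).predict(x, verbose=0)]
print(shapes)  # [(2, 32), (2, 32), (2, 32)]
```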

Also, if I can give you a piece of advice: this technique is quite antiquated by now. Transformers have been shown to outperform LSTMs on this type of task, especially because there are now many pre-trained models you can use.
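For anyone landing here, a hedged sketch of that pre-trained route (using Hugging Face's transformers library with DialoGPT; the model choice is illustrative, not something from this repo):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-small")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-small")

# Encode one user turn, append the end-of-sequence token, generate a reply.
ids = tokenizer.encode("Hello, how are you?" + tokenizer.eos_token,
                       return_tensors="pt")
reply_ids = model.generate(ids, max_length=100,
                           pad_token_id=tokenizer.eos_token_id)

# Decode only the newly generated tokens (the reply).
print(tokenizer.decode(reply_ids[0, ids.shape[-1]:], skip_special_tokens=True))
```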