neonbjb / BootstrapNLP

Configuration-driven HTTP server to quickly demo trained NLP models.
Creative Commons Zero v1.0 Universal

Greetings! Needed help with the missing notebook #1

Open nilavghosh opened 4 years ago

nilavghosh commented 4 years ago

Hi James, I was going through your article on Medium, https://medium.com/@jbetker/implementing-seq2seq-with-attention-in-keras-63565c8e498c,

but could not find the notebook at the mentioned link. Would you be kind enough to share it back on GitHub or email it to nilavghosh@gmail.com?

Regards, Nilav Ghosh

neonbjb commented 4 years ago

Hi Nilav, Sorry, I took it down because there were some compatibility issues with TF2, as well as an implementation error that someone found. I can send you the archived link if you want to try to figure it out, but it'd probably be best if you looked elsewhere for examples at this point.

James

nilavghosh commented 4 years ago

@neonbjb Thanks for the prompt reply. It would be great if you could share the archived link; I will try to fix it and share collaborator rights with you :). I'm already looking at other examples, but most of them use eager-execution-compatible code, and I'm looking for a non-eager version.

Thanks, once again. Nilav

neonbjb commented 4 years ago

Done, I invited you to the repo.

Aside from the TF2 errors, note that there's a big implementation error, in that I put Keras layers like "Dense" inside of LSTMWithAttention.py. This is not actually allowed in TF, and it causes those layers to not be trainable. You can fix it by either:

1) Implementing Dense by hand inside of LSTMWithAttention
2) Pulling the trainable variables from the Layer members of LSTMWithAttention and adding them to the class's trainable_variables
3) Subclassing Model instead of Layer (not sure if this would be possible)
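For illustration, here is a minimal sketch of option 2. The layer and its `attention_dense` member are stand-ins, not the real LSTMWithAttention code; the point is just to show the child layer's weights being surfaced through the wrapper's trainable_weights. (Recent tf.keras auto-tracks sub-layers assigned as attributes, so this workaround mostly matters for the older Keras/TF1 setup discussed here.)

```python
import tensorflow as tf

class AttentionWrapper(tf.keras.layers.Layer):
    """Toy stand-in for a custom layer that owns an inner Dense layer."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        # Child layer created inside the custom Layer -- the situation described above.
        self.attention_dense = tf.keras.layers.Dense(units)

    @property
    def trainable_weights(self):
        # Option 2: explicitly surface the inner layer's weights so the
        # optimizer sees them. The id() check avoids double-counting when
        # the framework already tracks the sub-layer on its own.
        weights = list(super().trainable_weights)
        seen = {id(w) for w in weights}
        weights += [w for w in self.attention_dense.trainable_weights
                    if id(w) not in seen]
        return weights

    def call(self, inputs):
        return self.attention_dense(inputs)
```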

The interesting thing is that, as is, this model trains quite well in TF1. What I surmise is happening is that the rest of the model is fitting around the randomly initialized attention parameters. What's particularly surprising is that the attention mechanism actually works (as pictured). I suspect it would work a lot better if done right.

nilavghosh commented 4 years ago

Thanks @neonbjb for sharing the repo and for the comments on the issues you see with the current implementation. Will look into it.

The surprising bit is that the TensorFlow NMT tutorial you reference, https://www.tensorflow.org/tutorials/text/nmt_with_attention, also does not initialize the decoder states with the encoder's hidden states, and the results they show are pretty good. Makes me think: would the decoder learn faster if initialized properly?
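For reference, the usual Keras pattern for seeding the decoder with the encoder's final states looks roughly like this (a minimal sketch with made-up sizes, not code from either tutorial):

```python
import tensorflow as tf
from tensorflow.keras import layers

vocab_size, embed_dim, units = 8000, 128, 256  # illustrative sizes

# Encoder: keep the final hidden and cell states.
enc_inputs = layers.Input(shape=(None,))
enc_emb = layers.Embedding(vocab_size, embed_dim)(enc_inputs)
_, state_h, state_c = layers.LSTM(units, return_state=True)(enc_emb)

# Decoder: start from the encoder's final states instead of zeros.
dec_inputs = layers.Input(shape=(None,))
dec_emb = layers.Embedding(vocab_size, embed_dim)(dec_inputs)
dec_seq = layers.LSTM(units, return_sequences=True)(
    dec_emb, initial_state=[state_h, state_c])
outputs = layers.Dense(vocab_size, activation="softmax")(dec_seq)

model = tf.keras.Model([enc_inputs, dec_inputs], outputs)
```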

neonbjb commented 4 years ago

Huh, you're right! If I have some free time, I'm going to have to play with that example and see if it has the same issue as mine. You can tell pretty easily by compiling the model and seeing whether the attention weights are included in model.trainable_weights. They definitely need to fix this if so, because it's a pretty insidious issue that's very hard to detect. The only reason I know about it is that I was doing the same thing to a much greater extent in a different model and spent several days trying to figure out why it wasn't training.
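A quick way to run that check (a sketch; the helper name is my own, and matching variables on "attention" in their name is only a heuristic):

```python
import tensorflow as tf

def check_attention_trainable(model: tf.keras.Model, name_hint: str = "attention"):
    """List the model's trainable weights and flag whether any look like
    attention parameters (matched by variable name)."""
    for w in model.trainable_weights:
        print(w.name, tuple(w.shape))
    hits = [w for w in model.trainable_weights if name_hint in w.name]
    print(f"weights matching '{name_hint}':", len(hits))
    return bool(hits)
```

If this returns False for a model that is supposed to have trainable attention, those weights aren't being tracked, which is the symptom described above.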
