Load from Numpy Weights

jonny-d / Tensorflow_mLSTM

Train an mLSTM language model in Tensorflow


Load from Numpy Weights #4

Closed athon-millane closed 6 years ago

athon-millane commented 6 years ago

Hi @jonnykira, I've tried a number of iterations to get loading working, and this is the best I've been able to achieve. Everything appears to load, but sampling after the first 100 steps gives the same garbage I'd expect from a newly initialised model. You'll see I added an optional --np_path argument to the parser; that is all you'll need to try it out. I don't expect you to merge this into your repo, but I'd really appreciate it if you could take a look and let me know what you think!

Thanks, Athon

jonny-d commented 6 years ago

Hello @athon-millane, I had a look at your code. I think you may have to use the tf.constant() initializer to initialize the weights within tf.get_variable(). For example, instead of:

W_embedding = tf.get_variable("W_embedding", initializer=params[0])

try this:

W_embedding = tf.get_variable('W_embedding', initializer=tf.constant(params[0]))

Do this for all of your variables; I think that should work. Hope this helps!

Jonny

athon-millane commented 6 years ago

Hi again @jonnykira, thanks very much for taking a look. I did as you said. I'm not sure if that fixed things for you, but I'm still encountering the same problem: the model loads the weights without error, but loading the OpenAI model weights, for example, yields scrambled predictions like so:

Sampling...

y8YÀltcr " s /Em cthea pt ofyD oleoan9Dallo t tf*ana E seeagly are Ytl sog vum tr fiat oag miueîaat .e id B otp toguos "Ia d oatetobn o on sAe ilg fil Il mtagarinha"m o Öyat Mta deos saeh ðÿm ,d oibgd S,H ir S Iooeagoke====================================================================================================

I'm not even sure how to approach debugging, given that nothing is breaking. All tensors appear to be loading; the samples just aren't coming out as expected 🤔.

Athon

jonny-d commented 6 years ago

Hi @athon-millane, sorry for the late reply!

I had a look at your code again. The reason it doesn't work is that you have forgotten to add the weight normalization. You just need to include these lines after you define the variables:

            Wmx = tf.nn.l2_normalize(Wmx, dim=0) * gmx
            Wmh = tf.nn.l2_normalize(Wmh, dim=0) * gmh

            Whx = tf.nn.l2_normalize(Whx, dim=0) * ghx
            Whm = tf.nn.l2_normalize(Whm, dim=0) * ghm

            Wix = tf.nn.l2_normalize(Wix, dim=0) * gix
            Wim = tf.nn.l2_normalize(Wim, dim=0) * gim

            Wox = tf.nn.l2_normalize(Wox, dim=0) * gox
            Wom = tf.nn.l2_normalize(Wom, dim=0) * gom

            Wfx = tf.nn.l2_normalize(Wfx, dim=0) * gfx
            Wfm = tf.nn.l2_normalize(Wfm, dim=0) * gfm

I gave it a try and it works.
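To see why skipping this step gives garbage even though every tensor loads, here is a minimal NumPy sketch of what the lines above compute. It is not the repo's code: the `l2_normalize` helper is a stand-in written to mirror the documented formula of `tf.nn.l2_normalize`, and the shapes, values, and per-column gain layout are assumptions made up for illustration.

```python
import numpy as np


def l2_normalize(w, axis=0, epsilon=1e-12):
    """NumPy stand-in for tf.nn.l2_normalize: scale w to unit L2 norm along axis."""
    square_sum = np.sum(np.square(w), axis=axis, keepdims=True)
    return w / np.sqrt(np.maximum(square_sum, epsilon))


def weight_norm(w, g, axis=0):
    """Effective weight for the forward pass: normalized direction times gain."""
    return l2_normalize(w, axis=axis) * g


# Hypothetical shapes and values, for illustration only.
rng = np.random.default_rng(0)
Wmx = rng.normal(size=(4, 3))    # stored (unnormalized) weight matrix
gmx = np.array([0.5, 1.0, 2.0])  # stored gain, assumed to be one value per column

Wmx_effective = weight_norm(Wmx, gmx)

# After weight normalization, each column's L2 norm equals its gain,
# so the effective matrix generally differs from the raw stored one.
column_norms = np.linalg.norm(Wmx_effective, axis=0)
print(column_norms)
print(np.allclose(Wmx, Wmx_effective))
```

If the saved checkpoint stores the raw matrices and the gains separately, then feeding the raw matrices straight into the forward pass uses different weights from the ones the model was trained with, which would be consistent with the scrambled samples above.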

athon-millane commented 6 years ago

Hey @jonnykira, that fixed it for me too. Thanks so much for your help!