athon-millane closed 6 years ago
Hello @athon-millane, I had a look at your code. I think you may have to wrap the weights in tf.constant() when passing them as the initializer to tf.get_variable(). For example, instead of:
W_embedding = tf.get_variable("W_embedding", initializer=params[0])
try this:
W_embedding = tf.get_variable("W_embedding", initializer=tf.constant(params[0]))
Do this for all of your variables and I think it should work. Hope this helps!
Jonny
Hi again @jonnykira, thanks very much for taking a look. I did as you suggested. I'm not sure whether that fixed things for you, but I'm still encountering the same problem: the model loads the weights without error, but sampling after loading the OpenAI model weights, for example, yields scrambled predictions like so:
Sampling...
y8YÀltcr " s /Em cthea pt ofyD oleoan9Dallo t tf*ana E seeagly are Ytl sog vum tr fiat oag miueîaat .e id B otp toguos "Ia d oatetobn o on sAe ilg fil Il mtagarinha"m o Öyat Mta deos saeh ðÿm ,d oibgd S,H ir S Iooeagoke====================================================================================================
I'm not even sure how to approach debugging this, given that nothing is breaking: all tensors appear to load, but the samples are not what I'd expect 🤔.
Athon
Hi @athon-millane, sorry for the late reply!
I had a look at your code again. The reason it doesn't work is that you have forgotten to add the weight normalization. You just need to include these lines after you define the variables:
Wmx = tf.nn.l2_normalize(Wmx, dim=0) * gmx
Wmh = tf.nn.l2_normalize(Wmh, dim=0) * gmh
Whx = tf.nn.l2_normalize(Whx, dim=0) * ghx
Whm = tf.nn.l2_normalize(Whm, dim=0) * ghm
Wix = tf.nn.l2_normalize(Wix, dim=0) * gix
Wim = tf.nn.l2_normalize(Wim, dim=0) * gim
Wox = tf.nn.l2_normalize(Wox, dim=0) * gox
Wom = tf.nn.l2_normalize(Wom, dim=0) * gom
Wfx = tf.nn.l2_normalize(Wfx, dim=0) * gfx
Wfm = tf.nn.l2_normalize(Wfm, dim=0) * gfm
I gave it a try and it works.
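For anyone reading later, here is a rough, dependency-free sketch of what each of those lines computes. It assumes a 2-D weight matrix stored as a list of rows and one gain value per column, which is how I read tf.nn.l2_normalize(W, dim=0) * g for these shapes. The function name weight_normalize and the toy values are made up for illustration and do not come from the repo or the checkpoint.

```python
import math


def weight_normalize(v, g):
    """Rescale each column of v to unit L2 norm, then multiply it by its gain.

    v: 2-D matrix as a list of rows. g: one gain per column.
    Roughly mirrors tf.nn.l2_normalize(v, dim=0) * g, without
    TensorFlow's epsilon guard against all-zero columns.
    """
    n_rows, n_cols = len(v), len(v[0])
    # L2 norm of every column (reduction over the row axis, i.e. dim 0).
    norms = [
        math.sqrt(sum(v[r][c] ** 2 for r in range(n_rows)))
        for c in range(n_cols)
    ]
    return [
        [g[c] * v[r][c] / norms[c] for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Toy values, not taken from the real checkpoint.
v = [[3.0, 0.0],
     [4.0, 2.0]]
g = [10.0, 1.0]
w = weight_normalize(v, g)
# w == [[6.0, 0.0], [8.0, 1.0]]: column norms are now 10.0 and 1.0
```

If the saved parameters hold the unnormalized matrices and the gains separately, as the fix above suggests, then using the raw matrices gives every column the wrong scale. That would be consistent with weights that load without error but still produce scrambled samples.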
Hey @jonnykira, that fixed it for me too. Thanks so much for your help!
Hi @jonnykira, I've tried a number of iterations to get loading working, and this is the best I've been able to achieve: everything appears to load, but sampling after the first 100 steps gives the same garbage you'd expect from a newly initialised model. You'll see I added an optional --np_path argument to the parser; this is all you'll need to try it out. I don't expect you to merge this into your repo, but I'd really appreciate it if you could take a look and let me know what you think!
Thanks, Athon