zihangdai / xlnet

XLNet: Generalized Autoregressive Pretraining for Language Understanding
Apache License 2.0

Where is xlnet loading weights from? #130

Open Enumaris opened 5 years ago

Enumaris commented 5 years ago

I'm using the "Custom usage of XLNet" section. I noticed that I was never asked to provide a directory where the actual weights of the pre-trained model are loaded from. I was asked only for a path to model_config_path. I looked in xlnet.py and I also couldn't see where weights are being loaded at all. I'm really confused. When I use the "Custom usage of XLNet" code, am I training from scratch even if I specified is_finetune=True? Or does XLNet use the directory I gave for model_config_path to figure out a path to the actual model weights somehow?

twoflypig commented 5 years ago

Hi, did you see the function "init_from_checkpoint" in "model_utils.py"? This function is used in run_squad.py, and I think you will need it if you want to use XLNet in a custom setup. I hope this helps, though I may be wrong.

Enumaris commented 5 years ago

Yes, I found that function in model_utils.py. I just assumed that the "is_finetune" flag would prompt the model to initialize from a checkpoint, but I guess I was wrong. I have incorporated that function into my code now, thanks!

twoflypig commented 5 years ago

Hi, I checked the code and found that is_finetune=True only determines the input of the model; it does not load any model weights. So I think you need the function "init_from_checkpoint". There is also a way to check this.

Step 1: With is_finetune=True and without loading a checkpoint, sum the embedding weights and see whether the result changes across several runs. (It should, because the model is randomly initialized unless you fix the random seed.)

Step 2: Call init_from_checkpoint first, then repeat step 1 and see whether the result still changes.

If I'm right, you will see a fixed result in step 2. I hope this helps.
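The check described above can be sketched without TensorFlow. This is only a framework-free illustration: weight_checksum and is_stable_across_runs are hypothetical helper names, and the two build_* functions are stand-ins for "build the model, initialize it, and fetch the embedding table", which in real code would be a session run on the embedding variable.

```python
import random

def weight_checksum(weights):
    """Sum every entry of a 2-D weight table given as a list of rows."""
    return sum(sum(row) for row in weights)

def is_stable_across_runs(build_weights, runs=3, tol=1e-9):
    """Return True if the checksum is identical in every run.

    build_weights is a zero-argument callable standing in for
    "build the model, initialize it, and fetch the embedding table".
    """
    sums = [weight_checksum(build_weights()) for _ in range(runs)]
    return all(abs(s - sums[0]) <= tol for s in sums)

# Hypothetical stand-in for a pretrained checkpoint: fixed stored values.
CHECKPOINT = [[0.1, -0.2, 0.3], [0.4, 0.5, -0.6]]

def build_random_init():
    # Step 1: no checkpoint and no fixed seed, so values differ per run.
    return [[random.gauss(0.0, 0.02) for _ in range(3)] for _ in range(2)]

def build_from_checkpoint():
    # Step 2: values restored from the checkpoint are the same every run.
    return [row[:] for row in CHECKPOINT]

print("random init stable:", is_stable_across_runs(build_random_init))
print("checkpoint stable: ", is_stable_across_runs(build_from_checkpoint))
```

The check is only meaningful on a variable that actually exists in the checkpoint, such as the embedding table. Summing a model output instead can still vary between runs if it passes through dropout or through layers that were never restored.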

OmriPi commented 5 years ago

@twoflypig I did as you recommended, but I see a different sum in every run even after loading the checkpoint. The only way to keep the sum from changing is to fix the random seed. Why is there any randomization at all when loading from a checkpoint? I don't understand that.

This is my code:

    with tf.Session() as sess:
        xlnet_model = xlnet.xlnet.XLNetModel(xlnet_config=xlnet_config, run_config=run_config,
                                             input_ids=np.expand_dims(sentence_features.input_ids, 1).astype('int32'),
                                             seg_ids=np.expand_dims(sentence_features.segment_ids, 1).astype('int32'),
                                             input_mask=np.expand_dims(sentence_features.input_mask, 1).astype('float32'))
        init_from_checkpoint(FLAGS, True)
        summary = xlnet_model.get_pooled_out(summary_type="last")
        sess.run(tf.global_variables_initializer())
        nps = summary.eval()
        print(np.sum(summary.eval()[0]))

I know I have to use tf.global_variables_initializer(), but I don't actually know where to place it in the code, or where to place init_from_checkpoint(). Can you (or anyone) help me?
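One possible reading of the ordering question, offered as a sketch rather than a confirmed answer: as far as I understand TF 1.x, tf.train.init_from_checkpoint does not copy values immediately. It replaces the initializers of the variables it is given, and the values are loaded only when the initializer op is run afterwards. The toy model below imitates that behaviour in plain Python. ToyGraph, the variable names, and the checkpoint value are all hypothetical, and none of this is TensorFlow or XLNet code.

```python
import random

class ToyGraph:
    """Toy stand-in for a TF1-style graph: each variable has an initializer."""

    def __init__(self):
        self.initializers = {}  # variable name -> zero-argument callable
        self.values = {}

    def create_variable(self, name):
        # Default initializer: a fresh random value every time it runs.
        self.initializers[name] = lambda: random.gauss(0.0, 0.02)

    def init_from_checkpoint(self, checkpoint):
        # Override the initializer of every variable that exists *now*
        # and has a matching name in the checkpoint. Nothing is loaded yet.
        for name in list(self.initializers):
            if name in checkpoint:
                value = checkpoint[name]
                self.initializers[name] = lambda value=value: value

    def run_initializers(self):
        # Analogue of running the global initializer op once.
        self.values = {n: init() for n, init in self.initializers.items()}
        return self.values

CHECKPOINT = {"word_embedding": 0.25}  # hypothetical checkpoint contents

def run_once():
    graph = ToyGraph()
    graph.create_variable("word_embedding")  # created when the model is built
    graph.init_from_checkpoint(CHECKPOINT)   # sees only variables created so far
    graph.create_variable("summary_proj")    # created later, after the override
    return graph.run_initializers()          # last step: run all initializers

first, second = run_once(), run_once()
print("embedding identical:", first["word_embedding"] == second["word_embedding"])
print("projection identical:", first["summary_proj"] == second["summary_proj"])
```

In this toy model the safe order is: create every variable, call init_from_checkpoint, run the initializer once, and only then evaluate outputs. A variable that is missing from the checkpoint stays random regardless of order. Whether that explains the changing sum in the code above is not confirmed in this thread.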

alexpnt commented 5 years ago

https://github.com/zihangdai/xlnet/issues/183#issuecomment-515419756