orenmel / context2vec


How to restore a model? #4

mfxss opened this issue 7 years ago

mfxss commented 7 years ago

Is it ok if I add S.load_npz(model_file, model) after model = BiLstmContext(args.deep, args.gpu, reader.word2index, context_word_units, lstm_hidden_units, target_word_units, loss_func, True, args.dropout) in train_context2vec.py without using common.model_reader? Thank you very much.

orenmel commented 7 years ago

Note that the model_reader also loads the word2index mapping, which is essential for applying the model.
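For reference, here is a minimal sketch of restoring a model through the model_reader; the attribute names follow the ones used later in this thread, and the import path is assumed from the repo layout, so treat anything that differs from your local copy as an assumption:

from context2vec.common.model_reader import ModelReader

# model_param_file is the params file written by train_context2vec.py
model_reader = ModelReader(model_param_file)
model = model_reader.model            # the restored BiLstmContext network
w = model_reader.w                    # target word embedding matrix
word2index = model_reader.word2index  # word -> index mapping used at training time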

mfxss commented 7 years ago

This is where I modified the code.

#cs = [reader.trimmed_word2count[w] for w in range(len(reader.trimmed_word2count))]
#loss_func = L.NegativeSampling(target_word_units, cs, NEGATIVE_SAMPLING_NUM, args.ns_power)
if args.context == 'lstm':
    model_reader = ModelReader(model_param_file)
    model = model_reader.model

It seems that train=False and the assert train == False in model_reader.py should also be modified. Also, the trained word embeddings are stored in model.loss_func.W.data. Am I right? If I missed something that should be modified, please tell me. Thank you very much.

orenmel commented 7 years ago

This seems ok. The only thing is that the model_reader doesn't initialize the loss_func with the correct values, because train mode is currently not supported there. If your purpose is to further train a model that you load, then you should make sure you initialize the model's loss_func with the true cs values.

mfxss commented 7 years ago

I am a little confused. Do cs values change after each epoch? What does cs stand for? Here is my new code.

cs = [reader.trimmed_word2count[w] for w in range(len(reader.trimmed_word2count))]
loss_func = L.NegativeSampling(target_word_units, cs, NEGATIVE_SAMPLING_NUM, args.ns_power)
if args.context == 'lstm':
    #model_reader = ModelReader(model_param_file)
    #model = model_reader.model
    model = BiLstmContext(args.deep, args.gpu, reader.word2index, context_word_units, lstm_hidden_units, target_word_units, loss_func, True, args.dropout)
    S.load_npz(model_file, model)

I can use the reader's word2index. Is the reader's word2index the same as the model_reader's? Also, how can I restore the word embedding matrix w into the model? Will loss_func.W.data = model_reader.w work?

orenmel commented 7 years ago

Just to make sure, could you please describe what your end-goal here is? Are you trying to load one of our existing models and continue training it for more epochs? Using which corpus?

mfxss commented 7 years ago

My goal is to train a ukwac model like yours, but with different parameters. I ran for one epoch and then had to stop for some reason. Now I want to load the model and continue training. I noticed that the word embeddings are also trained during training, so how can I load these word embeddings from the targets file? That is all I'm wondering. Actually, I read the model file, and it seems that it saves loss_func.W.data, so there is no need to load the word embedding targets file again. Right? Thank you.

orenmel commented 7 years ago

Ok. So as long as you are using the exact same corpus that you used in the first epoch, then your code should work fine (since reader.word2index would be identical to the one used in the first epoch). And yes, there's no need to load the word embedding targets.
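If you want to double-check, you can inspect the saved file directly. A rough sketch follows; the exact parameter keys depend on how the links in the model are named, so 'loss_func/W' here is an assumption based on the attribute names used above:

import numpy as np

# A model saved with S.save_npz is an ordinary NumPy .npz archive whose
# keys are the parameter paths inside the model.
params = np.load(model_file)
print(sorted(params.files))
if 'loss_func/W' in params.files:
    print(params['loss_func/W'].shape)  # shape of the target word embedding matrix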

mfxss commented 7 years ago

With more epochs, the loss began to increase. Did this happen to you? The WSD accuracy also became lower.

orenmel commented 7 years ago

As you can see from the code, I never continued training of an existing model. In the case of UkWac, I trained for one epoch, and later I trained for 3 epochs from scratch; the latter model performed better. I wouldn't expect the training loss to increase in your case, but maybe there's something I'm missing. One thing that does come to mind: to do this properly, you should also save (and later restore) the Adam optimizer state along with the model.
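For completeness, a hedged sketch of what that could look like with Chainer's npz serializers (optimizer and optimizer_file are placeholder names, not from the original script):

from chainer import optimizers
import chainer.serializers as S

# at the end of a run: save the optimizer state next to the model
S.save_npz(model_file, model)
S.save_npz(optimizer_file, optimizer)

# when resuming: rebuild the model and optimizer, then load both states
optimizer = optimizers.Adam()
optimizer.setup(model)                 # attach to the freshly constructed model
S.load_npz(model_file, model)          # restore the model weights
S.load_npz(optimizer_file, optimizer)  # restore Adam's moment estimates and step counters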