memray / OpenNMT-kpg-release

Keyphrase Generation

Training from existing model #22

Closed mohit-madan closed 3 years ago

mohit-madan commented 3 years ago

I am trying to train on my data starting from an existing kp20k checkpoint. I have set `train_from` to "kp20k-meng17-random-rnn-BS64-LR0.05-Layer1-Dim150-Emb100-Dropout0.0-Copytrue-Reusetrue-Covtrue-PEfalse-Contboth-IF1_step_90000.pt" and kept the rest of the config the same as config-rnn-keyphrase-one2seq-diverse. I get the following error when starting training:

  File "C:\Program Files\JetBrains\PyCharm Community Edition 2020.2.3\plugins\python-ce\helpers\pydev\pydevd.py", line 1448, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2020.2.3\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "D:/OpenNMT-kpg-release/train.py", line 212, in <module>
    main(opt)
  File "D:/OpenNMT-kpg-release/train.py", line 98, in main
    single_main(opt, 0)
  File "D:\OpenNMT-kpg-release\onmt\train_single.py", line 110, in main
    optim = Optimizer.from_opt(model, opt, checkpoint=checkpoint)
  File "D:\OpenNMT-kpg-release\onmt\utils\optimizers.py", line 277, in from_opt
    optimizer.load_state_dict(optim_state_dict)
  File "D:\OpenNMT-kpg-release\onmt\utils\optimizers.py", line 305, in load_state_dict
    self._optimizer.load_state_dict(state_dict['optimizer'])
  File "C:\Users\Mohit\anaconda3\envs\venv\lib\site-packages\torch\optim\optimizer.py", line 123, in load_state_dict
    raise ValueError("loaded state dict contains a parameter group "
ValueError: loaded state dict contains a parameter group that doesn't match the size of optimizer's group
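
For reference, the relevant part of my config looks roughly like this (the path prefix is a local placeholder; everything else is copied from config-rnn-keyphrase-one2seq-diverse):

```yaml
# Sketch of the setup that triggers the error: resuming from the released
# kp20k RNN checkpoint with all other options unchanged.
# The "models/" prefix is a placeholder for wherever the checkpoint is stored locally.
train_from: models/kp20k-meng17-random-rnn-BS64-LR0.05-Layer1-Dim150-Emb100-Dropout0.0-Copytrue-Reusetrue-Covtrue-PEfalse-Contboth-IF1_step_90000.pt
```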

How can I start training on my data from an existing pretrained model?

memray commented 3 years ago

Please check out the updated code and refer to the example config file config-rnn-keyphrase-one2seq-localtest.yml. The key is to set `reset_optim` to `all`, since the optimizer state saved in the previous checkpoint apparently cannot be loaded properly.
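
For example, a minimal sketch of the relevant lines (checkpoint path taken from your comment; adjust it to wherever the file lives on your machine):

```yaml
# Resume from the existing checkpoint, but rebuild the optimizer from the
# config options instead of loading the saved optimizer state.
train_from: models/kp20k-meng17-random-rnn-BS64-LR0.05-Layer1-Dim150-Emb100-Dropout0.0-Copytrue-Reusetrue-Covtrue-PEfalse-Contboth-IF1_step_90000.pt
reset_optim: all
```

With `reset_optim: all`, the optimizer is built from scratch from your config rather than restored from the checkpoint, which sidesteps the parameter-group mismatch in the traceback above.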

Thanks, Rui