memray / OpenNMT-kpg-release

Keyphrase Generation
MIT License
217 stars 34 forks

Training data #29

Closed sjchasel closed 3 years ago

sjchasel commented 3 years ago

Hi, when I run python train.py -config config/train/config-rnn-keyphrase-one2seq-diverse.yml, I get this error:

Traceback (most recent call last):
  File "train.py", line 6, in <module>
    main()
  File "/home/yons/OpenNMT-kpg-release-new/onmt/bin/train.py", line 274, in main
    train(opt)
  File "/home/yons/OpenNMT-kpg-release-new/onmt/bin/train.py", line 32, in train
    train_single._check_save_model_path(opt)
  File "/home/yons/OpenNMT-kpg-release-new/onmt/train_single.py", line 54, in _check_save_model_path
    os.makedirs(opt.wandb_log_dir)
  File "/home/yons/anaconda3/lib/python3.7/os.py", line 221, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/logs/'

I found that wandb_log_dir is not set in config-rnn-keyphrase-one2seq-diverse.yml, so I deleted the wandb-related code in train_single.py. Is it OK to do that?
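For reference, the traceback above shows os.makedirs failing on '/logs/', an absolute path at the filesystem root that a regular user usually cannot create. Instead of deleting the wandb code, one option may be to point the log directory at a writable location. The snippet below is only a minimal sketch of that idea: ensure_log_dir is a hypothetical helper, not part of OpenNMT-kpg, and whether wandb_log_dir can be overridden from the YAML config is an assumption.

```python
import os


def ensure_log_dir(path):
    """Create `path` (and parents) if missing.

    Hypothetical helper, not part of OpenNMT-kpg. Returns True when the
    directory exists afterwards, False when the process lacks permission.
    """
    try:
        # exist_ok=True avoids an error when the directory is already there.
        os.makedirs(path, exist_ok=True)
        return True
    except PermissionError:
        return False
```

A relative path such as logs/ (inside the project directory) would normally be writable, whereas '/logs/' requires permission to write to the root directory.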

After deleting the wandb code in train_single.py, the error above no longer appears, but I get another one:

Traceback (most recent call last):
  File "train.py", line 6, in <module>
    main()
  File "/home/yons/OpenNMT-kpg-release-new/onmt/bin/train.py", line 274, in main
    train(opt)
  File "/home/yons/OpenNMT-kpg-release-new/onmt/bin/train.py", line 126, in train
    train_iter = build_dataset_iter(shard_base, fields, opt)
  File "/home/yons/OpenNMT-kpg-release-new/onmt/inputters/inputter.py", line 1220, in build_dataset_iter
    raise ValueError('Training data %s not found' % opt.data)
ValueError: Training data data/keyphrase/meng17/kp20k not found

According to the code in inputter.py, the training data is looked up by matching patterns such as:

data/keyphrase/meng17/kp20k.train[0-9][string].pt
data/keyphrase/meng17/kp20ktrain[0-9][string].pt
data/keyphrase/meng17/kp20k.train[0-9][string].jsonl

and so on.

But there are no .pt or .jsonl files in data/keyphrase/meng17/kp20k. The directory contains only src and tgt files.

What data should I use for training?

memray commented 3 years ago

You need to run preprocessing to generate .pt files first. Please refer to scripts like this. Directly loading from jsonl files is not supported yet.
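Before launching training, it can help to check whether preprocessing has actually produced shard files for the data prefix. The snippet below is a rough sketch of such a check: find_train_shards is a hypothetical helper, and the ".train*.pt" pattern is an assumption loosely based on the patterns quoted in this thread, not the exact matching logic in inputter.py.

```python
import glob


def find_train_shards(data_prefix):
    """Return sorted paths of preprocessed training shards for a prefix.

    Hypothetical helper. The '<prefix>.train*.pt' pattern is an assumption;
    check inputter.py for the patterns the trainer really uses.
    """
    # glob.escape protects any special characters in the prefix itself.
    pattern = glob.escape(data_prefix) + ".train*.pt"
    return sorted(glob.glob(pattern))
```

For example, an empty result from find_train_shards("data/keyphrase/meng17/kp20k") would suggest that preprocessing has not been run yet for that prefix, which matches the "Training data ... not found" error above.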