kurodenjiro opened this issue 5 years ago
Can you provide the command you used? Also, I think you should check the directory where you store your processed data. Did you use `-bert_data_path ../bert_data`?
The `-bert_data_path` option should also contain the prefix of your data files. So instead of `-bert_data_path ../bert_data`, try:

`-bert_data_path ../bert_data/cnndm`

Can you try this?
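To see why the prefix matters, here is a minimal sketch of how the loader builds its file pattern by plain string concatenation (simplified; the exact suffix in the repo's `data_loader.py` may differ):

```python
def build_pattern(bert_data_path, corpus_type):
    # PreSumm concatenates the path and the corpus type directly,
    # so bert_data_path must end with the dataset prefix (e.g. ".../cnndm").
    return bert_data_path + '.' + corpus_type + '.[0-9]*.pt'

# With only the directory, the prefix is missing and nothing matches:
print(build_pattern('../bert_data', 'train'))        # ../bert_data.train.[0-9]*.pt
# With the prefix included, the pattern points at the real shards:
print(build_pattern('../bert_data/cnndm', 'train'))  # ../bert_data/cnndm.train.[0-9]*.pt
```

This also explains errors like `'../bert_data.train.pt'`: the directory name itself was glued onto the suffix.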
Add two lines to the `format_to_bert` function in `data_builder.py` (line 276):

```python
if not os.path.isdir(args.save_path):
    os.mkdir(args.save_path)
```
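Note that `os.mkdir` fails if the parent directory is missing. A more forgiving variant of the same fix (a sketch, not the repo's code; `ensure_save_path` is a hypothetical helper name):

```python
import os

def ensure_save_path(save_path):
    # makedirs creates any missing intermediate directories and, with
    # exist_ok=True, is a no-op when the directory already exists.
    os.makedirs(save_path, exist_ok=True)
```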
My command is:

```
python3 train.py -task ext -mode train -bert_data_path ../bert_data/bert_data_cnndm_final/ -ext_dropout 0.1 -model_path ../models -lr 2e-3 -visible_gpus -1 -report_every 50 -save_checkpoint_steps 1000 -batch_size 3000 -train_steps 50000 -accum_count 2 -log_file ../logs/ext_bert_cnndm -use_interval true -warmup_steps 10000 -max_pos 512
```
I have just downloaded the dataset file named bert_data_cnndm_final.zip and unzipped it into ./bert_data/bert_data_cnndm_final. I still see the error:

`No such file or directory: '../bert_data/bert_data_cnndm_final/.train.pt'`

Any idea? Thanks.
For testing purposes, I kept only one training file in `-bert_data_path` and named it `.train.pt`; the command then runs without any problem (but it only has about 2k examples).
You should rewrite line 84 in `data_loader.py` to something like:

`args.bert_data_path + 'cnndm.' + corpus_type + '.[0-9]*.bert.pt'`

This works for me on the XSum data: `args.bert_data_path + 'xsum.' + corpus_type + '.[0-9]*.bert.pt'`
LuJunru, you are right. I changed line 84 in `data_loader.py` to:

`pts = sorted(glob.glob(args.bert_data_path + '/[a-z].' + corpus_type + '.[0-9].bert.pt'))`

Thanks for your hint.
Somehow the copy-and-paste removed the "*" in my code. The change was:

`pts = sorted(glob.glob(args.bert_data_path + '/[a-z].' + corpus_type + '.[0-9]*.bert.pt'))`
@JackXueIndiana, I made a minor fix in your code:

`pts = sorted(glob.glob(args.bert_data_path + '/[a-z]*.' + corpus_type + '.[0-9]*.bert.pt'))`

It works properly this way.
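A quick way to convince yourself the corrected pattern matches the shipped shard names is to test it with `fnmatch` (the file names below are illustrative examples of the CNN/DM naming scheme):

```python
import fnmatch

corpus_type = 'train'
pattern = '[a-z]*.' + corpus_type + '.[0-9]*.bert.pt'

# Illustrative file names following the cnndm.<split>.<N>.bert.pt scheme:
files = ['cnndm.train.0.bert.pt', 'cnndm.train.143.bert.pt',
         'cnndm.test.0.bert.pt', 'train.pt']

matched = [f for f in files if fnmatch.fnmatch(f, pattern)]
print(matched)  # ['cnndm.train.0.bert.pt', 'cnndm.train.143.bert.pt']
```

The earlier pattern without the first `*` (`[a-z].`) would only match a one-letter prefix, which is why the `[a-z]*.` fix is needed.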
Even after applying the fixes above I've yet to get it to run. My data is in the ~/PreSumm/bert_data/bert_data_cnndm_final directory, and yet I still get:

`No such file or directory: '/home/mmcmahon/PreSumm/bert_data/bert_data_cnndm_final/cnndm.test.pt'`
Same here, all the fixes above didn't work for me:

`File "src/train.py", line 122, in`
Same for me. I tried to run the repo for the GPT-2 detector model, but it threw me such an error:

```
Loading checkpoint from detector-base.pt
Traceback (most recent call last):
  File "C:\Users\xinxin\Anaconda3\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\xinxin\Anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\xinxin\Documents\gpt-2-output-dataset\detector\server.py", line 120, in
```
^ same issue as @mmcmahon13 and @Ghani-25
> @JackXueIndiana, I made a minor fix in your code:
>
> `pts = sorted(glob.glob(args.bert_data_path + '/[a-z]*.' + corpus_type + '.[0-9]*.bert.pt'))`
>
> It works properly this way.
Thanks, it worked.
```
File "train.py", line 122, in
    train_abs(args, device_id)
  File "E:\project\PreSumm\src\train_abstractive.py", line 273, in train_abs
    train_abs_single(args, device_id)
  File "E:\project\PreSumm\src\train_abstractive.py", line 334, in train_abs_single
    trainer.train(train_iter_fct, args.train_steps)
  File "E:\project\PreSumm\src\models\trainer.py", line 133, in train
    train_iter = train_iter_fct()
  File "E:\project\PreSumm\src\train_abstractive.py", line 313, in train_iter_fct
    shuffle=True, is_test=False)
  File "E:\project\PreSumm\src\models\data_loader.py", line 136, in __init__
    self.cur_iter = self._next_dataset_iterator(datasets)
  File "E:\project\PreSumm\src\models\data_loader.py", line 156, in _next_dataset_iterator
    self.cur_dataset = next(dataset_iter)
  File "E:\project\PreSumm\src\models\data_loader.py", line 94, in load_dataset
    yield _lazy_dataset_loader(pt, corpus_type)
  File "E:\project\PreSumm\src\models\data_loader.py", line 78, in _lazy_dataset_loader
    dataset = torch.load(pt_file)
  File "C:\Users\Admin\Anaconda3\lib\site-packages\torch\serialization.py", line 419, in load
    f = open(f, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '../bert_data.train.pt'
```
I don't know why, but the bert_data folder only contains cnndm.test.0.bert.pt, no train.pt. How do I fix this?
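Before re-running training, it may help to check which shards a given pattern actually matches on disk. A small diagnostic sketch (the directory path is an assumption; the pattern follows the `[a-z]*` fix discussed in this thread):

```python
import glob
import os

bert_data_path = '../bert_data'  # assumed location of the unzipped shards

# Pattern following the fix discussed in this thread:
pattern = os.path.join(bert_data_path, '[a-z]*.train.[0-9]*.bert.pt')
shards = sorted(glob.glob(pattern))

if shards:
    print('found %d training shards, e.g. %s' % (len(shards), shards[0]))
else:
    # If only cnndm.test.*.bert.pt files are present, the training shards
    # were never downloaded or generated -- re-check the preprocessing step.
    print('no training shards match %s' % pattern)
```

If this prints no matches while test shards exist, the problem is the data itself (an incomplete download or preprocessing run), not the glob pattern.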