nlpyang / PreSumm

code for EMNLP 2019 paper Text Summarization with Pretrained Encoders
MIT License

No such file or directory: '../bert_data.train.pt' #92

Open kurodenjiro opened 4 years ago

kurodenjiro commented 4 years ago

```
  File "train.py", line 122, in <module>
    train_abs(args, device_id)
  File "E:\project\PreSumm\src\train_abstractive.py", line 273, in train_abs
    train_abs_single(args, device_id)
  File "E:\project\PreSumm\src\train_abstractive.py", line 334, in train_abs_single
    trainer.train(train_iter_fct, args.train_steps)
  File "E:\project\PreSumm\src\models\trainer.py", line 133, in train
    train_iter = train_iter_fct()
  File "E:\project\PreSumm\src\train_abstractive.py", line 313, in train_iter_fct
    shuffle=True, is_test=False)
  File "E:\project\PreSumm\src\models\data_loader.py", line 136, in __init__
    self.cur_iter = self._next_dataset_iterator(datasets)
  File "E:\project\PreSumm\src\models\data_loader.py", line 156, in _next_dataset_iterator
    self.cur_dataset = next(dataset_iter)
  File "E:\project\PreSumm\src\models\data_loader.py", line 94, in load_dataset
    yield _lazy_dataset_loader(pt, corpus_type)
  File "E:\project\PreSumm\src\models\data_loader.py", line 78, in _lazy_dataset_loader
    dataset = torch.load(pt_file)
  File "C:\Users\Admin\Anaconda3\lib\site-packages\torch\serialization.py", line 419, in load
    f = open(f, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '../bert_data.train.pt'
```

I don't know why. The `bert_data` folder only contains `cnndm.test.0.bert.pt`, not a `train.pt` file. How can I fix it?

cuthbertjohnkarawa commented 4 years ago

Can you provide the command you used? Also, I think you should check the directory where you stored your processed data.

astariul commented 4 years ago

Did you use `-bert_data_path ../bert_data`?

The `-bert_data_path` option should also contain the prefix of your data files.

So instead of using `-bert_data_path ../bert_data`, try:

`-bert_data_path ../bert_data/cnndm`
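To see why the prefix matters, here is a minimal sketch (not the repo's actual code; `matching_files` is a made-up helper) of how a loader that concatenates `-bert_data_path` directly with the split name behaves with and without the `cnndm` prefix:

```python
import fnmatch

def matching_files(bert_data_path, corpus_type, filenames):
    # The loader treats bert_data_path as a filename *prefix*, not a
    # directory: it concatenates it directly with the split name.
    # "../bert_data" therefore yields the pattern "../bert_data.train.*"
    # (a file that does not exist), while "../bert_data/cnndm" yields
    # "../bert_data/cnndm.train.*", which matches the shipped shards.
    pattern = bert_data_path + '.' + corpus_type + '.*'
    return [f for f in filenames if fnmatch.fnmatch(f, pattern)]

files = ['../bert_data/cnndm.train.0.bert.pt',
         '../bert_data/cnndm.test.0.bert.pt']
print(matching_files('../bert_data', 'train', files))        # []
print(matching_files('../bert_data/cnndm', 'train', files))  # matches the train shard
```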

robinsongh381 commented 4 years ago

Can you try this?

Add two lines to the `format_to_bert` function in `data_builder.py` (line 276):

```python
if not os.path.isdir(args.save_path):
    os.mkdir(args.save_path)
```
JackXueIndiana commented 4 years ago

My command is:

```
python3 train.py -task ext -mode train -bert_data_path ../bert_data/bert_data_cnndm_final/ -ext_dropout 0.1 -model_path ../models -lr 2e-3 -visible_gpus -1 -report_every 50 -save_checkpoint_steps 1000 -batch_size 3000 -train_steps 50000 -accum_count 2 -log_file ../logs/ext_bert_cnndm -use_interval true -warmup_steps 10000 -max_pos 512
```

I have just downloaded the dataset file named `bert_data_cnndm_final.zip` and unzipped it into `./bert_data/bert_data_cnndm_final`. I still see the error:

```
No such file or directory: '../bert_data/bert_data_cnndm_final/.train.pt'
```

Any idea? Thanks.

JackXueIndiana commented 4 years ago

For testing purposes, I kept only one training file in `-bert_data_path` and named it `.train.pt`. The command runs without any problem (but the file only has about 2k examples in it).

LuJunru commented 4 years ago

> For the testing purpose, I keep only one training file in -bert_data_path and name it as .train.pt the command runs without any problem (but only have about 2k examples in it).

You should rewrite line 84 in `data_loader.py` to something like `args.bert_data_path + 'cnndm.' + corpus_type + '.[0-9]*.bert.pt'`. This works for me on XSum data: `args.bert_data_path + 'xsum.' + corpus_type + '.[0-9]*.bert.pt'`

JackXueIndiana commented 4 years ago

@LuJunru, you are right. I changed line 84 in `data_loader.py` to

```python
pts = sorted(glob.glob(args.bert_data_path + '/[a-z].' + corpus_type + '.[0-9].bert.pt'))
```

Thanks for your hint.

JackXueIndiana commented 4 years ago

Somehow the copy-and-paste removed the `*` in my code. The change was: I changed line 84 in `data_loader.py` to

```python
pts = sorted(glob.glob(args.bert_data_path + '/[a-z].' + corpus_type + '.[0-9]*.bert.pt'))
```

onrmrt commented 4 years ago

@JackXueIndiana, I made a minor fix in your code:

```python
pts = sorted(glob.glob(args.bert_data_path + '/[a-z]*.' + corpus_type + '.[0-9]*.bert.pt'))
```

It works properly this way.
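A quick check of why the `*` after `[a-z]` matters, using `fnmatch` (which follows the same wildcard rules as `glob`): `[a-z]` alone matches exactly one character, so it can never match the five-letter prefix `cnndm`.

```python
import fnmatch

name = 'cnndm.train.0.bert.pt'
# '[a-z]' matches a single character, so the '.' after it would have
# to match 'n' -- it cannot, and the pattern fails.
print(fnmatch.fnmatch(name, '[a-z].train.[0-9]*.bert.pt'))   # False
# '[a-z]*' matches 'c' followed by any characters, so 'cnndm' is fine.
print(fnmatch.fnmatch(name, '[a-z]*.train.[0-9]*.bert.pt'))  # True
```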

mmcmahon13 commented 4 years ago

Even after applying the fixes above I've yet to get it to run. My data is in the `~/PreSumm/bert_data/bert_data_cnndm_final` directory, and yet I still get:

```
No such file or directory: '/home/mmcmahon/PreSumm/bert_data/bert_data_cnndm_final/cnndm.test.pt'
```

Ghani-25 commented 4 years ago

Same for me, all the fixes above didn't work:

```
  File "src/train.py", line 122, in <module>
    train_abs(args, device_id)
  File "C:\Users\Ghani\Desktop\PreSumm\src\train_abstractive.py", line 273, in train_abs
    train_abs_single(args, device_id)
  File "C:\Users\Ghani\Desktop\PreSumm\src\train_abstractive.py", line 334, in train_abs_single
    trainer.train(train_iter_fct, args.train_steps)
  File "C:\Users\Ghani\Desktop\PreSumm\src\models\trainer.py", line 133, in train
    train_iter = train_iter_fct()
  File "C:\Users\Ghani\Desktop\PreSumm\src\train_abstractive.py", line 313, in train_iter_fct
    shuffle=True, is_test=False)
  File "C:\Users\Ghani\Desktop\PreSumm\src\models\data_loader.py", line 136, in __init__
    self.cur_iter = self._next_dataset_iterator(datasets)
  File "C:\Users\Ghani\Desktop\PreSumm\src\models\data_loader.py", line 156, in _next_dataset_iterator
    self.cur_dataset = next(dataset_iter)
  File "C:\Users\Ghani\Desktop\PreSumm\src\models\data_loader.py", line 94, in load_dataset
    yield _lazy_dataset_loader(pt, corpus_type)
  File "C:\Users\Ghani\Desktop\PreSumm\src\models\data_loader.py", line 78, in _lazy_dataset_loader
    dataset = torch.load(pt_file)
  File "C:\Users\Ghani\Anaconda3\lib\site-packages\torch\serialization.py", line 381, in load
    f = open(f, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'bert_data/news.train.pt'
```

Xinxin-Lai commented 4 years ago

Same for me. I tried to run the repo for the GPT-2 detector model, but it threw this error:

```
Loading checkpoint from detector-base.pt
Traceback (most recent call last):
  File "C:\Users\xinxin\Anaconda3\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\xinxin\Anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\xinxin\Documents\gpt-2-output-dataset\detector\server.py", line 120, in <module>
    fire.Fire(main)
  File "C:\Users\xinxin\Anaconda3\lib\site-packages\fire\core.py", line 138, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "C:\Users\xinxin\Anaconda3\lib\site-packages\fire\core.py", line 468, in _Fire
    target=component.__name__)
  File "C:\Users\xinxin\Anaconda3\lib\site-packages\fire\core.py", line 672, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "C:\Users\xinxin\Documents\gpt-2-output-dataset\detector\server.py", line 83, in main
    data = torch.load(checkpoint, map_location='cpu')
  File "C:\Users\xinxin\Anaconda3\lib\site-packages\torch\serialization.py", line 525, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "C:\Users\xinxin\Anaconda3\lib\site-packages\torch\serialization.py", line 212, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "C:\Users\xinxin\Anaconda3\lib\site-packages\torch\serialization.py", line 193, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'detector-base.pt'
```
germanenik commented 3 years ago

^ same issue as @mmcmahon13 and @Ghani-25

kush-2418 commented 2 years ago

> @JackXueIndiana, I made a minor fix in your code: `pts = sorted(glob.glob(args.bert_data_path + '/[a-z]*.' + corpus_type + '.[0-9]*.bert.pt'))` It works properly this way.

Thanks it worked