microsoft / DialoGPT

Large-scale pretraining for dialogue
MIT License

'BucketingDataLoader' object has no attribute 'db' #16

Open bwang482 opened 4 years ago

bwang482 commented 4 years ago

I am running on Ubuntu 18.04 with CUDA 10, and I have followed Setup & Installation (TL;DR) - Train model with Conda Environment.

```
$ python3.6 demo.py
Found existing ./models folder, skip creating a new one!
11/20/2019 19:07:20 - INFO - __main__ - Downloading models...
11/20/2019 19:07:20 - INFO - demo_utils - ./models/small/config.json exists, return!
11/20/2019 19:07:20 - INFO - demo_utils - ./models/small/vocab.json exists, return!
11/20/2019 19:07:20 - INFO - demo_utils - ./models/small/merges.txt exists, return!
11/20/2019 19:07:20 - INFO - demo_utils - ./models/small/pytorch_model.bin exists, return!
11/20/2019 19:07:20 - INFO - demo_utils - ./models/small/small_ft.pkl exists, return!
11/20/2019 19:07:20 - INFO - __main__ - Done!
11/20/2019 19:07:20 - INFO - __main__ - Downloading and Extracting Data...
11/20/2019 19:07:20 - INFO - __main__ - Preparing Data...
prepro.py --corpus ./data/train.tsv --max_seq_len 128
11/20/2019 19:07:22 - INFO - __main__ - Done!
11/20/2019 19:07:22 - INFO - __main__ - Generating training CMD!
11/20/2019 19:07:22 - INFO - __main__ - If there is any problem, please copy (modify) and run command below
11/20/2019 19:07:22 - INFO - __main__ - #########################################################################
python LSP_train.py --model_name_or_path ./models/small --init_checkpoint ./models/small/pytorch_model.bin --train_input_file ./data/train.128len.db --eval_input_file ./data/dummy_data.tsv --output_dir ./models/output_model --seed 42 --max_seq_length 128 --train_batch_size 512 --gradient_accumulation_steps 8 --eval_batch_size 64 --learning_rate 1e-5 --num_optim_steps 10000 --valid_step 5000 --warmup_steps 4000 --normalize_data true --fp16 true --lr_schedule noam --loss_scale 0.0 --no_token_id true --pbar true
11/20/2019 19:07:22 - INFO - __main__ - #########################################################################
11/20/2019 19:07:23 - INFO - __main__ - train batch size = 512, new train batch size (after gradient accumulation) = 64
11/20/2019 19:07:23 - INFO - __main__ - CUDA available? True
11/20/2019 19:07:23 - INFO - __main__ - Input Argument Information
11/20/2019 19:07:23 - INFO - __main__ - model_name_or_path ./models/small
11/20/2019 19:07:23 - INFO - __main__ - seed 42
11/20/2019 19:07:23 - INFO - __main__ - max_seq_length 128
11/20/2019 19:07:23 - INFO - __main__ - skip_eval False
11/20/2019 19:07:23 - INFO - __main__ - init_checkpoint ./models/small/pytorch_model.bin
11/20/2019 19:07:23 - INFO - __main__ - train_input_file ./data/train.128len.db
11/20/2019 19:07:23 - INFO - __main__ - eval_input_file ./data/dummy_data.tsv
11/20/2019 19:07:23 - INFO - __main__ - continue_from 0
11/20/2019 19:07:23 - INFO - __main__ - train_batch_size 64
11/20/2019 19:07:23 - INFO - __main__ - gradient_accumulation_steps 8
11/20/2019 19:07:23 - INFO - __main__ - eval_batch_size 64
11/20/2019 19:07:23 - INFO - __main__ - learning_rate 1e-05
11/20/2019 19:07:23 - INFO - __main__ - num_optim_steps 10000
11/20/2019 19:07:23 - INFO - __main__ - valid_step 5000
11/20/2019 19:07:23 - INFO - __main__ - warmup_proportion 0.1
11/20/2019 19:07:23 - INFO - __main__ - warmup_steps 4000
11/20/2019 19:07:23 - INFO - __main__ - normalize_data True
11/20/2019 19:07:23 - INFO - __main__ - fp16 True
11/20/2019 19:07:23 - INFO - __main__ - lr_schedule noam
11/20/2019 19:07:23 - INFO - __main__ - loss_scale 0.0
11/20/2019 19:07:23 - INFO - __main__ - no_token_id True
11/20/2019 19:07:23 - INFO - __main__ - output_dir ./models/output_model
11/20/2019 19:07:23 - INFO - __main__ - log_dir None
11/20/2019 19:07:23 - INFO - __main__ - pbar True
11/20/2019 19:07:23 - INFO - __main__ - local_rank -1
11/20/2019 19:07:23 - INFO - __main__ - config None
11/20/2019 19:07:23 - INFO - __main__ - device cuda
11/20/2019 19:07:23 - INFO - __main__ - n_gpu 8
11/20/2019 19:07:23 - INFO - pytorch_pretrained_bert.tokenization_gpt2 - loading vocabulary file ./models/small/vocab.json
11/20/2019 19:07:23 - INFO - pytorch_pretrained_bert.tokenization_gpt2 - loading merges file ./models/small/merges.txt
Traceback (most recent call last):
  File "LSP_train.py", line 176, in <module>
    args.max_seq_length)
  File "/mnt/sdb/Tools/DialoGPT/data_loader.py", line 114, in __init__
    self.db = shelve.open(f'{db_name}/db', 'r')
  File "/mnt/sdb/miniconda3/envs/LSP/lib/python3.6/shelve.py", line 243, in open
    return DbfilenameShelf(filename, flag, protocol, writeback)
  File "/mnt/sdb/miniconda3/envs/LSP/lib/python3.6/shelve.py", line 227, in __init__
    Shelf.__init__(self, dbm.open(filename, flag), protocol, writeback)
  File "/mnt/sdb/miniconda3/envs/LSP/lib/python3.6/dbm/__init__.py", line 91, in open
    "available".format(result))
dbm.error: db type is dbm.gnu, but the module is not available
Exception ignored in: <bound method BucketingDataLoader.__del__ of <data_loader.BucketingDataLoader object at 0x7f082fdc4cc0>>
Traceback (most recent call last):
  File "/mnt/sdb/Tools/DialoGPT/data_loader.py", line 151, in __del__
    self.db.close()
AttributeError: 'BucketingDataLoader' object has no attribute 'db'
11/20/2019 19:07:23 - INFO - __main__ - Done!
```
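For context on the `dbm.error` above: `shelve` delegates to whichever `dbm` backend the interpreter was built with, so a database written with `dbm.gnu` (GNU gdbm) cannot be reopened on a Python build that lacks the `_gdbm` extension module. A minimal probe, independent of this repo's code, to see which backends an environment provides:

```python
import importlib

def available_dbm_backends():
    """Return the dbm backends importable in this interpreter.

    shelve/dbm pick the "best" available backend, so a shelf created
    under dbm.gnu fails to open where the _gdbm module is absent.
    """
    found = []
    for name in ("dbm.gnu", "dbm.ndbm", "dbm.dumb"):
        try:
            importlib.import_module(name)
            found.append(name)
        except ImportError:
            pass
    return found

# dbm.dumb is pure Python and always present; the other two depend on
# how the interpreter was compiled/packaged.
print(available_dbm_backends())
```

If `dbm.gnu` is missing from a conda environment, installing the gdbm library into that environment and reinstalling/repreparing the data is one way out (exact package name depends on your channel, so treat that as an assumption to verify).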

bwang482 commented 4 years ago

```
$ ls -lh
total 148K
drwxrwxr-x 5 bowang bowang 4.0K Nov 20 18:10 configs
drwxrwxr-x 3 bowang bowang 4.0K Nov 20 18:21 data
-rw-rw-r-- 1 bowang bowang  506 Nov 20 18:10 data_config.py
-rw-rw-r-- 1 bowang bowang  12K Nov 20 18:10 data_loader.py
-rw-rw-r-- 1 bowang bowang 4.6K Nov 20 18:10 demo.py
-rw-rw-r-- 1 bowang bowang 3.8K Nov 20 18:10 demo_utils.py
drwxrwxr-x 3 bowang bowang 4.0K Nov 20 18:10 dstc
-rw-rw-r-- 1 bowang bowang  203 Nov 20 18:10 env.py
drwxrwxr-x 3 bowang bowang 4.0K Nov 20 18:21 gpt2_training
-rw-rw-r-- 1 bowang bowang 1.2K Nov 20 18:10 LICENSE
-rw-rw-r-- 1 bowang bowang 1.5K Nov 20 18:10 LSP-generic.yml
-rw-rw-r-- 1 bowang bowang 2.0K Nov 20 18:10 LSP-linux.yml
drwxrwxr-x 3 bowang bowang 4.0K Nov 20 18:21 lsp_model
-rw-rw-r-- 1 bowang bowang  15K Nov 20 18:10 LSP_train.py
-rw-rw-r-- 1 bowang bowang   16 Nov 20 18:10 MANIFEST.in
drwxrwxr-x 4 bowang bowang 4.0K Nov 20 18:21 models
-rw-rw-r-- 1 bowang bowang 3.7K Nov 20 19:07 output.log
-rw-rw-r-- 1 bowang bowang 7.1K Nov 20 18:10 prepro.py
drwxrwxr-x 2 bowang bowang 4.0K Nov 20 18:21 __pycache__
drwxrwxr-x 8 bowang bowang 4.0K Nov 20 18:21 pycocoevalcap
-rw-rw-r-- 1 bowang bowang  29K Nov 20 18:10 README.md
drwxrwxr-x 6 bowang bowang 4.0K Nov 20 18:10 reddit_extractor
-rw-rw-r-- 1 bowang bowang 2.4K Nov 20 18:10 SECURITY.md
```

intersun commented 4 years ago

It looks like prepro.py failed while running. Can you share which folder you ran the command from, and print what is under the "data" folder? Thanks!

Monica9502 commented 4 years ago

I'm facing the same problem, but the required files are under the "data" folder. Any ideas?

intersun commented 4 years ago

> I'm facing the same problem, but the required files are under the "data" folder. Any ideas?

Can you print what is listed under the "data" folder? Thanks!

Monica9502 commented 4 years ago

> Can you print what is listed under the "data" folder? Thanks!

```
-rw-rw-r--. 1 387872 Nov 25 11:05 dummy_data.tsv
-rw-rw-r--. 1     71 Nov 25 11:05 prepare4db.sh
drwxrwxr-x. 2   4096 Nov 25 14:14 train.128len.db
-rw-rw-r--. 1 387872 Nov 25 11:05 train_raw.tsv
-rw-rw-r--. 1 467873 Dec  9 10:58 train.tsv
```

intersun commented 4 years ago

> -rw-rw-r--. 1 387872 Nov 25 11:05 dummy_data.tsv
> -rw-rw-r--. 1 71 Nov 25 11:05 prepare4db.sh
> drwxrwxr-x. 2 4096 Nov 25 14:14 train.128len.db
> -rw-rw-r--. 1 387872 Nov 25 11:05 train_raw.tsv
> -rw-rw-r--. 1 467873 Dec 9 10:58 train.tsv

Were you running the command under the project home folder, i.e. 'your_download_path/DialoGPT/'? There seems to be a relative-path issue if you run demo.py from another folder, because demo.py is essentially a bunch of bash commands. We will fix it in the next few days.
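A relative-path fix of the kind described can be sketched as a small helper that anchors paths at the script's own directory instead of the caller's working directory. This is a hypothetical illustration, not the actual patch applied to demo.py:

```python
import os

def resolve(relative_path, script_file):
    """Return an absolute path anchored at the directory containing
    script_file, so the result does not depend on the caller's CWD."""
    base = os.path.dirname(os.path.abspath(script_file))
    return os.path.join(base, relative_path)

# Inside demo.py this could be used as (illustrative only):
#   train_tsv = resolve(os.path.join("data", "train.tsv"), __file__)
```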

intersun commented 4 years ago

I just edited demo.py so it will not have the relative-path problem, or will at least print some informative messages for debugging purposes. Can you pull and see if it works?

By the way, I checked on my side and it worked from any running directory. Let me know if you still have any questions or problems.

Monica9502 commented 4 years ago

> I just edited demo.py so it will not have the relative-path problem, or will at least print some informative messages for debugging purposes. Can you pull and see if it works?

I have tried the new version and got the following error: `b'gzip: ./train.tsv.gz: No such file or directory\n'`. It seems the data file is missing; how can I get it? Thanks for the help.

intersun commented 4 years ago

> I have tried the new version and got the following error: `b'gzip: ./train.tsv.gz: No such file or directory\n'`. It seems the data file is missing; how can I get it?

Can you try running the script with 'data' set to 'dummy' and see if it works? Or simply run "python demo.py" without any other arguments.

I tried the 'small' data and indeed the data file is missing. We will try to fix it.
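A guard for this missing-archive failure mode could look like the sketch below; the function name and path handling are illustrative assumptions, not the actual demo.py code:

```python
import os
import subprocess

def extract_archive(archive_path):
    """Fail with a clear message if the expected .gz archive was never
    downloaded, instead of surfacing a raw gzip stderr bytes blob."""
    if not os.path.isfile(archive_path):
        raise FileNotFoundError(
            f"{archive_path} is missing; re-run the download step or "
            "fall back to the dummy dataset")
    # -d decompress, -k keep the original archive
    subprocess.run(["gzip", "-d", "-k", archive_path], check=True)
```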

Monica9502 commented 4 years ago

> Can you try running the script with 'data' set to 'dummy' and see if it works? Or simply run "python demo.py" without any other arguments.

Thanks! The other two commands work.

rahulnarang commented 3 years ago

Is this fixed? We are still facing an issue running `python demo.py --data smal`.

Also, running the make command directly in reddit_extractor returns the output below:

```
$ SIZE=small make -j 8
make: Nothing to be done for 'all'.
```

NontawatWuttikam commented 2 years ago

I also encountered the same problem. However, in my case it was caused by a corrupted .db (for me, train.128len.db).

I realized later that I had interrupted the process while demo.py was running and downloading the resources. When I tried to run it again, it did not check the integrity of the .db file, which caused the error.

I solved the problem by removing the .db in the data folder and the downloaded model folder under models (e.g. small), and everything worked again.
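The cleanup described above can be scripted; the sketch below assumes the default DialoGPT layout (a `data/` folder containing `*.db` directories and a `models/<size>` folder) and is not part of the repo itself:

```python
import shutil
from pathlib import Path

def clear_cached_artifacts(project_dir, model_size="small"):
    """Delete preprocessed *.db folders and the downloaded model so the
    next demo.py run regenerates them from scratch."""
    root = Path(project_dir)
    # prepro.py writes each dataset as a directory ending in .db
    for db in (root / "data").glob("*.db"):
        shutil.rmtree(db, ignore_errors=True)
    # remove the downloaded model folder (e.g. models/small)
    shutil.rmtree(root / "models" / model_size, ignore_errors=True)
```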