pranav-ust / BERT-keyphrase-extraction

Keyphrase Extraction based on Scientific Text, Semeval 2017, Task 10

'NoneType' object has no attribute 'convert_tokens_to_ids' #10

Open ShivanshuPurohit opened 4 years ago

ShivanshuPurohit commented 4 years ago

While running train.py I encountered this error: Model name 'model/' was not found in model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese). We assumed 'model/vocab.txt' was a path or url but couldn't find any file associated to this path or url.

Traceback (most recent call last):
  File "train.py", line 168, in <module>
    train_data = data_loader.load_data('train')
  File "/content/BERT-keyphrase-extraction/data_loader.py", line 83, in load_data
    self.load_sentences_tags(sentences_file, tags_path, data)
  File "/content/BERT-keyphrase-extraction/data_loader.py", line 51, in load_sentences_tags
    sentences.append(self.tokenizer.convert_tokens_to_ids(tokens))
AttributeError: 'NoneType' object has no attribute 'convert_tokens_to_ids'

I think it isn't registering the pytorch_model.bin file, which I directly downloaded as bert-base-uncased.tar.gz
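The log message above says that 'model/vocab.txt' could not be found, which suggests the tokenizer was never loaded and was left as None. A minimal sketch of a pre-flight check that could be run before train.py is shown here. The helper name missing_model_files is hypothetical, and the list of required files is an assumption taken only from the names mentioned in this thread (vocab.txt and pytorch_model.bin); the library may expect other files as well.

```python
import tempfile
from pathlib import Path


def missing_model_files(model_dir, required=("vocab.txt", "pytorch_model.bin")):
    """Return the names in `required` that are not regular files in model_dir.

    The default file names are assumptions based on this thread, not a
    complete list of what the BERT loader needs.
    """
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]


# Demo on a throwaway directory that contains only a vocabulary file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "vocab.txt").write_text("[PAD]\n[UNK]\n")
    missing = missing_model_files(tmp)

print(missing)
```

If the returned list is non-empty for the directory passed as --bert_model_dir, the files were probably extracted somewhere else (for example into a subfolder of the archive) and need to be moved or renamed.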

Also, when I modify the command to point at task1/train:

python train.py --data_dir data/task1/train/ --bert_model_dir model/ --model_dir experiments/base_model

the error is:

Loading the datasets...
Traceback (most recent call last):
  File "train.py", line 165, in <module>
    data_loader = DataLoader(args.data_dir, args.bert_model_dir, params, token_pad_idx=0)
  File "/content/BERT-keyphrase-extraction/data_loader.py", line 28, in __init__
    self.tag_pad_idx = self.tag2idx['O']
KeyError: 'O'
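The KeyError: 'O' means the tag-to-index mapping built by the data loader has no 'O' entry. One possible cause, which is an assumption and not confirmed in this thread, is that the tag list read from the new --data_dir was empty or incomplete. A minimal sketch of a guard that fails with a clearer message follows; build_tag2idx and the example tags "B" and "I" are hypothetical and are not taken from the repository.

```python
def build_tag2idx(tags, pad_tag="O"):
    """Map each tag to an index and check that the padding tag is present."""
    tag2idx = {tag: idx for idx, tag in enumerate(tags)}
    if pad_tag not in tag2idx:
        raise ValueError(
            f"tag {pad_tag!r} not found among {len(tag2idx)} loaded tags; "
            "check that the data directory contains the expected tag list"
        )
    return tag2idx


# A tag list that includes 'O' works as expected.
tag2idx = build_tag2idx(["O", "B", "I"])
print(tag2idx)

# An empty tag list (e.g. a wrong data directory) now fails with a clear message.
try:
    build_tag2idx([])
    empty_failed = False
except ValueError as err:
    empty_failed = True
    print(err)
```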

sahiljethani commented 3 years ago

BertTokenizer's convert_tokens_to_ids function raises a KeyError for tokens that are not in the vocabulary, so I suggest modifying the for loop in that function as follows:

for token in tokens:
    ids.append(self.vocab.get(token, self.vocab['[UNK]']))
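The same idea can be sketched as a standalone function, which makes the fallback easy to try without editing the library. This is only an illustration: the function below is not the library's implementation, and the small vocabulary is made up for the example.

```python
def convert_tokens_to_ids(tokens, vocab, unk_token="[UNK]"):
    """Look up each token in vocab, falling back to the id of unk_token."""
    unk_id = vocab[unk_token]
    return [vocab.get(token, unk_id) for token in tokens]


# Toy vocabulary (an assumption for illustration only).
vocab = {"[PAD]": 0, "[UNK]": 1, "bert": 2, "model": 3}

# "keyphrase" is not in the toy vocabulary, so it maps to the [UNK] id.
ids = convert_tokens_to_ids(["bert", "keyphrase", "model"], vocab)
print(ids)
```

Note that this only addresses a KeyError on unknown tokens. It does not help when the tokenizer itself is None, as in the original traceback.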

arunmack789 commented 2 years ago

I used tokens = self.tokenizer.tokenize(line) instead of split().
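To illustrate why this can matter, here is a toy sketch of greedy longest-match sub-word splitting, the general idea behind WordPiece tokenization. It is a simplified stand-in, not the real BertTokenizer, and the vocabulary is invented for the example. Whitespace splitting can yield whole words that are missing from the vocabulary, while sub-word splitting breaks them into pieces that are present.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedily split one word into the longest pieces found in vocab.

    Pieces after the first are prefixed with '##'. If some part of the
    word cannot be matched, the whole word becomes the unknown token.
    """
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary (an assumption for illustration only).
vocab = {"[UNK]", "key", "##phrase", "##s", "extraction"}
line = "keyphrases extraction"

whitespace_tokens = line.split()
subword_tokens = [p for w in whitespace_tokens for p in wordpiece(w, vocab)]

print(whitespace_tokens)
print(subword_tokens)
```

One caveat: sub-word splitting usually produces more tokens than split(), so if the labels are stored one per whitespace token they would need to be realigned to the new tokens.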

hnrNeha commented 2 years ago

Hey, how did you complete this step: "From scibert repo, untar the weights (rename their weight dump file to pytorch_model.bin) and vocab file into a new folder model"? Can you please help with this?