jind11 / TextFooler

A Model for Natural Language Attack on Text Classification and Inference
MIT License

Vocab.txt for running Imdb is not available #8

Closed RishabhMaheshwary closed 4 years ago

RishabhMaheshwary commented 4 years ago

I am getting the following output:

Model name '/content/drive/My Drive/imdb' was not found in model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc). We assumed '/content/drive/My Drive/imdb/vocab.txt' was a path or url but couldn't find any file associated to this path or url.

Then the code continues but stops with the following traceback:

Traceback (most recent call last):
  File "attack_classification.py", line 589, in <module>
    main()
  File "attack_classification.py", line 557, in main
    batch_size=args.batch_size)
  File "attack_classification.py", line 203, in attack
    orig_probs = predictor([text_ls]).squeeze()
  File "attack_classification.py", line 86, in text_pred
    dataloader = self.dataset.transform_text(text_data, batch_size=batch_size)
  File "attack_classification.py", line 184, in transform_text
    self.max_seq_length, self.tokenizer)
  File "attack_classification.py", line 150, in convert_examples_to_features
    tokens_a = tokenizer.tokenize(' '.join(text_a))
AttributeError: 'NoneType' object has no attribute 'tokenize'

Maybe it is related to the cache directory path.

Can you help me resolve this?

jind11 commented 4 years ago

Hi, I think the main cause of this error is that the vocab.txt file was missing from the IMDB BERT model folder. I have just updated the BERT model files so that vocab.txt is present in every folder: https://drive.google.com/drive/folders/1xog7EYBk1esscLgHxk23f46f73-kixC7. Could you download the model parameter files again so that the tokenizer can be initialized correctly?
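For anyone hitting the same trace: in the pytorch-pretrained-bert version used here, `BertTokenizer.from_pretrained()` only logs a warning and returns `None` when vocab.txt cannot be found, which surfaces much later as the `'NoneType' object has no attribute 'tokenize'` error. A minimal sketch of an up-front guard (`load_tokenizer` is a hypothetical helper, not part of `attack_classification.py`):

```python
import os

def load_tokenizer(model_dir):
    # BertTokenizer.from_pretrained() returns None (with only a logged
    # warning) when vocab.txt is missing from the model folder, so the
    # failure only shows up later at tokenizer.tokenize(). Checking the
    # file up front gives a clear error message instead.
    vocab_path = os.path.join(model_dir, "vocab.txt")
    if not os.path.isfile(vocab_path):
        raise FileNotFoundError(
            f"vocab.txt not found in {model_dir}; re-download the "
            "model files so the tokenizer can be initialized")
    # With the check passed, the real load would be:
    #   from pytorch_pretrained_bert import BertTokenizer
    #   return BertTokenizer.from_pretrained(model_dir)
    return vocab_path
```

This fails fast with a pointer to the actual missing file rather than deep inside `convert_examples_to_features`.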

RishabhMaheshwary commented 4 years ago

It is running now, thanks!

jind11 commented 4 years ago

Glad to be able to help you!