utterworks / fast-bert

Super easy library for BERT based NLP models

ERROR: 'NoneType' object is not iterable error during loading of training data #194

Open Shane-Neeley opened 4 years ago

Shane-Neeley commented 4 years ago

Any idea what would cause this? It seems like it gets through most of the examples before failing.

INFO:transformers.tokenization_utils:loading file https://s3.amazonaws.com/models.huggingface.co/bert/adamlin/NCBI_BERT_pubmed_mimic_uncased_base_transformers/tokenizer_config.json from cache at /home/ubuntu/.cache/torch/transformers/6389e7150ee74c4594a9117c0b9f0f23db49b25f47d55b7c07c8f32025238a45.1ade4e0ac224a06d83f2cb9821a6656b6b59974d6552e8c728f2657e4ba445d9
INFO:root:Writing example 0 of 19423
INFO:root:Writing example 10000 of 19423
Traceback (most recent call last):
  File "test_fast-bert.py", line 59, in <module>
    no_cache=True
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fast_bert/data_cls.py", line 494, in __init__
    val_examples, "dev", no_cache=self.no_cache
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fast_bert/data_cls.py", line 592, in get_dataset_from_examples
    logger=self.logger,
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fast_bert/data_cls.py", line 138, in convert_examples_to_features
    for (ex_index, example) in enumerate(examples):
TypeError: 'NoneType' object is not iterable

This is what is running:

databunch = BertDataBunch(
    DATA_PATH,
    LABEL_PATH,
    tokenizer='adamlin/NCBI_BERT_pubmed_mimic_uncased_base_transformers',
    train_file='train.csv',
    val_file='val.csv',
    label_file='labels.csv',
    text_col='text',
    label_col=labels,
    batch_size_per_gpu=16,
    max_seq_length=512,
    multi_gpu=False,
    multi_label=True,
    model_type='bert',
    no_cache=True
)

aaronbriel commented 4 years ago

Is there something unique about example 10000? I wonder if there are any null examples or ones with non-parsable characters?
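
One quick way to check for null examples is to scan the CSV files before building the databunch. Below is a minimal sketch using only the standard library. The helper name `find_bad_rows` is hypothetical (it is not part of fast-bert), and it assumes the files are UTF-8 encoded with a header row containing the `text` column used above. Since the traceback shows the failure while processing the "dev" examples, `val.csv` is probably worth checking first, though that is a guess from the log rather than a confirmed diagnosis.

```python
import csv


def find_bad_rows(csv_path, text_col="text"):
    """Return (row_number, reason) pairs for rows whose text is missing or blank.

    Hypothetical helper, not part of fast-bert. Row numbers are 0-based and
    count data rows only (the header is excluded).
    """
    bad = []
    # Opening with encoding="utf-8" means invalid bytes raise
    # UnicodeDecodeError, which would itself point at non-parsable characters.
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames is None or text_col not in reader.fieldnames:
            raise ValueError(f"column {text_col!r} not found in {csv_path}")
        for row_number, row in enumerate(reader):
            value = row.get(text_col)
            if value is None:
                # The row has fewer fields than the header.
                bad.append((row_number, "missing field"))
            elif not value.strip():
                # The field exists but is empty or whitespace only.
                bad.append((row_number, "empty text"))
    return bad


# Example usage (file names taken from the BertDataBunch call above):
# for name in ("train.csv", "val.csv"):
#     print(name, find_bad_rows(name, text_col="text"))
```

If this returns an empty list for both files, the rows themselves are probably fine and the problem is more likely in how the validation file is located or parsed.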