Closed ndolev closed 4 years ago
I am attempting to create a BertDataBunch for a multilabel classification exactly like in the readme. I provide a list of labels but it seems like data_cls.py is expecting the labels to be floats instead of strings. Any ideas?
databunch = BertDataBunch(DATA_PATH, LABEL_PATH, tokenizer='bert-base-uncased', train_file='bert_train_set.csv', val_file='bert_val_set.csv', label_file='bert_labels.csv', text_col='text', label_col=['label1', 'label2', 'label3'], batch_size_per_gpu=16, max_seq_length=512, multi_gpu=True, multi_label=True, model_type='bert')
~/anaconda3/envs/pytorch/lib/python3.7/site-packages/fast_bert/data_cls.py in convert_examples_to_features(examples, label_list, max_seq_length, tokenizer, output_mode, cls_token_at_end, pad_on_left, cls_token, sep_token, pad_token, sequence_a_segment_id, sequence_b_segment_id, cls_token_segment_id, pad_token_segment_id, mask_padding_with_zero, logger)
    174             label_id = []
    175             for label in example.label:
--> 176                 label_id.append(float(label))
    177         else:
    178             if example.label is not None:
And my bert_labels.csv looks like:
label1 label2 label3
And bert_train_set like:
index,text,label1,label2,label3
The error message made it hard to diagnose, but the problem was on my side: a string had snuck into my one-hot encoded multi-label data set. :)
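For anyone who hits the same traceback, a quick check along these lines may help locate the offending cell before building the databunch. This is only a sketch using the standard library: the helper name `find_non_numeric_labels` and the sample rows are made up for illustration and are not part of fast_bert. It simply applies the same `float()` conversion that line 176 of data_cls.py performs and reports every label value that fails.

```python
import csv
import io


def find_non_numeric_labels(csv_file, label_cols):
    """Return (row_number, column, value) for each label cell that float() rejects.

    Hypothetical helper, not part of fast_bert. Row numbers match file line
    numbers only if no quoted field spans several lines.
    """
    bad = []
    reader = csv.DictReader(csv_file)
    # Row 1 is the header, so data rows start at 2.
    for row_number, row in enumerate(reader, start=2):
        for col in label_cols:
            value = row.get(col)
            try:
                float(value)
            except (TypeError, ValueError):
                # TypeError covers a missing cell (None), ValueError a non-numeric string.
                bad.append((row_number, col, value))
    return bad


# Made-up sample with the same header as bert_train_set.csv above.
sample = io.StringIO(
    "index,text,label1,label2,label3\n"
    "0,first example,1,0,0\n"
    "1,second example,0,yes,1\n"
)
problems = find_non_numeric_labels(sample, ["label1", "label2", "label3"])
print(problems)  # [(3, 'label2', 'yes')]
```

To run it on a real file, pass an open file handle instead of the `io.StringIO` sample, e.g. `open('bert_train_set.csv', newline='')`, and repeat for the validation file.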