dpappas opened this issue 4 years ago.
Are you trying out multi-label classification?
I am using binary classification, as in your article on Medium. Kudos, by the way.
I managed to write my own metric functions that always select the second index (column 1 of the predictions). So right now I am OK.
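A workaround along those lines might look like the sketch below. It is hypothetical and not part of fast-bert. It assumes, as the tracebacks in this thread suggest, that `validate()` calls each metric as `metric["function"](all_logits, all_labels)` with logits of shape (N, 2) and labels as a 1-D tensor of 0/1 values. It keeps only the positive-class probability (the second index) and computes binary F1 from it.

```python
import torch
from torch import Tensor


def f1_binary(y_pred: Tensor, y_true: Tensor, threshold: float = 0.5, eps: float = 1e-9) -> float:
    """Hypothetical binary F1 for single-label, two-class logits.

    Assumes y_pred has shape (N, 2) (raw logits) and y_true has shape (N,)
    with values 0/1.
    """
    # Keep only the probability of the positive class (the second index).
    pos_prob = torch.softmax(y_pred.float(), dim=1)[:, 1]
    pred = pos_prob > threshold
    true = y_true.reshape(-1).bool()
    # Count true positives, false positives and false negatives.
    tp = (pred & true).sum().item()
    fp = (pred & ~true).sum().item()
    fn = (~pred & true).sum().item()
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    return 2 * precision * recall / (precision + recall + eps)
```

It would be registered the same way as the built-in metric, e.g. `metrics = [{'name': 'f1_score', 'function': f1_binary}]`. With two classes, thresholding the softmax at 0.5 gives the same predictions as taking `argmax` over the two logits.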
I'm getting a similar error when using the F1 score in a binary classification problem:
RuntimeError Traceback (most recent call last)
<ipython-input-9-eea47d3f12a7> in <module>
3 validate=True, # Evaluate the model after each epoch
4 schedule_type="warmup_cosine",
----> 5 optimizer_type="lamb")
/disk/no_backup/kd_challenge/fast-bert/fast_bert/learner_cls.py in fit(self, epochs, lr, validate, schedule_type, optimizer_type)
405 if validate:
406 # evaluate model
--> 407 results = self.validate()
408 for key, value in results.items():
409 tb_writer.add_scalar(
/disk/no_backup/kd_challenge/fast-bert/fast_bert/learner_cls.py in validate(self)
527 for metric in self.metrics:
528 validation_scores[metric["name"]] = metric["function"](
--> 529 all_logits, all_labels
530 )
531
/disk/no_backup/kd_challenge/fast-bert/fast_bert/metrics.py in F1(y_pred, y_true, threshold)
79
80 def F1(y_pred: Tensor, y_true: Tensor, threshold: float = CLASSIFICATION_THRESHOLD):
---> 81 return fbeta(y_pred, y_true, thresh=threshold, beta=1)
/disk/no_backup/kd_challenge/fast-bert/fast_bert/metrics.py in fbeta(y_pred, y_true, thresh, beta, eps, sigmoid)
40 y_pred = (y_pred > thresh).float()
41 y_true = y_true.float()
---> 42 TP = (y_pred*y_true).sum(dim=1)
43 prec = TP/(y_pred.sum(dim=1)+eps)
44 rec = TP/(y_true.sum(dim=1)+eps)
RuntimeError: The size of tensor a (2) must match the size of tensor b (4671) at non-singleton dimension 1
The code used looks as follows:
import sys
import torch
import logging
import datetime
from pathlib import Path
from transformers import BertTokenizer
from fast_bert.data_cls import BertDataBunch
from fast_bert.learner_cls import BertLearner
from fast_bert.metrics import accuracy, F1
DATA_PATH = Path('./bert/data/')
LABEL_PATH = Path('./bert/labels/')
MODEL_PATH = Path('./bert/models/')
LOG_PATH = Path('./bert/logs/')
OUTPUT_PATH = MODEL_PATH/'output'
MODEL_PATH.mkdir(exist_ok=True)
LOG_PATH.mkdir(exist_ok=True)
OUTPUT_PATH.mkdir(exist_ok=True)
model_state_dict = None
FINETUNED_PATH = None
FINETUNED_PATH = Path('./bert/lm_model_bert/model_out/pytorch_model.bin')
model_state_dict = torch.load(FINETUNED_PATH)
BERT_PRETRAINED_PATH = Path('./bert/wwm_uncased_L-24_H-1024_A-16/')
run_start_time = datetime.datetime.today().strftime('%Y-%m-%d_%H-%M-%S')
logfile = str(LOG_PATH/'log-{}-{}.txt'.format(run_start_time, "roberta_bs64"))
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(name)s - %(message)s',
    datefmt='%m/%d/%Y %H:%M:%S',
    handlers=[
        logging.FileHandler(logfile),
        logging.StreamHandler(sys.stdout)])
logger = logging.getLogger()
databunch = BertDataBunch(DATA_PATH, LABEL_PATH,
                          tokenizer='roberta-base',
                          train_file='train.csv',
                          val_file='val.csv',
                          label_file='labels.csv',
                          text_col='text',
                          label_col='label',
                          batch_size_per_gpu=64,
                          max_seq_length=64,
                          multi_gpu=True,
                          multi_label=False,
                          model_type='roberta')
device_cuda = torch.device("cuda")
metrics = [{'name': 'f1_score', 'function': F1}]
learner = BertLearner.from_pretrained_model(
    databunch,
    pretrained_path='roberta-base',
    metrics=metrics,
    device=device_cuda,
    logger=logger,
    output_dir=OUTPUT_PATH,
    finetuned_wgts_path=None,
    warmup_steps=500,
    multi_gpu=False,
    is_fp16=False,
    multi_label=False,
    logging_steps=50)
learner.fit(epochs=10,
            lr=6e-5,
            validate=True,  # Evaluate the model after each epoch
            schedule_type="warmup_cosine",
            optimizer_type="lamb")
Any suggestions?
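The failing line in `fbeta` can be reproduced in isolation. This is a minimal sketch assuming, as the error message suggests, that the single-label learner passes logits of shape (N, 2) and labels as a 1-D tensor of N values. PyTorch broadcasting aligns trailing dimensions, so the 2 logit columns are matched against the N labels and the element-wise product fails. Selecting the positive-class column first (the workaround mentioned earlier in the thread) yields two tensors of shape (N,) that multiply cleanly. The function names are illustrative only.

```python
import torch


def reproduce_fbeta_mismatch(n: int = 4671):
    """Mimic the first lines of fbeta() from the traceback on single-label inputs.

    Shapes are assumptions: logits (n, 2) and a 1-D label tensor (n,).
    Returns the RuntimeError message, or None if no error occurs.
    """
    y_pred = torch.randn(n, 2)           # one logit per class
    y_true = torch.randint(0, 2, (n,))   # one integer label per example
    y_pred = (y_pred > 0.5).float()
    y_true = y_true.float()
    try:
        # (n, 2) * (n,) broadcasts as (n, 2) * (1, n); 2 != n, so this fails.
        (y_pred * y_true).sum(dim=1)
    except RuntimeError as err:
        return str(err)
    return None


def positive_column_product_shape(n: int = 4671):
    """Selecting column 1 first gives two (n,) tensors, which multiply element-wise."""
    y_pred = (torch.randn(n, 2) > 0.5).float()
    y_true = torch.randint(0, 2, (n,)).float()
    return tuple((y_pred[:, 1] * y_true).shape)
```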
I followed your instructions with my own data. Since the batch_size was too big for my data, I changed it to 6.
Then I got this error during evaluation:
08/23/2019 17:50:14 - INFO - root - Running evaluation
0.82% [49/5955 00:37<1:15:53]
08/23/2019 17:50:14 - INFO - root - Num examples = 9833
08/23/2019 17:50:14 - INFO - root - Batch size = 6
Traceback (most recent call last):
  File "train_fast_bert_doc_rerank.py", line 81, in <module>
    optimizer_type="lamb"
  File "/usr/local/lib/python3.6/site-packages/fast_bert/learner_cls.py", line 295, in fit
    results = self.validate()
  File "/usr/local/lib/python3.6/site-packages/fast_bert/learner_cls.py", line 382, in validate
    validation_scores[metric['name']] = metric['function'](all_logits, all_labels)
  File "/usr/local/lib/python3.6/site-packages/fast_bert/metrics.py", line 31, in accuracy_thresh
    return ((y_pred > thresh) == y_true.byte()).float().mean().item()
RuntimeError: The size of tensor a (2) must match the size of tensor b (9833) at non-singleton dimension
Could you help me? Thank you in advance.
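The `accuracy_thresh` failure has the same shape cause: `(y_pred > thresh)` thresholds every logit column independently, giving shape (N, 2), and that is compared against the 9833 labels. It appears to be written for multi-label targets. For single-label classification, a sketch of an alternative is to take the `argmax` over the class dimension and compare it with the labels. This assumes logits of shape (N, num_classes) and integer labels of shape (N,); `accuracy_single_label` is a hypothetical name, not a fast-bert function.

```python
import torch
from torch import Tensor


def accuracy_single_label(y_pred: Tensor, y_true: Tensor) -> float:
    """Hypothetical accuracy metric for single-label classification.

    Assumes y_pred holds logits of shape (N, num_classes) and y_true holds
    integer class ids of shape (N,).
    """
    pred = y_pred.argmax(dim=1)  # predicted class per example, shape (N,)
    return (pred == y_true.reshape(-1).long()).float().mean().item()
```

Passed as `{'name': 'accuracy', 'function': accuracy_single_label}` in the metrics list, it would receive `(all_logits, all_labels)` exactly as the traceback shows.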