Regression task with blurr

DominikVogel commented 2 years ago

Hi there, I am a beginner trying to teach myself to use blurr for a regression task. My goal is to assign a score (1-5) to a sequence instead of assigning it to a category (e.g., positive or negative). As a starting point, I used the code from the docs on the GLUE benchmarks (https://ohmeow.github.io/blurr/examples-high-level-api.html) and tried to adapt it. I made the following changes to the code:

I replaced the GLUE data with the English part of the amazon_reviews_multi dataset (raw_datasets = load_dataset("amazon_reviews_multi", "en"))
I changed the metric and loss function (learn_kwargs = { 'metrics': [rmse], 'loss_func': MSELossFlat() })
I specified the number of labels (n_labels=1)
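
For reference, the rmse metric named in the list above is the square root of the mean squared error between predicted and true scores. A minimal pure-Python sketch of that calculation follows; the function name and the sample values are illustrative only and are not taken from fastai:

```python
import math

def rmse(preds, targets):
    """Root mean squared error between two equal-length lists of numbers."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    squared_errors = [(p - t) ** 2 for p, t in zip(preds, targets)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical star ratings: both predictions are off by two stars.
example_error = rmse([3.0, 3.0], [5.0, 1.0])
```

Because the error is measured in the same unit as the label, a value of 2.0 here would mean the predictions are off by about two stars on average.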

Unfortunately, the code does not work. When I try to build the learner, I get the following error:

Could not do one pass in your dataloader, there is something wrong in it

---------------------------------------------------------------------------

RuntimeError                              Traceback (most recent call last)

<ipython-input-14-45bb1d37df2c> in <module>()
6                                                             dblock_splitter=IndexSplitter(valid_idxs),
7                                                             dl_kwargs=dl_kwargs, learner_kwargs=learn_kwargs,
----> 8                                                             n_labels=1)
9 learn = learn.to_fp16()

/usr/local/lib/python3.7/dist-packages/blurr/modeling/core.py in from_dictionaries(cls, ds, pretrained_model_name_or_path, preprocess_func, text_attr, label_attr, n_labels, dblock_splitter, dl_kwargs, learner_kwargs)
418         return cls._create_learner(ds, pretrained_model_name_or_path, preprocess_func,
419                                    text_attr, label_attr, n_labels, dblock_splitter,
--> 420                                    dl_kwargs, learner_kwargs)

/usr/local/lib/python3.7/dist-packages/blurr/modeling/core.py in _create_learner(cls, data, pretrained_model_name_or_path, preprocess_func, text_attr, label_attr, n_labels, dblock_splitter, dl_kwargs, learner_kwargs)
323
324         # return BLearner instance
--> 325         return cls(dls, hf_model, **learner_kwargs.copy())
326
327     @classmethod

/usr/local/lib/python3.7/dist-packages/blurr/modeling/core.py in __init__(self, dls, hf_model, **kwargs)
245         **kwargs
246     ):
--> 247         super().__init__(dls, hf_model, **kwargs)
248
249     @classmethod

/usr/local/lib/python3.7/dist-packages/blurr/modeling/core.py in __init__(self, dls, hf_model, base_model_cb, **kwargs)
226         **kwargs
227     ):
--> 228         model = kwargs.get('model', HF_BaseModelWrapper(hf_model))
229         loss_func = kwargs.pop('loss_func', dls.loss_func if hasattr(dls, 'loss_func') else None)
230         splitter = kwargs.pop('splitter', hf_splitter)

/usr/local/lib/python3.7/dist-packages/fastcore/meta.py in __call__(cls, *args, **kwargs)
37         if type(res)==cls:
38             if hasattr(res,'__pre_init__'): res.__pre_init__(*args,**kwargs)
---> 39             res.__init__(*args,**kwargs)
40             if hasattr(res,'__post_init__'): res.__post_init__(*args,**kwargs)
41         return res

/usr/local/lib/python3.7/dist-packages/blurr/modeling/core.py in __init__(self, hf_model, output_hidden_states, output_attentions, hf_model_kwargs)
58
59         store_attr(self=self, names='output_hidden_states, output_attentions, hf_model_kwargs')
---> 60         self.hf_model = hf_model.cuda() if torch.cuda.is_available() else hf_model
61
62         self.hf_model_fwd_args = list(inspect.signature(self.hf_model.forward).parameters.keys())

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in cuda(self, device)
678             Module: self
679         """
--> 680         return self._apply(lambda t: t.cuda(device))
681
682     def xpu(self: T, device: Optional[Union[int, device]] = None) -> T:

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _apply(self, fn)
568     def _apply(self, fn):
569         for module in self.children():
--> 570             module._apply(fn)
571
572         def compute_should_use_set_data(tensor, tensor_applied):

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _apply(self, fn)
568     def _apply(self, fn):
569         for module in self.children():
--> 570             module._apply(fn)
571
572         def compute_should_use_set_data(tensor, tensor_applied):

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _apply(self, fn)
568     def _apply(self, fn):
569         for module in self.children():
--> 570             module._apply(fn)
571
572         def compute_should_use_set_data(tensor, tensor_applied):

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _apply(self, fn)
591             # `with torch.no_grad():`
592             with torch.no_grad():
--> 593                 param_applied = fn(param)
594             should_use_set_data = compute_should_use_set_data(param, param_applied)
595             if should_use_set_data:

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in <lambda>(t)
678             Module: self
679         """
--> 680         return self._apply(lambda t: t.cuda(device))
681
682     def xpu(self: T, device: Optional[Union[int, device]] = None) -> T:

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
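
The last line of the error suggests CUDA_LAUNCH_BLOCKING=1. One way to apply that hint in a notebook is sketched below, assuming the variable is set before anything initializes CUDA (in Colab this usually means restarting the runtime and running this cell first):

```python
import os

# Make CUDA kernel launches synchronous so the Python stack trace points at
# the call that actually failed. This has to happen before CUDA is initialized
# in the process, so run it before importing or using torch on the GPU.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```

This only changes where the error is reported; it does not fix the underlying problem, and it slows down GPU execution, so it is best removed again after debugging.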

I would be very thankful if somebody could help me set up blurr for a regression task!

Here is the Colab notebook: https://colab.research.google.com/drive/1qCwv-nE7JxYXsWOR9gsXZfM8k4Gx32tt?usp=sharing

This is the full code:

pip install -Uq fastai fastcore nbdev ohmeow-blurr datasets

import torch
from transformers import *
from fastai.text.all import *
from blurr.data.all import *
from blurr.modeling.all import *
from datasets import *

raw_datasets = load_dataset("amazon_reviews_multi", "en")
print(f'{raw_datasets}\n')
print(f'{raw_datasets["train"][0]}\n')
print(f'{raw_datasets["train"].features}\n')

train_ds = raw_datasets['train']#.select(range(10000))
valid_ds = raw_datasets['validation']#.select(range(2000))

n_train, n_valid = train_ds.num_rows, valid_ds.num_rows
train_idxs, valid_idxs = L(range(n_train)), L(range(n_train, n_train + n_valid))
raw_ds = concatenate_datasets([train_ds, valid_ds])

dl_kwargs = {'bs': 4, 'val_bs': 8}
learn_kwargs = { 'metrics': [rmse], 'loss_func': MSELossFlat() }

learn = BlearnerForSequenceClassification.from_dictionaries(raw_ds, 'distilroberta-base',
                                                            text_attr='review_body', label_attr='stars',
                                                            dblock_splitter=IndexSplitter(valid_idxs),
                                                            dl_kwargs=dl_kwargs, learner_kwargs=learn_kwargs,
                                                            n_labels=1)
learn = learn.to_fp16()

learn.dls.show_batch(dataloaders=learn.dls, trunc_at=500, max_n=5)

learn.fit_one_cycle(1, lr_max=2e-3)

learn.show_results(learner=learn, max_n=5)
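
The index bookkeeping in the code above (concatenate the train and validation sets, then pass the validation positions to the splitter) can be sketched in plain Python. The row values and sizes are made up for illustration, and split_by_index is a hypothetical helper, not a fastai function:

```python
def split_by_index(items, valid_idxs):
    """Return (train, valid) lists, where valid holds the items at valid_idxs."""
    valid_set = set(valid_idxs)
    train = [item for i, item in enumerate(items) if i not in valid_set]
    valid = [items[i] for i in valid_idxs]
    return train, valid

# Toy stand-ins for the two dataset splits.
train_rows = ["t0", "t1", "t2"]
valid_rows = ["v0", "v1"]

n_train, n_valid = len(train_rows), len(valid_rows)
combined = train_rows + valid_rows

# Validation rows sit directly after the training rows in the combined list.
valid_idxs = list(range(n_train, n_train + n_valid))

train_part, valid_part = split_by_index(combined, valid_idxs)
```

The point of the offset by n_train is that the original validation set is recovered exactly from the combined dataset, rather than being re-drawn at random.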

ohmeow commented 2 years ago

That is because BlearnerForSequenceClassification is part of the high-level API designed for classification tasks (single or multilabel), not regression.

However, we can dip down into the mid-level API for a regression problem like this ...

model_cls = AutoModelForSequenceClassification

pretrained_model_name = "distilroberta-base"  # "distilbert-base-uncased" "bert-base-uncased"
hf_arch, hf_config, hf_tokenizer, hf_model = BLURR.get_hf_objects(pretrained_model_name, model_cls=model_cls)

blocks = (HF_TextBlock(hf_arch, hf_config, hf_tokenizer, hf_model), RegressionBlock)
dblock = DataBlock(blocks=blocks, get_x=ItemGetter("review_body"), get_y=ItemGetter("stars"), splitter=RandomSplitter(seed=42))

dls = dblock.dataloaders(raw_ds, bs=4)

model = HF_BaseModelWrapper(hf_model)
learn = Learner(
    dls,
    model,
    opt_func=partial(OptimWrapper, opt=torch.optim.Adam),
    loss_func=MSELossFlat(),
    metrics=[rmse],
    cbs=[HF_BaseModelCallback],
    splitter=hf_splitter,
)

learn = learn.to_fp16()

learn.fit_one_cycle(1, lr_max=3e-5)

Give it a try and lmk how it goes. If you end up turning this into a blog post, it would be great to share with other folks who might be struggling with setting up blurr to work with regression. :)

DominikVogel commented 2 years ago

Thanks a lot for the help! I did not realize I needed to use the mid-level API.

Your code almost works. However, it fails when I try to train the model (last line). I get the following error message:

RuntimeError: The size of tensor a (8) must match the size of tensor b (4) at non-singleton dimension 0

I tried to change the batch size to 8 but this results in a similar error:

RuntimeError: The size of tensor a (16) must match the size of tensor b (8) at non-singleton dimension 0

After studying your docs for some time, I found the solution: the number of labels was not specified in the model config. The final code looks like this:

n_lbls = 1

model_cls = AutoModelForSequenceClassification

pretrained_model_name = "distilroberta-base"  # "distilbert-base-uncased" "bert-base-uncased"
hf_arch, hf_config, hf_tokenizer, hf_model = BLURR.get_hf_objects(pretrained_model_name, 
                                                                  model_cls=model_cls,
                                                                  config_kwargs={'num_labels': n_lbls})

blocks = (HF_TextBlock(hf_arch, hf_config, hf_tokenizer, hf_model), RegressionBlock)
dblock = DataBlock(blocks=blocks, get_x=ItemGetter("review_body"), get_y=ItemGetter("stars"), splitter=RandomSplitter(seed=42))

dls = dblock.dataloaders(raw_ds, bs=4)

model = HF_BaseModelWrapper(hf_model)
learn = Learner(
    dls,
    model,
    opt_func=partial(OptimWrapper, opt=torch.optim.Adam),
    loss_func=MSELossFlat(),
    metrics=[rmse],
    cbs=[HF_BaseModelCallback],
    splitter=hf_splitter,
)

learn = learn.to_fp16()

learn.fit_one_cycle(1, lr_max=3e-5)
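
One plausible reading of the size-mismatch errors above: Hugging Face sequence-classification configs default to two labels, so a batch of 4 produces 4 x 2 = 8 output values, which a flattening loss such as MSELossFlat then compares against only 4 targets. With num_labels set to 1 there is one output per example and the sizes agree. The sketch below mimics this with plain Python lists; flatten and flat_mse are hypothetical stand-ins, not fastai or PyTorch functions:

```python
def flatten(rows):
    """Flatten a list of rows into a single list of values."""
    return [value for row in rows for value in row]

def flat_mse(outputs, targets):
    """Mean squared error after flattening both inputs, as a flattening loss would."""
    flat_out, flat_tgt = flatten(outputs), flatten(targets)
    if len(flat_out) != len(flat_tgt):
        raise ValueError(
            f"size mismatch: {len(flat_out)} outputs vs {len(flat_tgt)} targets"
        )
    return sum((o - t) ** 2 for o, t in zip(flat_out, flat_tgt)) / len(flat_out)

# A batch of 4 reviews with one star rating each.
targets = [[5.0], [3.0], [1.0], [4.0]]

# Two outputs per example (the assumed default head size): 8 values vs 4 targets.
two_label_outputs = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]

# One output per example (num_labels=1): 4 values vs 4 targets.
one_label_outputs = [[4.5], [3.0], [2.0], [4.0]]

try:
    flat_mse(two_label_outputs, targets)
    mismatch_message = None
except ValueError as err:
    mismatch_message = str(err)

loss = flat_mse(one_label_outputs, targets)
```

The same arithmetic would explain the second error: a batch size of 8 with two outputs per example gives 16 values against 8 targets.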


Now I will try to understand what I did with the code and adapt it to my data. I will definitely write a blog post. That's the least I owe the community :-). Thanks a lot!

ohmeow commented 2 years ago

I'm going to close this out if all is good. lmk.

ohmeow / blurr

Regression task with blurr #58