ohmeow / blurr

A library that integrates huggingface transformers with the world of fastai, giving fastai devs everything they need to train, evaluate, and deploy transformer specific models.
https://ohmeow.github.io/blurr
Apache License 2.0

RuntimeError: Could not infer dtype of NoneType #62

Closed lkarjun closed 2 years ago

lkarjun commented 2 years ago

Hi Ohmeow

Yesterday I was recreating the example 02_modeling-language-modeling.ipynb with the BERT architecture, and I got a RuntimeError when I called dblock.summary(). Copy of my notebook. I hope you can solve this issue as soon as possible: I'm planning to create a Malayalam language model and fine-tune it for classification problems for my final-year project, so a quick fix would be a great help. Thanks 🤗

Data Block

model_cls = AutoModelForCausalLM

pretrained_model_name = "bert-base-uncased"
hf_arch, hf_config, hf_tokenizer, hf_model = BLURR.get_hf_objects(pretrained_model_name, model_cls=model_cls)

if hf_tokenizer.pad_token is None:
    hf_tokenizer.pad_token = '[PAD]'
bbtfm = HF_LMBeforeBatchTransform(hf_arch, hf_config, hf_tokenizer, hf_model, lm_strategy_cls=CausalLMStrategy)
blocks = (HF_TextBlock(before_batch_tfm=bbtfm, input_return_type=HF_CausalLMInput), noop)

dblock = DataBlock(blocks=blocks, get_x=ColReader(0), splitter=ColSplitter(col='is_valid'))
dblock.summary(df[:5])

Error

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-61-87db21311c23> in <module>()
----> 1 dblock.summary(df[:5])

9 frames
/usr/local/lib/python3.7/dist-packages/fastai/torch_core.py in tensor(x, *rest, **kwargs)
    129     # if isinstance(x, (tuple,list)) and len(x)==0: return tensor(0)
    130     res = (x if isinstance(x, Tensor)
--> 131            else torch.tensor(x, **kwargs) if isinstance(x, (tuple,list))
    132            else _array2tensor(x) if isinstance(x, ndarray)
    133            else as_tensor(x.values, **kwargs) if isinstance(x, (pd.Series, pd.DataFrame))

RuntimeError: Could not infer dtype of NoneType
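
The traceback shows fastai's tensor() passing a list or tuple to torch.tensor, and the message suggests that at least one element of that sequence is None. A possible way to narrow this down, as a rough sketch only: walk the data that is about to be converted and report where the None values sit. The helper name find_none_paths and the sample dictionary below are hypothetical and made up for illustration; they are not part of blurr, fastai, or my notebook.

```python
from typing import Any, List


def find_none_paths(obj: Any, path: str = "batch") -> List[str]:
    """Return the path of every None found inside nested dicts, lists, and tuples."""
    if obj is None:
        return [path]
    found: List[str] = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            found.extend(find_none_paths(value, f"{path}[{key!r}]"))
    elif isinstance(obj, (list, tuple)):
        for index, value in enumerate(obj):
            found.extend(find_none_paths(value, f"{path}[{index}]"))
    return found


# Made-up sample shaped like tokenizer output (assumption, not real data).
sample = {"input_ids": [101, 7592, 102], "labels": [101, None, 102]}
print(find_none_paths(sample))
```

Running something like this on the items produced just before the failing tensor() call might show which field carries the None. It does not by itself explain why the value is missing.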