/pytorch/xla/torch_xla/csrc/helpers.h:100 : Check failed: scalar_value.isIntegral()

mabdullah1994 commented 3 years ago

Environment info

transformers version: 4.4.2
Platform: Linux-4.19.112+-x86_64-with-Ubuntu-18.04-bionic
Python version: 3.7.10
PyTorch version (GPU?): 1.8.0+cu101 (False)
Tensorflow version (GPU?):
Using GPU in script?: TPU
Using distributed or parallel set-up in script?:

Who can help

@patrickvonplaten @sgugger

Information

I am using LongformerForSequenceClassification and LongformerTokenizerFast for a simple text classification problem on Google Colab TPU:

The problem arises when using:

[ ] my own modified scripts: (Script shared) If I replace the LongformerForSequenceClassification model with the DistilBertForSequenceClassification model, the same code works and training starts without any issues. With LongformerForSequenceClassification, however, training fails on TPU with the error shown below.

from pathlib import Path

def read_imdb_split(split_dir):
    split_dir = Path(split_dir)
    texts = []
    labels = []
    for label_dir in ["pos", "neg"]:
        for text_file in (split_dir/label_dir).iterdir():
            texts.append(text_file.read_text())
            labels.append(0 if label_dir == "neg" else 1)  # compare by value; `is` tests identity

    return texts, labels

train_texts, train_labels = read_imdb_split('aclImdb/train')
test_texts, test_labels = read_imdb_split('aclImdb/test')

from sklearn.model_selection import train_test_split
train_texts, val_texts, train_labels, val_labels = train_test_split(train_texts, train_labels, test_size=.2)

from transformers import DistilBertTokenizerFast, LongformerTokenizerFast
# tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-uncased')
tokenizer = LongformerTokenizerFast.from_pretrained('allenai/longformer-base-4096', max_length = 8)

train_encodings = tokenizer(train_texts, truncation=True, padding=True)
val_encodings = tokenizer(val_texts, truncation=True, padding=True)
test_encodings = tokenizer(test_texts, truncation=True, padding=True)

import torch

class IMDbDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.labels)

train_dataset = IMDbDataset(train_encodings, train_labels)
val_dataset = IMDbDataset(val_encodings, val_labels)
test_dataset = IMDbDataset(test_encodings, test_labels)

from transformers import DistilBertForSequenceClassification, Trainer, TrainingArguments, LongformerForSequenceClassification
import torch_xla.distributed.xla_multiprocessing as xmp
import torch_xla.core.xla_model as xm

def _mp_fn(index):
  training_args = TrainingArguments(
      output_dir='./results',          # output directory
      num_train_epochs=3,              # total number of training epochs
      per_device_train_batch_size=16,  # batch size per device during training
      per_device_eval_batch_size=64,   # batch size for evaluation
      warmup_steps=500,                # number of warmup steps for learning rate scheduler
      weight_decay=0.01,               # strength of weight decay
      logging_dir='./logs',            # directory for storing logs
      logging_steps=10,
  )

  # model = DistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')
  model = LongformerForSequenceClassification.from_pretrained("allenai/longformer-base-4096", attention_window = 2)

  trainer = Trainer(
      model=model,                         # the instantiated 🤗 Transformers model to be trained
      args=training_args,                  # training arguments, defined above
      train_dataset=train_dataset,         # training dataset
      eval_dataset=val_dataset             # evaluation dataset
  )

  trainer.train()

xmp.spawn(_mp_fn, args=(), nprocs=1, start_method='fork')

The task I am working on is:

[ ] my own task or dataset: Using the IMDB Dataset for Text Classification

To reproduce

Steps to reproduce the behavior:

1. Set up the TPU client on Google Colab: !pip install cloud-tpu-client==0.10 https://storage.googleapis.com/tpu-pytorch/wheels/torch_xla-1.8-cp37-cp37m-linux_x86_64.whl
2. Download the dataset:
   a. !wget http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
   b. !tar -xf aclImdb_v1.tar.gz
3. Execute the given script.

RuntimeError: /pytorch/xla/torch_xla/csrc/helpers.h:100 : Check failed: scalar_value.isIntegral() 
*** Begin stack trace ***
    tensorflow::CurrentStackTrace()
    torch_xla::XlaHelpers::ScalarValue(c10::Scalar, xla::PrimitiveType, xla::XlaBuilder*)

    torch_xla::ir::ops::InferOutputShape(absl::lts_2020_02_25::Span<xla::Shape const>, std::function<xla::XlaOp (absl::lts_2020_02_25::Span<xla::XlaOp const>)> const&)

    torch_xla::ir::Node::GetOpShape(std::function<xla::Shape ()> const&) const
    torch_xla::ir::Node::Node(torch_xla::ir::OpKind, absl::lts_2020_02_25::Span<torch_xla::ir::Value const>, std::function<xla::Shape ()> const&, unsigned long, absl::lts_2020_02_25::uint128)
    torch_xla::ir::ops::ConstantPadNd::ConstantPadNd(torch_xla::ir::Value const&, std::vector<long, std::allocator<long> >, c10::Scalar)
    void __gnu_cxx::new_allocator<torch_xla::ir::ops::ConstantPadNd>::construct<torch_xla::ir::ops::ConstantPadNd, torch_xla::ir::Value, std::vector<long, std::allocator<long> >&, c10::Scalar&>(torch_xla::ir::ops::ConstantPadNd*, torch_xla::ir::Value&&, std::vector<long, std::allocator<long> >&, c10::Scalar&)
    torch_xla::XLATensor::constant_pad_nd(torch_xla::XLATensor const&, absl::lts_2020_02_25::Span<long const>, c10::Scalar)
    torch_xla::AtenXlaType::constant_pad_nd(at::Tensor const&, c10::ArrayRef<long>, c10::Scalar)
    c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoRuntimeFunctor_<at::Tensor (*)(at::Tensor const&, c10::ArrayRef<long>, c10::Scalar), at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, c10::ArrayRef<long>, c10::Scalar> >, at::Tensor (at::Tensor const&, c10::ArrayRef<long>, c10::Scalar)>::call(c10::OperatorKernel*, at::Tensor const&, c10::ArrayRef<long>, c10::Scalar)
    at::constant_pad_nd(at::Tensor const&, c10::ArrayRef<long>, c10::Scalar)

    at::constant_pad_nd(at::Tensor const&, c10::ArrayRef<long>, c10::Scalar)

    _PyMethodDef_RawFastCallKeywords
    _PyCFunction_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyFunction_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyFunction_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyObject_Call_Prepend
    PyObject_Call
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyObject_Call_Prepend

    _PyObject_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyObject_Call_Prepend
    PyObject_Call
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyObject_Call_Prepend

    PyObject_Call
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyFunction_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyFunction_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyFunction_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyFunction_FastCallDict
    _PyEval_EvalFrameDefault
    _PyFunction_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyFunction_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    PyEval_EvalCode

    _PyMethodDef_RawFastCallKeywords
    _PyCFunction_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyFunction_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyFunction_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyObject_Call_Prepend
    PyObject_Call
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyFunction_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyFunction_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyFunction_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyFunction_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyFunction_FastCallDict
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyFunction_FastCallDict
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyFunction_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyFunction_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyFunction_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyFunction_FastCallDict
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyObject_FastCallDict

    _PyObject_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyObject_Call_Prepend
    _PyObject_FastCallKeywords

    _PyMethodDef_RawFastCallDict
    PyCFunction_Call
    _PyEval_EvalFrameDefault
    _PyFunction_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyFunction_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyFunction_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyFunction_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyFunction_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyFunction_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    PyEval_EvalCode

*** End stack trace ***
Scalar type not supported

Expected behavior

Model training should have started, but it failed with the error above instead.

sgugger commented 3 years ago

I don't think Longformer is supported on TPU, @patrickvonplaten will confirm.

mabdullah1994 commented 3 years ago

@sgugger Thanks! Looking forward to @patrickvonplaten's confirmation.

patrickvonplaten commented 3 years ago

Hey @mabdullah1994, yeah Longformer is sadly not yet supported on TPU. We just merged Big Bird: https://huggingface.co/transformers/master/model_doc/bigbird.html though, which should work on TPU. It would be amazing if you could try it out :-)

mabdullah1994 commented 3 years ago

@patrickvonplaten Thanks for the update Patrick! Just a quick query: I have a dataset with large sequences and I don't want to truncate the text. What options do I have? Will XLNet be able to handle large sequences with pre-trained models? Could you point me towards an example of using stride for this use case? Thanks!
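
For readers with the same question, the stride idea can be sketched without any library: split the token IDs into windows of at most max_length tokens, with each window overlapping the previous one by stride tokens. This is only an illustration of the windowing arithmetic, not the tokenizer's actual implementation; the function name chunk_with_stride is made up for this sketch. The Hugging Face tokenizers expose the same idea through the stride and return_overflowing_tokens arguments, and each window would then be classified separately and the predictions combined.

```python
from typing import List


def chunk_with_stride(token_ids: List[int], max_length: int, stride: int) -> List[List[int]]:
    """Split token_ids into windows of at most max_length tokens.

    Consecutive windows overlap by `stride` tokens, so no text is dropped.
    (Hypothetical helper for illustration only.)
    """
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    if not 0 <= stride < max_length:
        raise ValueError("stride must satisfy 0 <= stride < max_length")

    step = max_length - stride  # how far each new window advances
    chunks = []
    start = 0
    while True:
        chunks.append(token_ids[start:start + max_length])
        if start + max_length >= len(token_ids):
            break  # this window already reaches the end of the sequence
        start += step
    return chunks


if __name__ == "__main__":
    # Ten dummy token IDs, windows of 4 tokens overlapping by 2.
    for window in chunk_with_stride(list(range(10)), max_length=4, stride=2):
        print(window)
```

A larger stride gives each window more context from its neighbour at the cost of more windows to run through the model.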

mabdullah1994 commented 3 years ago

Well, I tried BigBird and I am getting a similar error on Google Colab:

RuntimeError: torch_xla/csrc/tensor_methods.cpp:880 : Check failed: xla::ShapeUtil::Compatible(shapes.back(), tensor_shape) 
*** Begin stack trace ***
    tensorflow::CurrentStackTrace()
    torch_xla::XLATensor::cat(absl::lts_2020_02_25::Span<torch_xla::XLATensor const>, long)
    torch_xla::AtenXlaType::cat(c10::ArrayRef<at::Tensor>, long)
    c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoRuntimeFunctor_<at::Tensor (*)(c10::ArrayRef<at::Tensor>, long), at::Tensor, c10::guts::typelist::typelist<c10::ArrayRef<at::Tensor>, long> >, at::Tensor (c10::ArrayRef<at::Tensor>, long)>::call(c10::OperatorKernel*, c10::ArrayRef<at::Tensor>, long)

    at::cat(c10::ArrayRef<at::Tensor>, long)

    at::cat(c10::ArrayRef<at::Tensor>, long)

    _PyMethodDef_RawFastCallKeywords
    _PyCFunction_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyFunction_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyObject_Call_Prepend
    PyObject_Call
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyObject_Call_Prepend

    _PyObject_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyObject_Call_Prepend
    PyObject_Call
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyObject_Call_Prepend

    _PyObject_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyObject_Call_Prepend
    PyObject_Call
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyObject_Call_Prepend

    _PyObject_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyObject_Call_Prepend
    PyObject_Call
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyObject_Call_Prepend

    _PyObject_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyObject_Call_Prepend
    PyObject_Call
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyObject_Call_Prepend

    _PyObject_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyObject_Call_Prepend
    PyObject_Call
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyObject_Call_Prepend

    PyObject_Call
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyFunction_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyFunction_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyFunction_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyFunction_FastCallDict
    _PyEval_EvalFrameDefault
    _PyFunction_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyFunction_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    PyEval_EvalCode

    _PyMethodDef_RawFastCallKeywords
    _PyCFunction_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyFunction_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyFunction_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyObject_Call_Prepend
    PyObject_Call
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyFunction_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyFunction_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyFunction_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyFunction_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyFunction_FastCallDict
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyFunction_FastCallDict
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyFunction_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyFunction_FastCallKeywords
    _PyEval_EvalFrameDefault
    _PyObject_Call_Prepend
    PyObject_Call
    _PyEval_EvalFrameDefault
    _PyEval_EvalCodeWithName
    _PyFunction_FastCallKeywords
*** End stack trace ***
s64[1,1,1]{2,1,0} vs. f32[1,1,1]{2,1,0}
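
A note on reading the last line of the trace: in XLA's shape notation, s64 is a signed 64-bit integer and f32 is a 32-bit float, the bracketed list is the dimensions, and the braces give the memory layout. The small parser below is a sketch written for this thread (parse_xla_shape and describe_mismatch are hypothetical names, not part of torch_xla) that makes the comparison explicit. Under that reading, the two operands passed to cat have identical dimensions and differ only in element type, which suggests an integer tensor and a float tensor were concatenated; that interpretation is an assumption based on the message alone.

```python
import re
from typing import NamedTuple, Tuple


class XlaShape(NamedTuple):
    element_type: str       # e.g. "s64" or "f32"
    dims: Tuple[int, ...]   # e.g. (1, 1, 1)


# Element type, then bracketed dimensions; the trailing {layout} is ignored.
_SHAPE_RE = re.compile(r"^([a-z]+\d*)\[([\d,]*)\]")


def parse_xla_shape(text: str) -> XlaShape:
    """Parse a shape string such as 's64[1,1,1]{2,1,0}' (illustrative helper)."""
    match = _SHAPE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized shape string: {text!r}")
    element_type, dims = match.groups()
    return XlaShape(element_type, tuple(int(d) for d in dims.split(",") if d))


def describe_mismatch(message: str) -> str:
    """Summarize an '<shape> vs. <shape>' message from a failed shape check."""
    left, right = (parse_xla_shape(part) for part in message.split(" vs. "))
    same_dims = left.dims == right.dims
    same_type = left.element_type == right.element_type
    return (
        f"dims {'match' if same_dims else 'differ'} "
        f"({left.dims} vs {right.dims}); "
        f"element types {'match' if same_type else 'differ'} "
        f"({left.element_type} vs {right.element_type})"
    )


if __name__ == "__main__":
    print(describe_mismatch("s64[1,1,1]{2,1,0} vs. f32[1,1,1]{2,1,0}"))
```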

patrickvonplaten commented 3 years ago

Hey @mabdullah1994,

Could you maybe open a new issue showcasing that big bird doesn't work on PyTorch/XLA? :-)

mabdullah1994 commented 3 years ago

Hey @patrickvonplaten

I just created a new issue, #11363, with the details of the BigBird problem. Please advise. Thanks!

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

aditya-malte commented 3 years ago

Any updates on this?

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

KMFODA commented 2 years ago

Hey @patrickvonplaten, with the release of the new trainer, should this issue be resolved? I'm using the latest version of transformers and still getting this error for models like allenai/led-base-16384 running on TPU.
