huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Sources of randomness for Longformer #12482

Closed agarfau closed 2 years ago

agarfau commented 3 years ago

Environment info

Who can help

@patrickvonplaten

Information

Results when using Longformer for sequence classification are not consistent across runs after setting a random seed.

To reproduce

I include below a simple script derived from this example to reproduce the behavior.

When using bert-base-uncased, the generated results (and loss values) are exactly the same across runs. However, when using allenai/longformer-base-4096 (just swap which of the MODEL_NAME lines is commented out), results (and loss values) vary across runs. In this example the results happen to be very similar because of the simplicity of the problem, but I have seen much higher variability with longer training schedules and larger, more complex datasets. Still, I think this example suffices to illustrate the issue.

P.S. Using the commented-out set_seed function below instead of the one imported from transformers does not help either.

import torch
from datasets import load_dataset
from sklearn.model_selection import train_test_split
from transformers import Trainer, TrainingArguments, set_seed
from transformers import LongformerTokenizerFast, LongformerForSequenceClassification
from transformers import BertTokenizerFast, BertForSequenceClassification

import os
import random
import numpy as np
from typing import Optional

MODEL_NAME = 'bert-base-uncased'
# MODEL_NAME = 'allenai/longformer-base-4096'
SEED = 42

class IMDbDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.labels)

# def set_seed(seed: Optional[int]):
#     """ Set all seeds to make results reproducible (deterministic mode).
#          When seed is None, disables deterministic mode. """
#     if seed is not None:
#         torch.manual_seed(seed)
#         torch.cuda.manual_seed_all(seed)
#         torch.backends.cudnn.deterministic = True
#         torch.backends.cudnn.benchmark = False
#         np.random.seed(seed)
#         random.seed(seed)
#         os.environ['PYTHONHASHSEED'] = str(seed)

def main():
    set_seed(SEED)

    # Load IMDb
    train = load_dataset("imdb", split="train")[:50]
    test = load_dataset("imdb", split="test")[:10]

    train_texts = train['text']
    train_labels = train['label']

    test_texts = test['text']
    test_labels = test['label']

    # Split train into train and val
    train_texts, val_texts, train_labels, val_labels = train_test_split(train_texts, train_labels, test_size=.2,
                                                                        random_state=SEED)

    # Load tokenizer
    if MODEL_NAME == 'bert-base-uncased':
        tokenizer = BertTokenizerFast.from_pretrained(MODEL_NAME)
    elif MODEL_NAME == 'allenai/longformer-base-4096':
        tokenizer = LongformerTokenizerFast.from_pretrained(MODEL_NAME)
    else:
        raise ValueError(f"Unsupported MODEL_NAME: {MODEL_NAME}")

    # Generate encodings
    train_encodings = tokenizer(train_texts, truncation=True, padding=True)
    val_encodings = tokenizer(val_texts, truncation=True, padding=True)
    test_encodings = tokenizer(test_texts, truncation=True, padding=True)

    # Create datasets
    train_dataset = IMDbDataset(train_encodings, train_labels)
    val_dataset = IMDbDataset(val_encodings, val_labels)
    test_dataset = IMDbDataset(test_encodings, test_labels)

    # Training
    training_args = TrainingArguments(
        output_dir='../tutorial_results',  # output directory
        num_train_epochs=5,  # total number of training epochs
        per_device_train_batch_size=1,  # batch size per device during training
        per_device_eval_batch_size=1,  # batch size for evaluation
        warmup_steps=500,  # number of warmup steps for learning rate scheduler
        weight_decay=0.01,  # strength of weight decay
        logging_dir='../tutorial_logs',  # directory for storing logs
        logging_steps=10,
        seed=SEED
    )

    if MODEL_NAME == 'bert-base-uncased':
        model = BertForSequenceClassification.from_pretrained(MODEL_NAME)
    elif MODEL_NAME == 'allenai/longformer-base-4096':
        model = LongformerForSequenceClassification.from_pretrained(MODEL_NAME)
    else:
        raise ValueError(f"Unsupported MODEL_NAME: {MODEL_NAME}")

    trainer = Trainer(
        model=model,  # the instantiated 🤗 Transformers model to be trained
        args=training_args,  # training arguments, defined above
        train_dataset=train_dataset,  # training dataset
        eval_dataset=val_dataset  # evaluation dataset
    )

    trainer.train()

    # Test set
    test_results = trainer.predict(test_dataset)
    print(test_results)

if __name__ == '__main__':
    main()

Expected behavior

I would expect that, having set a random seed, results would be the same across runs for Longformer too. Is there any source of randomness in Longformer that is not covered by the set_seed function?
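
For what it's worth, the extra determinism settings I would also try on top of set_seed are below (a sketch based on the general PyTorch reproducibility notes, not something specific to Longformer; the cuBLAS env var is only needed when deterministic algorithms are enforced):

import os
import torch
from transformers import set_seed

# Needed by some cuBLAS ops when deterministic algorithms are enforced (per PyTorch docs)
os.environ['CUBLAS_WORKSPACE_CONFIG'] = ':4096:8'

set_seed(SEED)  # seeds python, numpy and torch (CPU and CUDA)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
# Raise an error whenever an op falls back to a non-deterministic CUDA kernel
torch.use_deterministic_algorithms(True)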

Thanks!

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

DavidPfl commented 3 years ago

Hi, I am facing the same reproducibility issue with the Longformer model for sequence classification. I posted my example in the Hugging Face discussion forum here.

Setting the seeds as recommended here produces the exact same training loss in multiple training iterations (each time starting the finetuning from scratch) when using the roberta-base model, but not with allenai/longformer-base-4096.

patrickvonplaten commented 3 years ago

Hmm, that's interesting! Also pinging the original author @ibeltagy here. I don't see any unusual functionality in Longformer, except maybe torch.Tensor.stride(...), which Longformer uses but other models like RoBERTa don't. Sadly I won't have the time to dive deep into this reproducibility problem, but a good start for finding the bug would be to verify that the same random model weights are loaded before training, and then to work with print(...) to see after which layer the results start to differ.
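
Something along these lines is what I have in mind; a rough, untested sketch (the helper functions here are made up for illustration, not part of transformers):

import torch
from transformers import LongformerForSequenceClassification, set_seed

device = 'cuda' if torch.cuda.is_available() else 'cpu'

def state_dicts_equal(m1, m2):
    # Check that two freshly loaded models received identical weights
    sd1, sd2 = m1.state_dict(), m2.state_dict()
    return all(torch.equal(sd1[k], sd2[k]) for k in sd1)

def record_layer_outputs(model, inputs):
    # Run one forward pass and record every submodule output that is a plain tensor
    recorded, hooks = {}, []
    for name, module in model.named_modules():
        def hook(mod, inp, out, name=name):
            if isinstance(out, torch.Tensor):
                recorded[name] = out.detach().cpu()
        hooks.append(module.register_forward_hook(hook))
    with torch.no_grad():
        model(**inputs)
    for h in hooks:
        h.remove()
    return recorded

# Re-seed before each load so the randomly initialized classifier head should also match
set_seed(42)
model_a = LongformerForSequenceClassification.from_pretrained('allenai/longformer-base-4096').to(device).eval()
set_seed(42)
model_b = LongformerForSequenceClassification.from_pretrained('allenai/longformer-base-4096').to(device).eval()
print('identical weights across loads:', state_dicts_equal(model_a, model_b))

# Random token ids, just as a smoke test; differences are expected to show up on GPU kernels, if anywhere
batch = {
    'input_ids': torch.randint(0, 1000, (1, 1024), device=device),
    'attention_mask': torch.ones(1, 1024, dtype=torch.long, device=device),
}
run1 = record_layer_outputs(model_a, batch)
run2 = record_layer_outputs(model_a, batch)
for name in run1:
    if not torch.equal(run1[name], run2[name]):
        print('outputs first differ at module:', name)
        break
else:
    print('forward passes were identical')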

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

patrickvonplaten commented 2 years ago

@abhishekkrthakur do you know where the randomness comes from?

hamishdickson commented 2 years ago

I have recently run into this issue. One thing I've noticed is that if I drop the max length down to 512, I start to get reproducible behaviour; not sure if that helps at all.
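
On the repro script above, that just means passing max_length=512 to the tokenizer calls, e.g. (sketch, I haven't rerun that exact script):

    # Cap sequences at 512 tokens instead of the model's 4096 maximum
    train_encodings = tokenizer(train_texts, truncation=True, padding=True, max_length=512)
    val_encodings = tokenizer(val_texts, truncation=True, padding=True, max_length=512)
    test_encodings = tokenizer(test_texts, truncation=True, padding=True, max_length=512)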

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

djelassimalek commented 2 years ago

@abhishekkrthakur @patrickvonplaten I'm facing the same issue. Any update on this behaviour?

Environment info

transformers version: 4.11.0
Platform: Amazon Linux AMI 2018.03
torch version: 1.9.0
Number of GPUs: 1

Some more information: when activating torch.use_deterministic_algorithms(True) during training, roberta-base and bert-base-uncased work fine. However, with Longformer I get this error:

Loading features from cached file ./training-data/processed-data/text/toy/cached_train_longformer_512
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Validation sanity check: 0it [00:00, ?it/s]
Loading features from cached file ./training-data/processed-data/text/toy/cached_dev_longformer_512
Validation sanity check:   0%|          | 0/2 [00:00<?, ?it/s]
inputs={'input_ids': tensor([[    0,  7202,  3063,  ...,     1,     1,     1],
        [    0,  1301,  1723,  ..., 44828, 15555,     2],
        [    0, 27201,  1000,  ...,  3706,     6,     2],
        ...,
        [    0, 41188,  2444,  ...,     1,     1,     1],
        [    0,  3384,   591,  ...,     1,     1,     1],
        [    0,   495,  2492,  ...,   700, 28607,     2]], device='cuda:0'),
 'attention_mask': tensor([[1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 1, 1, 1],
        [1, 1, 1,  ..., 1, 1, 1],
        ...,
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 1, 1, 1]], device='cuda:0'),
 'labels': tensor([1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1,
        0, 0, 1, 0, 1, 0, 1, 1], device='cuda:0')}
Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/venv-transformer/bin/nlpsaf-transformer-train", line 33, in <module>
    sys.exit(load_entry_point('nlpsaf-transformer', 'console_scripts', 'nlpsaf-transformer-train')())
  File "/home/ec2-user/SageMaker/nlpsaf-transformers/nlpsaf_transformer/run_safetysignal.py", line 31, in <module>
    main = train(args)
  File "/home/ec2-user/SageMaker/nlpsaf-transformers/nlpsaf_transformer/run_safety_signal.py", line 17, in train
    generic_train(model, args)
  File "/home/ec2-user/SageMaker/nlpsaf-transformers/nlpsaf_transformer/model.py", line 462, in generic_train
    trainer.fit(model)
  File "/home/ec2-user/anaconda3/envs/venv-transformer/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 552, in fit
    self._run(model)
  File "/home/ec2-user/anaconda3/envs/venv-transformer/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 922, in _run
    self._dispatch()
  File "/home/ec2-user/anaconda3/envs/venv-transformer/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 990, in _dispatch
    self.accelerator.start_training(self)
  File "/home/ec2-user/anaconda3/envs/venv-transformer/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 92, in start_training
    self.training_type_plugin.start_training(trainer)
  File "/home/ec2-user/anaconda3/envs/venv-transformer/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 161, in start_training
    self._results = trainer.run_stage()
  File "/home/ec2-user/anaconda3/envs/venv-transformer/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1000, in run_stage
    return self._run_train()
  File "/home/ec2-user/anaconda3/envs/venv-transformer/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1035, in _run_train
    self._run_sanity_check(self.lightning_module)
  File "/home/ec2-user/anaconda3/envs/venv-transformer/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1122, in _run_sanity_check
    self._evaluation_loop.run()
  File "/home/ec2-user/anaconda3/envs/venv-transformer/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 111, in run
    self.advance(*args, **kwargs)
  File "/home/ec2-user/anaconda3/envs/venv-transformer/lib/python3.8/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 110, in advance
    dl_outputs = self.epoch_loop.run(
  File "/home/ec2-user/anaconda3/envs/venv-transformer/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 111, in run
    self.advance(*args, **kwargs)
  File "/home/ec2-user/anaconda3/envs/venv-transformer/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 111, in advance
    output = self.evaluation_step(batch, batch_idx, dataloader_idx)
  File "/home/ec2-user/anaconda3/envs/venv-transformer/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 158, in evaluation_step
    output = self.trainer.accelerator.validation_step(step_kwargs)
  File "/home/ec2-user/anaconda3/envs/venv-transformer/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 211, in validation_step
    return self.training_type_plugin.validation_step(*step_kwargs.values())
  File "/home/ec2-user/anaconda3/envs/venv-transformer/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/ddp.py", line 392, in validation_step
    return self.model(*args, **kwargs)
  File "/home/ec2-user/anaconda3/envs/venv-transformer/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ec2-user/anaconda3/envs/venv-transformer/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 799, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/ec2-user/anaconda3/envs/venv-transformer/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ec2-user/anaconda3/envs/venv-transformer/lib/python3.8/site-packages/pytorch_lightning/overrides/base.py", line 93, in forward
    output = self.module.validation_step(*inputs, **kwargs)
  File "/home/ec2-user/SageMaker/nlpsaf-transformers/nlpsaf_transformer/model.py", line 345, in validation_step
    outputs = self(**inputs)
  File "/home/ec2-user/anaconda3/envs/venv-transformer/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ec2-user/SageMaker/nlpsaf-transformers/nlpsaf_transformer/model.py", line 179, in forward
    return self.model(**inputs)
  File "/home/ec2-user/anaconda3/envs/venv-transformer/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ec2-user/anaconda3/envs/venv-transformer/lib/python3.8/site-packages/transformers/models/longformer/modeling_longformer.py", line 1858, in forward
    outputs = self.longformer(
  File "/home/ec2-user/anaconda3/envs/venv-transformer/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ec2-user/anaconda3/envs/venv-transformer/lib/python3.8/site-packages/transformers/models/longformer/modeling_longformer.py", line 1677, in forward
    encoder_outputs = self.encoder(
  File "/home/ec2-user/anaconda3/envs/venv-transformer/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ec2-user/anaconda3/envs/venv-transformer/lib/python3.8/site-packages/transformers/models/longformer/modeling_longformer.py", line 1280, in forward
    layer_outputs = layer_module(
  File "/home/ec2-user/anaconda3/envs/venv-transformer/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ec2-user/anaconda3/envs/venv-transformer/lib/python3.8/site-packages/transformers/models/longformer/modeling_longformer.py", line 1205, in forward
    self_attn_outputs = self.attention(
  File "/home/ec2-user/anaconda3/envs/venv-transformer/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ec2-user/anaconda3/envs/venv-transformer/lib/python3.8/site-packages/transformers/models/longformer/modeling_longformer.py", line 1141, in forward
    self_outputs = self.self(
  File "/home/ec2-user/anaconda3/envs/venv-transformer/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ec2-user/anaconda3/envs/venv-transformer/lib/python3.8/site-packages/transformers/models/longformer/modeling_longformer.py", line 708, in forward
    attn_probs[is_index_global_attn_nonzero] = 0
RuntimeError: linearIndex.numel()*sliceSize*nElemBefore == value.numel() INTERNAL ASSERT FAILED at "/pytorch/aten/src/ATen/native/cuda/Indexing.cu":253, please report a bug to PyTorch. number of flattened indices did not match number of elements in the value tensor1973761

JohnGiorgi commented 1 year ago

Facing the same issue with allenai/led-large-16384 via the run_summarization.py script. Both seed and data_seed are set, and there are no randomly initialized weights:

[INFO|modeling_utils.py:3032] 2023-03-30 10:44:26,097 >> All model checkpoint weights were used when initializing LEDForConditionalGeneration.
[INFO|modeling_utils.py:3040] 2023-03-30 10:44:26,098 >> All the weights of LEDForConditionalGeneration were initialized from the model checkpoint at allenai/led-large-16384.

The same script with another model (e.g. flan-t5-large) is perfectly reproducible across runs with the same seed. I don't have a sense of where the remaining sources of randomness would be for Longformer/LED.
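
If it helps, a quick check I would run to see whether the nondeterminism already appears in a single forward/backward pass (rather than in data ordering or the Trainer itself) is sketched below. This is hypothetical test code on the smaller allenai/led-base-16384 checkpoint, assuming a single GPU, not the actual run_summarization.py run:

import torch
from transformers import LEDForConditionalGeneration, LEDTokenizer, set_seed

def one_backward_pass(seed):
    # One seeded forward/backward pass; the model is in eval mode (dropout off),
    # so any run-to-run difference should come from the kernels, not from seeding
    set_seed(seed)
    model = LEDForConditionalGeneration.from_pretrained('allenai/led-base-16384').cuda()
    tok = LEDTokenizer.from_pretrained('allenai/led-base-16384')
    batch = tok(['a long input document ' * 200], return_tensors='pt').to('cuda')
    labels = tok(['a short summary'], return_tensors='pt').input_ids.cuda()
    out = model(**batch, labels=labels)
    out.loss.backward()
    grads = {n: p.grad.detach().cpu().clone() for n, p in model.named_parameters() if p.grad is not None}
    return out.loss.item(), grads

loss1, grads1 = one_backward_pass(42)
loss2, grads2 = one_backward_pass(42)
print('loss equal:', loss1 == loss2)
differing = [n for n in grads1 if not torch.equal(grads1[n], grads2[n])]
print('parameters with differing gradients:', differing[:5])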