FineTuning issue with Gemma-2B-IT model using the SFTTrainer

AvisP commented 1 month ago

Based on the example scripts, I wrote a small code to test out if training is working properly on a new dataset alpaca-cleaned. It works properly with the facebook opt-350m but doesn't when I am using the google/gemma-2b-it model. Any help in figuring out what might be the issue would be appreciated.

from datasets import load_dataset
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM
import transformers
from transformers import TrainingArguments
import subprocess
#HF_READ = 
subprocess.run(["huggingface-cli", "login", "--token", HF_READ])

dataset = load_dataset("yahma/alpaca-cleaned", split="train")

model_id = "facebook/opt-350m"
# model_id = "google/gemma-2b-it"

model = transformers.AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)
tokenizer.padding_side = 'right'
# tokenizer.pad_token = tokenizer.eos_token
tokenizer.add_special_tokens({'pad_token':'[PAD]'})
tokenizer.add_eos_token = True
tokenizer.add_bos_token, tokenizer.add_eos_token

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs       = examples["input"]
    outputs      = examples["output"]
    texts = []
    for instruction, input, output in zip(instructions, inputs, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }
pass

dataset = dataset.map(formatting_prompts_func, batched=True)

instruction_template = "### Instruction:"
response_template = "### Response:"
collator = DataCollatorForCompletionOnlyLM(instruction_template=instruction_template, response_template=response_template, tokenizer=tokenizer, mlm=False)

train_args = TrainingArguments(
        num_train_epochs=1,
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
#         max_steps = 60, ## Turn it on while testing pipeline
        learning_rate = 2e-4,
        fp16 = True,
        bf16 = False,
        logging_steps = 1,
        optim = "adamw_torch", #"adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    )

trainer = SFTTrainer(
    model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    # formatting_func=formatting_prompts_func,
    dataset_text_field= "text",
    data_collator=collator,
    max_seq_length=1024,
    packing=False,
    args=train_args,
)

trainer.train()

The error I am getting with the gemma-2b-it model is

Gemma's activation function should be approximate GeLU and not exact GeLU.
Changing the activation function to `gelu_pytorch_tanh`.if you want to use the legacy `gelu`, edit the `model.config` to set `hidden_activation=gelu`   instead of `hidden_act`. See https://github.com/huggingface/transformers/pull/29402 for more details.
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:04<00:00,  2.08s/it] 
  0%|                                                                                                                                                                                                 | 0/6470 [00:00<?, ?it/s]C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [14,0,0], thread: [64,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [14,0,0], thread: [65,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [14,0,0], thread: [66,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [14,0,0], thread: [67,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [14,0,0], thread: [68,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [14,0,0], thread: [69,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [14,0,0], thread: [70,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [14,0,0], thread: [71,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [14,0,0], thread: [72,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [14,0,0], thread: [73,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [14,0,0], thread: [74,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [14,0,0], thread: [75,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [14,0,0], thread: [76,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [14,0,0], thread: [77,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [14,0,0], thread: [78,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [14,0,0], thread: [79,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [14,0,0], thread: [80,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [14,0,0], thread: [81,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [14,0,0], thread: [82,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [14,0,0], thread: [83,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [14,0,0], thread: [84,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [14,0,0], thread: [85,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [14,0,0], thread: [86,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [14,0,0], thread: [87,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [14,0,0], thread: [88,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [14,0,0], thread: [89,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [14,0,0], thread: [90,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [14,0,0], thread: [91,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [14,0,0], thread: [92,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [14,0,0], thread: [93,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [14,0,0], thread: [94,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [14,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [80,0,0], thread: [32,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [80,0,0], thread: [33,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [80,0,0], thread: [34,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [80,0,0], thread: [35,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [80,0,0], thread: [36,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [80,0,0], thread: [37,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [80,0,0], thread: [38,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [80,0,0], thread: [39,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [80,0,0], thread: [40,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [80,0,0], thread: [41,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [80,0,0], thread: [42,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [80,0,0], thread: [43,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [80,0,0], thread: [44,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [80,0,0], thread: [45,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [80,0,0], thread: [46,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [80,0,0], thread: [47,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [80,0,0], thread: [48,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [80,0,0], thread: [49,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [80,0,0], thread: [50,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [80,0,0], thread: [51,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [80,0,0], thread: [52,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [80,0,0], thread: [53,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [80,0,0], thread: [54,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [80,0,0], thread: [55,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [80,0,0], thread: [56,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [80,0,0], thread: [57,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [80,0,0], thread: [58,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [80,0,0], thread: [59,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [80,0,0], thread: [60,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [80,0,0], thread: [61,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [80,0,0], thread: [62,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [80,0,0], thread: [63,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [16,0,0], thread: [32,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [16,0,0], thread: [33,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [16,0,0], thread: [34,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [16,0,0], thread: [35,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [16,0,0], thread: [36,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [16,0,0], thread: [37,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [16,0,0], thread: [38,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [16,0,0], thread: [39,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [16,0,0], thread: [40,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [16,0,0], thread: [41,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [16,0,0], thread: [42,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [16,0,0], thread: [43,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [16,0,0], thread: [44,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [16,0,0], thread: [45,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [16,0,0], thread: [46,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [16,0,0], thread: [47,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [16,0,0], thread: [48,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [16,0,0], thread: [49,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [16,0,0], thread: [50,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [16,0,0], thread: [51,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [16,0,0], thread: [52,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [16,0,0], thread: [53,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [16,0,0], thread: [54,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [16,0,0], thread: [55,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [16,0,0], thread: [56,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [16,0,0], thread: [57,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [16,0,0], thread: [58,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [16,0,0], thread: [59,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [16,0,0], thread: [60,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [16,0,0], thread: [61,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [16,0,0], thread: [62,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [16,0,0], thread: [63,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [17,0,0], thread: [32,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [17,0,0], thread: [33,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [17,0,0], thread: [34,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [17,0,0], thread: [35,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [17,0,0], thread: [36,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [17,0,0], thread: [37,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [17,0,0], thread: [38,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [17,0,0], thread: [39,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [17,0,0], thread: [40,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [17,0,0], thread: [41,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [17,0,0], thread: [42,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [17,0,0], thread: [43,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [17,0,0], thread: [44,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [17,0,0], thread: [45,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [17,0,0], thread: [46,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [17,0,0], thread: [47,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [17,0,0], thread: [48,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [17,0,0], thread: [49,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [17,0,0], thread: [50,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [17,0,0], thread: [51,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [17,0,0], thread: [52,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [17,0,0], thread: [53,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [17,0,0], thread: [54,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [17,0,0], thread: [55,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [17,0,0], thread: [56,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [17,0,0], thread: [57,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [17,0,0], thread: [58,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [17,0,0], thread: [59,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [17,0,0], thread: [60,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [17,0,0], thread: [61,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [17,0,0], thread: [62,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1290: block: [17,0,0], thread: [63,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
  File "C:\Projects\FineTune-Gemma2B\Gemma.py", line 82, in <module>
    trainer.train()
  File "C:\Projects\venvs\Lib\site-packages\trl\trainer\sft_trainer.py", line 361, in train
    output = super().train(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\venvs\Lib\site-packages\transformers\trainer.py", line 1859, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\venvs\Lib\site-packages\transformers\trainer.py", line 2203, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\venvs\Lib\site-packages\transformers\trainer.py", line 3138, in training_step
    loss = self.compute_loss(model, inputs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\venvs\Lib\site-packages\transformers\trainer.py", line 3161, in compute_loss
    outputs = model(**inputs)
              ^^^^^^^^^^^^^^^
  File "C:\Projects\venvs\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\venvs\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\venvs\Lib\site-packages\accelerate\utils\operations.py", line 817, in forward
    return model_forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\venvs\Lib\site-packages\accelerate\utils\operations.py", line 805, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\venvs\Lib\site-packages\torch\amp\autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\venvs\Lib\site-packages\transformers\models\gemma\modeling_gemma.py", line 1121, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "C:\Projects\venvs\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\venvs\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\venvs\Lib\site-packages\transformers\models\gemma\modeling_gemma.py", line 887, in forward
    cache_position = torch.arange(
                     ^^^^^^^^^^^^^
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

  0%|          | 0/6470 [00:00<?, ?it/s]

younesbelkada commented 1 month ago

Hi @AvisP Hmm, this seems to indicate that there is an indexing error, can you try to run on single training step on CPU and past the error traceback here?

AvisP commented 1 month ago

@younesbelkada So I set use_cpu=True and bf16=True in TrainingArguments and this is the error message I got

Gemma's activation function should be approximate GeLU and not exact GeLU.
Changing the activation function to `gelu_pytorch_tanh`.if you want to use the legacy `gelu`, edit the `model.config` to set `hidden_activation=gelu`   instead of `hidden_act`. See https://github.com/huggingface/transformers/pull/29402 for more details.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:02<00:00,  1.32s/it]
  0%|                                                                                                                                        | 0/6470 [00:00<?, ?it/s]Traceback (most recent call last):
  File "C:\Projects\FineTune-Gemma2B\Gemma.py", line 86, in <module>
    trainer.train()
  File "C:\Projects\venvs\Lib\site-packages\trl\trainer\sft_trainer.py", line 361, in train
    output = super().train(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\venvs\Lib\site-packages\transformers\trainer.py", line 1859, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\venvs\Lib\site-packages\transformers\trainer.py", line 2203, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\venvs\Lib\site-packages\transformers\trainer.py", line 3138, in training_step
    loss = self.compute_loss(model, inputs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\venvs\Lib\site-packages\transformers\trainer.py", line 3161, in compute_loss
    outputs = model(**inputs)
              ^^^^^^^^^^^^^^^
  File "C:\Projects\venvs\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\venvs\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\venvs\Lib\site-packages\accelerate\utils\operations.py", line 817, in forward
    return model_forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\venvs\Lib\site-packages\accelerate\utils\operations.py", line 805, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\venvs\Lib\site-packages\torch\amp\autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\venvs\Lib\site-packages\transformers\models\gemma\modeling_gemma.py", line 1121, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "C:\Projects\venvs\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\venvs\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\venvs\Lib\site-packages\transformers\models\gemma\modeling_gemma.py", line 878, in forward
    inputs_embeds = self.embed_tokens(input_ids)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\venvs\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\venvs\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Projects\venvs\Lib\site-packages\torch\nn\modules\sparse.py", line 163, in forward
    return F.embedding(
           ^^^^^^^^^^^^
  File "C:\Projects\venvs\Lib\site-packages\torch\nn\functional.py", line 2237, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: index out of range in self
  0%|          | 0/6470 [00:00<?, ?it/s]

younesbelkada commented 4 weeks ago

Thanks @AvisP I think what is happening is that the new token [PAD] you are adding is out of the vocabulary (you might have received a warning saying that you added special tokens in the tokenizer). you'll either need to extend the vocab size of the model or use the native padd token of that model. For Gemma-2B, the padd token already exists: https://huggingface.co/google/gemma-2b/blob/main/tokenizer_config.json#L6 so you don't need to call tokenizer.add_special_tokens({'pad_token':'[PAD]'})

github-actions[bot] commented 4 days ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

huggingface / trl

FineTuning issue with Gemma-2B-IT model using the SFTTrainer #1665