huggingface / peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
https://huggingface.co/docs/peft
Apache License 2.0

Fine-tuning NLLB model on multi-GPU raises RuntimeError #265

Closed sorryhyun closed 1 year ago

sorryhyun commented 1 year ago

I tried to fine-tune the NLLB model on my custom dataset in a multi-GPU environment, and it raises the following error:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument index in method wrapper_CUDA__index_select)

It works fine on a single GPU. Is there anything I need to modify in the code for multi-GPU?

I attempted to run the following code:

from transformers import (
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    NllbTokenizerFast,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)
from peft import LoraConfig, TaskType, get_peft_model


def train_lora(weight_path, save_path, lr):
    tokenizer = NllbTokenizerFast.from_pretrained(
        "facebook/nllb-200-distilled-1.3B", src_lang="kor_Hang", tgt_lang="eng_Latn"
    )
    model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-1.3B")
    peft_config = LoraConfig(
        target_modules=['q_proj', 'v_proj'],
        task_type=TaskType.SEQ_2_SEQ_LM,
        inference_mode=False,
        r=8,
        lora_alpha=32,
        lora_dropout=0.1,
    )
    model = get_peft_model(model, peft_config)
    model.print_trainable_parameters()

    from datasets import Dataset
    import json
    with open('./kor-eng_news.json', 'r') as f:
        data = json.load(f)

    dataset = Dataset.from_dict(data)
    dataset = dataset.train_test_split(test_size=0.015, shuffle=True)

    def preprocess_function(examples):
        model_inputs = tokenizer(examples['korean'], text_target=examples['english'], max_length=200, truncation=True)
        return model_inputs

    data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model)
    tokenized_datasets = dataset.map(preprocess_function, batched=True)

    import numpy as np
    import evaluate
    metric = evaluate.load("sacrebleu")

    def postprocess_text(preds, labels):
        preds = [pred.strip() for pred in preds]
        labels = [[label.strip()] for label in labels]
        return preds, labels

    def compute_metrics(eval_preds):
        preds, labels = eval_preds
        if isinstance(preds, tuple):
            preds = preds[0]
        decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)

        labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
        decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

        decoded_preds, decoded_labels = postprocess_text(decoded_preds, decoded_labels)

        result = metric.compute(predictions=decoded_preds, references=decoded_labels)
        result = {"bleu": result["score"]}

        prediction_lens = [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in preds]
        result["gen_len"] = np.mean(prediction_lens)
        result = {k: round(v, 4) for k, v in result.items()}
        return result

    training_args = Seq2SeqTrainingArguments(
        output_dir=weight_path,
        evaluation_strategy="steps",
        save_strategy='steps',
        logging_steps=5000,
        eval_steps=25000,
        save_steps=25000,
        learning_rate=lr,
        per_device_train_batch_size=6,
        per_device_eval_batch_size=18,
        weight_decay=0.01,
        num_train_epochs=2,
        predict_with_generate=True,
        load_best_model_at_end=True,
        save_total_limit=1,
        metric_for_best_model='eval_bleu',
    )

    trainer = Seq2SeqTrainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_datasets["train"],
        eval_dataset=tokenized_datasets["test"],
        tokenizer=tokenizer,
        data_collator=data_collator,
        compute_metrics=compute_metrics,
    )

    print(trainer.evaluate())
    trainer.train()
    trainer.model.save_pretrained(save_path)
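
For reference, the script expects ./kor-eng_news.json to decode into a dict of two parallel lists keyed by 'korean' and 'english', which is the shape Dataset.from_dict accepts and the columns preprocess_function reads. A made-up illustration of that layout (the sentences are only examples, not the actual data):

import json

# Hypothetical layout of kor-eng_news.json: parallel source/target lists
# matching the column names used in preprocess_function.
sample = {
    "korean": ["오늘 서울의 날씨는 맑다.", "그는 어제 책을 읽었다."],
    "english": ["The weather in Seoul is clear today.", "He read a book yesterday."],
}
with open("kor-eng_news.json", "w", encoding="utf-8") as f:
    json.dump(sample, f, ensure_ascii=False, indent=2)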
younesbelkada commented 1 year ago

Hi @comchobo, thanks for the issue! I think you need to load your model with accelerate:

model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-1.3B", device_map="auto")

Also make sure to use the main branch of transformers, as it has a fix for Trainer + multi-GPU: https://github.com/huggingface/transformers/pull/22532
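
Applied to the train_lora function above, only the loading step changes; a minimal sketch (device_map="auto" is what lets accelerate shard the 1.3B base model across the visible GPUs before the LoRA adapters are attached):

# Load the base model sharded across the available GPUs, then wrap it with LoRA
# exactly as before; everything else in train_lora stays unchanged.
model = AutoModelForSeq2SeqLM.from_pretrained(
    "facebook/nllb-200-distilled-1.3B",
    device_map="auto",  # accelerate decides the per-layer device placement
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()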

younesbelkada commented 1 year ago

To use the main branch of transformers:

pip install git+https://github.com/huggingface/transformers.git
sorryhyun commented 1 year ago

I followed the instructions, but I am still getting an error:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cpu! (when checking argument for argument mat2 in method wrapper_CUDA_mm)

I ran this code with the following command:

CUDA_VISIBLE_DEVICES=0,1,2 python lora_training.py
younesbelkada commented 1 year ago

@comchobo can you share the full traceback of the error?

sorryhyun commented 1 year ago
Traceback (most recent call last):
  File "/home/sorryhyun/finetune_nllb_with_transformer/lora_training.py", line 96, in <module>
    train_lora('lora_training_nllb_1p3B_lr=5e-4','lora_training_nllb_1p3B_lr=5e-4_saved',lr=5e-4)
  File "/home/sorryhyun/finetune_nllb_with_transformer/lora_training.py", line 92, in train_lora
    trainer.train()
  File "/home/sorryhyun/anaconda3/envs/sorryhyun/lib/python3.10/site-packages/transformers/trainer.py", line 1662, in train
    return inner_training_loop(
  File "/home/sorryhyun/anaconda3/envs/sorryhyun/lib/python3.10/site-packages/transformers/trainer.py", line 1929, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/sorryhyun/anaconda3/envs/sorryhyun/lib/python3.10/site-packages/transformers/trainer.py", line 2699, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/sorryhyun/anaconda3/envs/sorryhyun/lib/python3.10/site-packages/transformers/trainer.py", line 2731, in compute_loss
    outputs = model(**inputs)
  File "/home/sorryhyun/anaconda3/envs/sorryhyun/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sorryhyun/anaconda3/envs/sorryhyun/lib/python3.10/site-packages/peft/peft_model.py", line 667, in forward
    return self.base_model(
  File "/home/sorryhyun/anaconda3/envs/sorryhyun/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sorryhyun/anaconda3/envs/sorryhyun/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/sorryhyun/anaconda3/envs/sorryhyun/lib/python3.10/site-packages/transformers/models/m2m_100/modeling_m2m_100.py", line 1335, in forward
    outputs = self.model(
  File "/home/sorryhyun/anaconda3/envs/sorryhyun/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sorryhyun/anaconda3/envs/sorryhyun/lib/python3.10/site-packages/transformers/models/m2m_100/modeling_m2m_100.py", line 1208, in forward
    encoder_outputs = self.encoder(
  File "/home/sorryhyun/anaconda3/envs/sorryhyun/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sorryhyun/anaconda3/envs/sorryhyun/lib/python3.10/site-packages/transformers/models/m2m_100/modeling_m2m_100.py", line 837, in forward
    layer_outputs = encoder_layer(
  File "/home/sorryhyun/anaconda3/envs/sorryhyun/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sorryhyun/anaconda3/envs/sorryhyun/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/sorryhyun/anaconda3/envs/sorryhyun/lib/python3.10/site-packages/transformers/models/m2m_100/modeling_m2m_100.py", line 390, in forward
    hidden_states, attn_weights, _ = self.self_attn(
  File "/home/sorryhyun/anaconda3/envs/sorryhyun/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sorryhyun/anaconda3/envs/sorryhyun/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/sorryhyun/anaconda3/envs/sorryhyun/lib/python3.10/site-packages/transformers/models/m2m_100/modeling_m2m_100.py", line 249, in forward
    query_states = self.q_proj(hidden_states) * self.scaling
  File "/home/sorryhyun/anaconda3/envs/sorryhyun/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sorryhyun/anaconda3/envs/sorryhyun/lib/python3.10/site-packages/peft/tuners/lora.py", line 350, in forward
    result += self.lora_B(self.lora_A(self.lora_dropout(x))) * self.scaling
  File "/home/sorryhyun/anaconda3/envs/sorryhyun/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sorryhyun/anaconda3/envs/sorryhyun/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cpu! (when checking argument for argument mat2 in method wrapper_CUDA_mm)
younesbelkada commented 1 year ago

What is your peft version?

sorryhyun commented 1 year ago

It is peft==0.2.0

younesbelkada commented 1 year ago

You need to install peft from source, as you need this fix: https://github.com/huggingface/peft/pull/145. Please uninstall peft and re-install it with the following command:

pip install git+https://github.com/huggingface/peft
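
After re-installing both libraries from source, a quick sanity check that the fresh builds are the ones actually being imported:

import peft
import transformers

print(peft.__version__)          # should show a .dev-suffixed version, no longer 0.2.0
print(transformers.__version__)  # likewise a .dev version when installed from the main branch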
sorryhyun commented 1 year ago

Thanks! It's now working @younesbelkada