rmihaylov / falcontune

Tune any FALCON in 4-bit
Apache License 2.0

Does fine-tuning support the multi-GPU training? #20

Open cahuja1992 opened 1 year ago

cahuja1992 commented 1 year ago

Does fine-tuning support the multi-GPU training?

When trying to fine-tune with multiple GPUs, I got the following error:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!

File "/home/ubuntu/miniconda3/envs/mpt/lib/python3.10/site-packages/transformers/trainer.py", line 1664, in train
    return inner_training_loop(
  File "/home/ubuntu/miniconda3/envs/mpt/lib/python3.10/site-packages/transformers/trainer.py", line 1940, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/ubuntu/miniconda3/envs/mpt/lib/python3.10/site-packages/transformers/trainer.py", line 2735, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/ubuntu/miniconda3/envs/mpt/lib/python3.10/site-packages/transformers/trainer.py", line 2767, in compute_loss
    outputs = model(**inputs)
  File "/home/ubuntu/miniconda3/envs/mpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/mpt/lib/python3.10/site-packages/peft/peft_model.py", line 678, in forward
    return self.base_model(
  File "/home/ubuntu/miniconda3/envs/mpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/mpt/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/mpt/lib/python3.10/site-packages/falcontune-0.1.0-py3.10.egg/falcontune/model/falcon/model.py", line 1070, in forward
    transformer_outputs = self.transformer(
  File "/home/ubuntu/miniconda3/envs/mpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/mpt/lib/python3.10/site-packages/falcontune-0.1.0-py3.10.egg/falcontune/model/falcon/model.py", line 965, in forward
    outputs = block(
  File "/home/ubuntu/miniconda3/envs/mpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/mpt/lib/python3.10/site-packages/falcontune-0.1.0-py3.10.egg/falcontune/model/falcon/model.py", line 720, in forward
    mlp_output += attention_output
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!
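
For context, the crash is at the residual addition inside a Falcon decoder block: with device_map="auto", consecutive blocks can land on different GPUs, so attention_output may live on cuda:0 while mlp_output is computed on cuda:1. A minimal local workaround, assuming you are willing to patch falcontune/model/falcon/model.py yourself, is to align the two operands before the in-place add (this only papers over the naive model-parallel split; it does not give data-parallel training):

    # Hypothetical patch around the line shown in the traceback
    # (falcontune/model/falcon/model.py, decoder block forward): move one
    # operand onto the other's device so the in-place add no longer mixes
    # cuda:0 and cuda:1.
    mlp_output = mlp_output.to(attention_output.device)
    mlp_output += attention_output
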
SoumitriKolavennu commented 1 year ago

Same issue for me as well.

gpravi commented 1 year ago

I'm just using a single GPU for now, but I'm running into an OOM issue - https://github.com/rmihaylov/falcontune/issues/19

TeaCult commented 1 year ago

I have the same issue: device_map=auto does not work for training. I guess we should copy the tokenizer to every device? Why can't device_map=auto handle this?
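
A note on why device_map=auto trips here: accelerate's "auto" map shards the transformer blocks across the available GPUs (naive model parallelism), which is intended for fitting a large model for inference, not for the Trainer's data-parallel training loop. The split that produces the cross-device residual add can be inspected directly; a small sketch, assuming the model is loaded through transformers' from_pretrained (the model id is a placeholder):

    import torch
    from transformers import AutoModelForCausalLM

    # Reproduce the "auto" placement and inspect how the layers were
    # distributed across the available GPUs.
    model = AutoModelForCausalLM.from_pretrained(
        "tiiuae/falcon-7b",        # placeholder model id
        torch_dtype=torch.bfloat16,
        device_map="auto",
        trust_remote_code=True,
    )
    print(model.hf_device_map)     # e.g. some transformer.h.* blocks on 0, the rest on 1
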

zepmck commented 1 year ago

Same here. While training on multiple GPUs I get the following error:

ValueError: DistributedDataParallel device_ids and output_device arguments only work with single-device/multiple-device GPU modules or CPU modules, but got device_ids [0], output_device 0, and module parameters {device(type='cuda', index=0), device(type='cuda', index=1)}.
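
The DistributedDataParallel error above has the same root cause: DDP requires each process's module to sit on a single device, but device_map="auto" leaves parameters on both cuda:0 and cuda:1. A sketch of the usual workaround with plain transformers (not falcontune's own CLI; the model id and the rest of the training setup are placeholders) is to give each process a full copy of the model on its local GPU and launch with torchrun:

    import os
    from transformers import AutoModelForCausalLM

    # Launched with e.g.: torchrun --nproc_per_node=2 train.py
    local_rank = int(os.environ.get("LOCAL_RANK", 0))

    model = AutoModelForCausalLM.from_pretrained(
        "tiiuae/falcon-7b",              # placeholder model id
        device_map={"": local_rank},     # whole model on this process's GPU
        trust_remote_code=True,
    )
    # Build the Trainer as usual; with one full replica per GPU, transformers'
    # Trainer can wrap the model in DistributedDataParallel without the error above.
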

baptistejamin commented 1 year ago

It seems falcontune doesn't rely on the Accelerate framework.

An easy quick win would be to rely on the new HuggingFace SFT Trainer instead: https://huggingface.co/docs/trl/sft_trainer
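
For reference, a minimal sketch of the suggested SFTTrainer route, combining 4-bit loading via bitsandbytes with a LoRA adapter. This is not falcontune's own training path; the model id, dataset, and hyperparameters below are placeholders, and trl/peft argument names may differ between versions:

    import torch
    from datasets import load_dataset
    from peft import LoraConfig
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              BitsAndBytesConfig, TrainingArguments)
    from trl import SFTTrainer

    model_id = "tiiuae/falcon-7b"                                    # placeholder
    dataset = load_dataset("timdettmers/openassistant-guanaco",      # placeholder
                           split="train")

    # 4-bit NF4 quantization, in the spirit of falcontune's 4-bit setup.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb_config,
        device_map={"": 0},          # one GPU per process; DDP-friendly
        trust_remote_code=True,
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    tokenizer.pad_token = tokenizer.eos_token

    peft_config = LoraConfig(
        r=16, lora_alpha=32, lora_dropout=0.05,
        target_modules=["query_key_value"],   # Falcon's fused attention projection
        task_type="CAUSAL_LM",
    )

    trainer = SFTTrainer(
        model=model,
        tokenizer=tokenizer,
        train_dataset=dataset,
        dataset_text_field="text",
        peft_config=peft_config,
        max_seq_length=512,
        args=TrainingArguments(
            output_dir="falcon-sft",
            per_device_train_batch_size=1,
            gradient_accumulation_steps=8,
            num_train_epochs=1,
            bf16=True,
            logging_steps=10,
        ),
    )
    trainer.train()
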