Unable to train the llama-7b in a machine with two Tesla T4 GPU's using DeepSpeed integration #3784

Open Ragul-Ramdass opened 9 months ago

Ragul-Ramdass commented 9 months ago

Hi, I'm trying to run distributed training of llama-7b on a VM with two Tesla T4 GPUs using native DeepSpeed. I'm getting the following error:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
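
One way to narrow down an error like this is to list which parameters sit on which device before the first forward pass. The following is only a minimal sketch: `group_by_device` is a hypothetical helper (not part of Ludwig or DeepSpeed), and the example data is made up so the snippet runs without a GPU. With a real PyTorch model, the input would come from `model.named_parameters()`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_device(named_devices: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group parameter names by the device string they are reported on."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, device in named_devices:
        groups[device].append(name)
    return dict(groups)


# Hypothetical stand-in data. With a real PyTorch model this would be:
#   ((name, str(p.device)) for name, p in model.named_parameters())
example = [
    ("embed.weight", "cuda:0"),
    ("layers.0.attn.weight", "cuda:0"),
    ("lm_head.weight", "cuda:1"),
]

placement = group_by_device(example)
for device, names in sorted(placement.items()):
    print(f"{device}: {len(names)} parameter(s), e.g. {names[0]}")
```

If more than one device shows up inside a single worker process, the model may have been split across both GPUs at load time. That is an assumption to check, not something this thread confirms.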

OS: Ubuntu 20.04
Python version: 3.10.13

model.yaml:

base_model: /root/CodeLlama-7b-Python-hf

quantization:
  bits: 4

adapter:
  type: lora

prompt:
  template: |
    ### Instruction:

    ### Context:

    ### Input:

    ### Response:

input_features:
  - name: prompt
    type: text
    preprocessing:
      max_sequence_length: 2048

output_features:
  - name: Response
    type: text
    preprocessing:
      max_sequence_length: 2048

trainer:
  type: finetune
  learning_rate: 0.0001
  batch_size: 1
  max_batch_size: 1
  gradient_accumulation_steps: 1
  enable_gradient_checkpointing: true
  epochs: 3
  learning_rate_scheduler:
    warmup_fraction: 0.01

preprocessing:
  sample_ratio: 1.0

backend:
  type: ray
  trainer:
    use_gpu: true
    strategy: deepspeed


Can you guide me in solving this? Thanks in advance!

alexsherstinsky commented 9 months ago

Hi @Ragul-Ramdass -- thank you for reporting this issue and the one in #3783. Please give us a few business days to look into it and get back to you (I left a similar message in the above-mentioned issue as well). Thank you.

Ragul-Ramdass commented 9 months ago

Hi @alexsherstinsky, thanks for looking into it. Please let me know if you need any other information. My aim is to get distributed training working with DeepSpeed in Ludwig, so if you can suggest a workaround, that would also be great. Thanks.