Unable to train llama-7b on a machine with two Tesla T4 GPUs using the DeepSpeed integration

Ragul-Ramdass commented 12 months ago

Hi, I'm trying to run distributed training of llama-7b on a VM with two Tesla T4 GPUs using native DeepSpeed. I'm hitting the following error:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
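
For context, this error means a single operation received tensors that live on both cuda:0 and cuda:1, so at least one training process can see both GPUs. A minimal sketch for checking what each worker process sees is shown here. It assumes the launcher sets the usual environment variables (CUDA_VISIBLE_DEVICES, LOCAL_RANK, RANK, WORLD_SIZE); whether the Ray backend sets all of them is not confirmed in this thread, and the helper names are hypothetical.

```python
import os

# Environment variables that distributed launchers commonly set.
# Assumption: not every launcher sets all of them, so missing values are reported.
LAUNCHER_VARS = ("CUDA_VISIBLE_DEVICES", "LOCAL_RANK", "RANK", "WORLD_SIZE")


def describe_worker_env(environ=None):
    """Return the launcher-related variables visible to this process."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name, "<not set>") for name in LAUNCHER_VARS}


def visible_gpu_count(environ=None):
    """Count GPUs exposed through CUDA_VISIBLE_DEVICES.

    Returns None when the variable is unset, in which case the process
    is not restricted and can see every GPU on the machine.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    return len([device for device in raw.split(",") if device.strip()])


if __name__ == "__main__":
    # Call this from inside each training worker and compare the output per rank.
    for name, value in describe_worker_env().items():
        print(f"{name}={value}")
    print("visible GPUs:", visible_gpu_count())
```

If a worker reports more than one visible GPU (or no restriction at all), that would be consistent with one process placing tensors on both devices.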

My current OS is Ubuntu 20.04 and my Python version is 3.10.13.

model.yaml:

base_model: /root/CodeLlama-7b-Python-hf

quantization:
  bits: 4

adapter:
  type: lora

prompt:
  template: |
    ### Instruction:
    {Instruction}

    ### Context:
    {Context}

    ### Input:
    {Input}

    ### Response:

input_features:
  - name: prompt
    type: text
    preprocessing:
      max_sequence_length: 2048

output_features:
  - name: Response
    type: text
    preprocessing:
      max_sequence_length: 2048

trainer:
  type: finetune
  learning_rate: 0.0001
  batch_size: 1
  max_batch_size: 1
  gradient_accumulation_steps: 1
  enable_gradient_checkpointing: true
  epochs: 3
  learning_rate_scheduler:
    warmup_fraction: 0.01

preprocessing:
  sample_ratio: 1.0

backend:
  type: ray
  trainer:
    use_gpu: true
    strategy: deepspeed
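
One combination in this config that may be worth ruling out is 4-bit quantization together with the DeepSpeed strategy on the Ray backend. Whether that pairing is supported in this Ludwig version is an assumption here, not something confirmed in this thread. The sketch below only flags the combination for review; `flag_settings_to_review` is a hypothetical helper, not a Ludwig API, and the config is mirrored by hand as a Python dict.

```python
def flag_settings_to_review(config):
    """Return notes about settings worth double-checking.

    Hypothetical helper: the rule below is an assumption to investigate,
    not a documented Ludwig validation check.
    """
    notes = []
    bits = (config.get("quantization") or {}).get("bits")
    backend = config.get("backend") or {}
    strategy = (backend.get("trainer") or {}).get("strategy")
    if bits is not None and strategy == "deepspeed":
        notes.append(
            f"quantization.bits={bits} is combined with "
            f"backend.trainer.strategy={strategy}; check whether this "
            "pairing is supported before debugging further."
        )
    return notes


# Relevant parts of model.yaml, copied from the config above.
CONFIG = {
    "base_model": "/root/CodeLlama-7b-Python-hf",
    "quantization": {"bits": 4},
    "adapter": {"type": "lora"},
    "backend": {
        "type": "ray",
        "trainer": {"use_gpu": True, "strategy": "deepspeed"},
    },
}

if __name__ == "__main__":
    for note in flag_settings_to_review(CONFIG):
        print("REVIEW:", note)
```

A quick way to test the hypothesis would be to run once with the `quantization` section removed and once with the default backend, and see which run still raises the device error.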

Environment:

absl-py                       2.0.0
accelerate                    0.24.1
aiohttp                       3.8.6
aiosignal                     1.3.1
asttokens                     2.4.1
async-timeout                 4.0.3
attrs                         23.1.0
backports.functools-lru-cache 1.6.5
bitsandbytes                  0.40.2
bleach                        6.1.0
blessed                       1.20.0
blis                          0.7.11
cachetools                    5.3.2
catalogue                     2.0.10
certifi                       2023.7.22
charset-normalizer            3.3.2
click                         8.1.7
cloudpathlib                  0.16.0
comm                          0.1.4
commonmark                    0.9.1
confection                    0.1.3
cymem                         2.0.8
Cython                        3.0.5
dataclasses-json              0.6.2
datasets                      2.15.0
debugpy                       1.6.7
decorator                     5.1.1
deepspeed                     0.12.3
dill                          0.3.7
distlib                       0.3.7
entrypoints                   0.4
et-xmlfile                    1.1.0
exceptiongroup                1.1.3
executing                     2.0.1
filelock                      3.13.1
frozenlist                    1.4.0
fsspec                        2023.9.2
getdaft                       0.1.20
google-auth                   2.23.4
google-auth-oauthlib          1.1.0
gpustat                       1.1.1
grpcio                        1.59.2
h5py                          3.10.0
hjson                         3.1.0
html5lib                      1.1
huggingface-hub               0.19.4
idna                          3.4
ipykernel                     6.26.0
ipython                       8.17.2
jedi                          0.19.1
Jinja2                        3.1.2
joblib                        1.3.2
jsonschema                    4.6.2
jupyter-client                7.3.4
jupyter_core                  5.5.0
kaggle                        1.5.16
langcodes                     3.3.0
lightning-utilities           0.9.0
loguru                        0.7.2
ludwig                        0.9.dev0
lxml                          4.9.3
Markdown                      3.5.1
MarkupSafe                    2.1.3
marshmallow                   3.20.1
marshmallow-dataclass         8.5.4
marshmallow-jsonschema        0.13.0
matplotlib-inline             0.1.6
mpi4py                        3.1.4
mpmath                        1.3.0
msgpack                       1.0.7
multidict                     6.0.4
multiprocess                  0.70.15
murmurhash                    1.0.10
mypy-extensions               1.0.0
nest-asyncio                  1.5.8
networkx                      3.2.1
ninja                         1.11.1.1
nltk                          3.8.1
numpy                         1.26.2
nvidia-cublas-cu12            12.1.3.1
nvidia-cuda-cupti-cu12        12.1.105
nvidia-cuda-nvrtc-cu12        12.1.105
nvidia-cuda-runtime-cu12      12.1.105
nvidia-cudnn-cu12             8.9.2.26
nvidia-cufft-cu12             11.0.2.54
nvidia-curand-cu12            10.3.2.106
nvidia-cusolver-cu12          11.4.5.107
nvidia-cusparse-cu12          12.1.0.106
nvidia-ml-py                  12.535.133
nvidia-nccl-cu12              2.18.1
nvidia-nvjitlink-cu12         12.3.101
nvidia-nvtx-cu12              12.1.105
oauthlib                      3.2.2
openpyxl                      3.1.2
packaging                     23.2
pandas                        2.1.3
parso                         0.8.3
peft                          0.6.2
pexpect                       4.8.0
pickleshare                   0.7.5
Pillow                        10.1.0
pip                           23.3
platformdirs                  3.11.0
preshed                       3.0.9
prompt-toolkit                3.0.41
protobuf                      3.20.3
psutil                        5.9.0
ptyprocess                    0.7.0
pure-eval                     0.2.2
py                            1.11.0
py-cpuinfo                    9.0.0
pyarrow                       14.0.1
pyarrow-hotfix                0.5
pyasn1                        0.5.0
pyasn1-modules                0.3.0
pydantic                      1.10.13
Pygments                      2.16.1
pynvml                        11.5.0
pyrsistent                    0.20.0
python-dateutil               2.8.2
python-slugify                8.0.1
pytz                          2023.3.post1
pyxlsb                        1.0.10
PyYAML                        6.0
pyzmq                         25.1.0
ray                           2.3.1
regex                         2023.10.3
requests                      2.31.0
requests-oauthlib             1.3.1
retry                         0.9.2
rich                          12.4.4
rsa                           4.9
sacremoses                    0.1.1
safetensors                   0.4.0
scikit-learn                  1.3.2
scipy                         1.11.3
sentencepiece                 0.1.99
setuptools                    68.0.0
six                           1.16.0
smart-open                    6.4.0
spacy                         3.7.2
spacy-legacy                  3.0.12
spacy-loggers                 1.0.5
srsly                         2.4.8
stack-data                    0.6.2
sympy                         1.12
tabulate                      0.9.0
tensorboard                   2.15.1
tensorboard-data-server       0.7.2
text-unidecode                1.3
thinc                         8.2.1
threadpoolctl                 3.2.0
tokenizers                    0.15.0
torch                         2.1.1
torchaudio                    2.1.1
torchdata                     0.7.1
torchinfo                     1.8.0
torchmetrics                  1.2.0
torchtext                     0.16.1
torchvision                   0.16.1
tornado                       6.1
tqdm                          4.66.1
traitlets                     5.13.0
transformers                  4.35.2
triton                        2.1.0
typer                         0.9.0
typing_extensions             4.8.0
typing-inspect                0.9.0
tzdata                        2023.3
urllib3                       2.1.0
virtualenv                    20.24.6
wasabi                        1.1.2
wcwidth                       0.2.10
weasel                        0.3.4
webencodings                  0.5.1
Werkzeug                      3.0.1
wheel                         0.41.2
xlrd                          2.0.1
XlsxWriter                    3.1.9
xlwt                          1.3.0
xxhash                        3.4.1
yarl                          1.9.2
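
Since the full list is long, here is a small sketch that pulls out the packages most likely to matter for this error. The choice of "key" packages is a judgment call, and the function names are hypothetical; the versions in the excerpt are copied from the list above.

```python
def parse_pip_list(text):
    """Parse `pip list`-style "name  version" lines into a dict."""
    versions = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, version = parts
        # Skip the header and separator rows that `pip list` prints.
        if name == "Package" or set(name) == {"-"}:
            continue
        versions[name.lower()] = version
    return versions


# Excerpt of the environment listed above.
ENV_EXCERPT = """
accelerate                    0.24.1
bitsandbytes                  0.40.2
deepspeed                     0.12.3
ludwig                        0.9.dev0
peft                          0.6.2
ray                           2.3.1
torch                         2.1.1
transformers                  4.35.2
"""

KEY_PACKAGES = (
    "ludwig", "ray", "deepspeed", "torch",
    "transformers", "peft", "bitsandbytes", "accelerate",
)


def summarize(text, names=KEY_PACKAGES):
    """Return the versions of the selected packages, marking absent ones."""
    versions = parse_pip_list(text)
    return {name: versions.get(name, "<missing>") for name in names}


if __name__ == "__main__":
    for name, version in summarize(ENV_EXCERPT).items():
        print(f"{name:<14}{version}")
```

The same function can be fed the complete output of `pip list` instead of the excerpt.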

Can you guide me in solving this? Thanks in advance!

alexsherstinsky commented 12 months ago

Hi @Ragul-Ramdass -- thank you for reporting this issue and the one in #3783 -- please give us a few business days to look into it and get back to you (I left a similar message in the above-mentioned issue as well). Thank you.

Ragul-Ramdass commented 12 months ago

Hi @alexsherstinsky, thanks for looking into it. Please let me know if you need any other information. My aim is to run distributed training with DeepSpeed in Ludwig; if you can suggest a workaround, that would also be great. Thanks.

ludwig-ai / ludwig #3784