I am trying to fine-tune the lmsys/vicuna-7b-v1.3 model. I have a server with 8 NVIDIA RTX A4500 GPUs (20 GB each), so about 160 GB of GPU memory in total.
When I try to train with mem, I get an OOM error in the middle of training. I followed the steps described in the README, but it does not help much. That's strange, because 160 GB of memory should be enough.
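For context, here is my rough back-of-envelope memory arithmetic for full fine-tuning a 7B model with Adam in standard mixed precision (fp16 weights and grads, plus fp32 master weights and two fp32 Adam moments). It suggests that even 160 GB leaves little headroom once activations and fragmentation are counted, so the OOM may not be surprising:

```python
# Back-of-envelope: per-parameter bytes for mixed-precision Adam.
# fp16 weights (2) + fp16 grads (2) + fp32 master copy (4)
# + fp32 momentum (4) + fp32 variance (4) = 16 bytes/param.
params = 7e9
bytes_per_param = 2 + 2 + 4 + 4 + 4
total_gb = params * bytes_per_param / 1e9
print(f"{total_gb:.0f} GB for model + optimizer state, before activations")
```

That is 112 GB before any activations, which is why sharding the optimizer state (ZeRO) or using LoRA matters here.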
When I try to train LoRA with QLoRA and ZeRO-2, I get a different error: AssertionError: zero stage 2 requires an optimizer. Does anyone know how to fix it?
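As I understand the DeepSpeed docs, ZeRO stage 2 shards optimizer state, so an optimizer has to be declared either in the config or passed to deepspeed.initialize; the assertion fires when neither is present. A sketch of what I believe a valid ZeRO-2 config looks like (the values here are placeholders, not my actual config):

```python
import json

# Hypothetical minimal ZeRO-2 DeepSpeed config sketch.
# Stage 2 partitions optimizer state across ranks, so the "optimizer"
# section (or a client optimizer given to deepspeed.initialize) must exist.
ds_config = {
    "zero_optimization": {"stage": 2},
    "optimizer": {
        "type": "AdamW",
        "params": {"lr": 2e-5, "weight_decay": 0.0},
    },
    "train_micro_batch_size_per_gpu": 1,
    "bf16": {"enabled": True},
}
print(json.dumps(ds_config, indent=2))
```

So my guess is that the assertion means my config reached DeepSpeed without any optimizer section at all.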
When I try to train LoRA with ZeRO-3, I get:
File "/home/sergeys/miniconda3/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py", line 2532, in all_gather_into_tensor
    work = group._allgather_base(output_tensor, input_tensor)
RuntimeError: output tensor must have the same type as input tensor
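I suspect the ZeRO-3 error means some parameters end up in a different dtype than the all-gather buffer (e.g. a 4-bit/fp16 QLoRA model combined with a bf16 DeepSpeed config). A small diagnostic I wrote to check whether my model's parameter dtypes are uniform before handing it to DeepSpeed (the toy model below is deliberately mixed to show the failure mode, not my actual model):

```python
import torch

def param_dtypes(model: torch.nn.Module) -> set:
    """Return the set of parameter dtypes in a model. More than one
    entry is a red flag: ZeRO-3's gathered/flattened buffers assume a
    single dtype, which can surface as the 'same type' RuntimeError."""
    return {p.dtype for p in model.parameters()}

# Toy model with intentionally mixed precision: fp32 + fp16 layers.
model = torch.nn.Sequential(
    torch.nn.Linear(4, 4),          # fp32 by default
    torch.nn.Linear(4, 4).half(),   # cast to fp16
)
print(param_dtypes(model))
```

If this prints more than one dtype for the real model, casting everything to a single dtype before deepspeed.initialize would be my first thing to try.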