Not able to load PEFT (prompt-tuned) model in multi-GPU settings for inference

dineshkh commented 6 months ago

I have prompt-tuned the Falcon-7B-Instruct model. Now I want to run inference with the prompt-tuned model in a multi-GPU setting using Accelerate. I am using 2 A100 GPUs with a batch size of 1 on each GPU.

I am getting the following error:

Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████| 2/2 [00:10<00:00,  5.17s/it]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████| 2/2 [00:10<00:00,  5.17s/it]
Traceback (most recent call last):
  File "/dccstor/dinesh/distillation_prompt_tuning/slot_filling_inference_accelerate.py", line 121, in <module>
    app.run(main)
  File "/dccstor/dinesh/conda_environments/kd_promt/lib/python3.11/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/dccstor/dinesh/conda_environments/kd_promt/lib/python3.11/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
             ^^^^^^^^^^
  File "/dccstor/dinesh/distillation_prompt_tuning/slot_filling_inference_accelerate.py", line 51, in main
    model = PeftModel.from_pretrained(model, FLAGS.ckpt_path)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/dccstor/dinesh/conda_environments/kd_promt/lib/python3.11/site-packages/peft/peft_model.py", line 352, in from_pretrained
    model.load_adapter(model_id, adapter_name, is_trainable=is_trainable, **kwargs)
  File "/dccstor/dinesh/conda_environments/kd_promt/lib/python3.11/site-packages/peft/peft_model.py", line 689, in load_adapter
    adapters_weights = load_peft_weights(model_id, device=torch_device, **hf_hub_download_kwargs)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/dccstor/dinesh/conda_environments/kd_promt/lib/python3.11/site-packages/peft/utils/save_and_load.py", line 270, in load_peft_weights
    adapters_weights = safe_load_file(filename, device=device)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/dccstor/dinesh/conda_environments/kd_promt/lib/python3.11/site-packages/safetensors/torch.py", line 310, in load_file
    result[k] = f.get_tensor(k)
                ^^^^^^^^^^^^^^^
RuntimeError: CUDA error: CUDA-capable device(s) is/are busy or unavailable
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

{'ckpt_path': '/dccstor/dinesh/prompt_tuning/falcon-7b-instruct/20240117-071924/slot_filling_tiiuae/falcon-7b-instruct_ckpt_step_4000', 'dataset_path': '/dccstor/dinesh/distillation_prompt_tuning/my_data_new/full_data_15Dec23/slot_filling_data_nomal_slot_lib', 'output_file_path': '/dccstor/dinesh/prompt_tuning/test_output_4000.jsonl', 'max_new_tokens': 200}
[2024-01-21 05:35:10,311] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 1955936 closing signal SIGTERM
[2024-01-21 05:35:10,521] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 1 (pid: 1955937) of binary: /dccstor/dinesh/conda_environments/kd_promt/bin/python3.1
Traceback (most recent call last):
  File "/dccstor/dinesh/conda_environments/kd_promt/bin/accelerate", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/dccstor/dinesh/conda_environments/kd_promt/lib/python3.11/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/dccstor/dinesh/conda_environments/kd_promt/lib/python3.11/site-packages/accelerate/commands/launch.py", line 1008, in launch_command
    multi_gpu_launcher(args)
  File "/dccstor/dinesh/conda_environments/kd_promt/lib/python3.11/site-packages/accelerate/commands/launch.py", line 666, in multi_gpu_launcher
    distrib_run.run(args)
  File "/dccstor/dinesh/conda_environments/kd_promt/lib/python3.11/site-packages/torch/distributed/run.py", line 797, in run
    elastic_launch(
  File "/dccstor/dinesh/conda_environments/kd_promt/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/dccstor/dinesh/conda_environments/kd_promt/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

My configuration:

- `Accelerate` version: 0.25.0
- Platform: Linux-4.18.0-477.15.1.el8_8.x86_64-x86_64-with-glibc2.28
- Python version: 3.11.2
- Numpy version: 1.24.1
- PyTorch version (GPU?): 2.1.2+cu118 (True)
- PyTorch XPU available: False
- PyTorch NPU available: False
- System RAM: 2003.40 GB
- GPU type: NVIDIA A100-SXM4-80GB
- `Accelerate` default config:
    Not found
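
The traceback above fails inside `safe_load_file(filename, device=device)`, that is, while the adapter tensors are being placed directly on a CUDA device. One way to separate "the file cannot be read" from "the device cannot be used" is to read the adapter on CPU first and only then move it to the GPU owned by the current process. The following is a minimal diagnostic sketch, not a confirmed fix. It assumes the checkpoint directory contains PEFT's default `adapter_model.safetensors` file, and the helper names `adapter_path` and `load_adapter_via_cpu` are hypothetical.

```python
import os

# Assumption: the checkpoint was saved in safetensors format under PEFT's default filename.
ADAPTER_FILE = "adapter_model.safetensors"


def adapter_path(ckpt_dir):
    """Return the expected location of the adapter weights inside a checkpoint directory."""
    return os.path.join(ckpt_dir, ADAPTER_FILE)


def load_adapter_via_cpu(ckpt_dir, device):
    """Read adapter tensors on CPU, then move each one to `device`.

    If the CPU read succeeds but the move fails, the problem is device
    placement rather than the checkpoint file itself.
    """
    # Imported lazily so the path helper works without safetensors installed.
    from safetensors.torch import load_file

    weights = load_file(adapter_path(ckpt_dir), device="cpu")
    return {name: tensor.to(device) for name, tensor in weights.items()}
```

Inside a script started with `accelerate launch`, this could be called as `load_adapter_via_cpu(FLAGS.ckpt_path, accelerator.device)` after the `Accelerator()` object has been created, so that each process targets its own GPU.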

BenjaminBossan commented 5 months ago

Thanks for reporting the error. At first glance, it doesn't seem related to PEFT. Could you please confirm if loading the base model, without any PEFT adapter, works in the same scenario (ideally also using safetensors), so that we can be sure that it's PEFT specifically that causes the error?
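
A minimal sketch of such a check, assuming it is started the same way as the failing script (`accelerate launch` on 2 GPUs): it loads only the base model, with no PEFT adapter, and moves it to the device of the current process. The function names `status_line` and `load_base_only` are hypothetical, and the model name is passed in by the caller.

```python
def status_line(process_index, device, model_name):
    """Format a one-line report so each process's result is easy to spot in the log."""
    return f"process {process_index}: loaded {model_name} on {device}"


def load_base_only(model_name):
    """Load the base model without any PEFT adapter and place it on this process's device."""
    # Heavy imports are kept inside the function so the formatting helper has no dependencies.
    import torch
    from accelerate import Accelerator
    from transformers import AutoModelForCausalLM

    accelerator = Accelerator()
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
    model = model.to(accelerator.device)
    # Plain print (not accelerator.print) so every process reports, not only the main one.
    print(status_line(accelerator.process_index, accelerator.device, model_name))
    return model
```

If every process prints its status line, base-model loading works in this setup and the adapter-loading step is the next thing to isolate.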

dineshkh commented 5 months ago

@BenjaminBossan sorry for the late reply. Yes, loading the base model (tiiuae/falcon-7b-instruct) without any PEFT adapter works in the same scenario: I loaded it using accelerate on 2 GPUs. The falcon-7b-instruct model is in .bin format, not safetensors, so I also tried loading mistralai/Mistral-7B-v0.1, which is in safetensors format, and that worked too.

BenjaminBossan commented 5 months ago

Hmm, this is really strange, the error message seems unrelated to PEFT. Could you reproduce the error consistently? Could you try restarting the machine? Also, could you please monitor the memory and ensure that you're not running OOM while loading the PEFT model?
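
For the memory check, one option is to poll `nvidia-smi` while the script is loading. The sketch below queries per-GPU memory use and also the compute mode, since a "busy or unavailable" CUDA error is often associated with a device in an exclusive compute mode being touched by a second process; whether that applies to this machine is an assumption, not a diagnosis. The function names are hypothetical, and the parser expects numeric memory values.

```python
import csv
import io
import subprocess

# Fields passed to nvidia-smi's --query-gpu option.
QUERY_FIELDS = ["index", "memory.used", "memory.total", "compute_mode"]


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not record:
            continue  # skip blank lines
        index, used, total, mode = (field.strip() for field in record)
        rows.append(
            {
                "index": int(index),
                "used_mib": int(used),
                "total_mib": int(total),
                "compute_mode": mode,
            }
        )
    return rows


def query_gpus():
    """Run nvidia-smi once and return the parsed per-GPU rows."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_csv(result.stdout)
```

Calling `query_gpus()` in a loop from a second terminal while the inference script starts up shows whether memory on either GPU approaches its total before the failure.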

dineshkh commented 5 months ago

@BenjaminBossan I am able to reproduce the error consistently. I am not running OOM; I am using 2 A100-80GB GPUs.

Here is a minimal reproducible example:

from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM
import torch
from absl import flags
from absl import app
from accelerate import Accelerator
from accelerate.utils import gather_object

device = "cuda"
flags.DEFINE_string('ckpt_path', None, help='checkpoint path')

def main(_):
    FLAGS = flags.FLAGS
    accelerator = Accelerator()
    # each GPU creates a string
    message = [f"Hello this is GPU {accelerator.process_index}"]
    # collect the messages from all GPUs
    messages = gather_object(message)

    # output the messages only on the main process with accelerator.print()
    accelerator.print(messages)

    config = PeftConfig.from_pretrained(FLAGS.ckpt_path)
    model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, torch_dtype=torch.bfloat16, attn_implementation="flash_attention_2")
    model = PeftModel.from_pretrained(model, FLAGS.ckpt_path, is_trainable=False)
    model = model.to(accelerator.device)

if __name__ == '__main__':
    app.run(main)

Here is the checkpoint: falcon-7b-instruct_ckpt_step_4000.zip

launch command:

accelerate launch slot_filling_inference_accelerate_demo.py --ckpt_path falcon-7b-instruct_ckpt_step_4000

accelerate                0.25.0                   pypi_0    pypi
peft                      0.7.1                    pypi_0    pypi
python                    3.11.2               h7a1cb2a_0
safetensors               0.4.1                    pypi_0    pypi
tokenizers                0.15.0                   pypi_0    pypi
torch                     2.1.2+cu118              pypi_0    pypi
torchvision               0.16.2+cu118             pypi_0    pypi
transformers              4.36.2                   pypi_0    pypi

BenjaminBossan commented 5 months ago

Oh, I see that you already opened an issue here: https://github.com/huggingface/accelerate/issues/2360. Let's keep the replies there.

github-actions[bot] commented 4 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

huggingface / peft

Not able to load PEFT (prompt-tuned) model in multi-GPU settings for inference #1379