Closed: dineshkh closed this issue 4 months ago.
Thanks for reporting the error. At first glance, it doesn't seem related to PEFT. Could you please confirm if loading the base model, without any PEFT adapter, works in the same scenario (ideally also using safetensors), so that we can be sure that it's PEFT specifically that causes the error?
@BenjaminBossan sorry for the late reply. Yes, loading the base model (tiiuae/falcon-7b-instruct) without any PEFT adapter works in the same scenario; I loaded the base model with accelerate on 2 GPUs. The falcon-7b-instruct model is in .bin format, not safetensors, so I also tried loading mistralai/Mistral-7B-v0.1, which is in safetensors format, and that worked as well.
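For reference, a minimal sketch of such a base-model-only check (no PEFT involved; launched with the same accelerate launch command, script body illustrative rather than the exact code used):

import torch
from transformers import AutoModelForCausalLM
from accelerate import Accelerator

accelerator = Accelerator()

# Load only the base model; no PeftModel.from_pretrained call.
model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b-instruct",  # also tested with mistralai/Mistral-7B-v0.1 (safetensors)
    torch_dtype=torch.bfloat16,
)
model = model.to(accelerator.device)
accelerator.print(f"base model loaded on {accelerator.device}")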
Hmm, this is really strange, the error message seems unrelated to PEFT. Could you reproduce the error consistently? Could you try restarting the machine? Also, could you please monitor the memory and ensure that you're not running OOM while loading the PEFT model?
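For example, one rough way to monitor GPU memory around the loading steps (a sketch only, not part of the original script) would be:

import torch

def report_mem(tag):
    # mem_get_info returns (free, total) bytes for the current CUDA device
    free, total = torch.cuda.mem_get_info()
    used_gib = (total - free) / 1024**3
    print(f"{tag}: {used_gib:.1f} GiB in use on cuda:{torch.cuda.current_device()}")

report_mem("before base model load")
# ... AutoModelForCausalLM.from_pretrained(...) ...
report_mem("after base model load")
# ... PeftModel.from_pretrained(...) ...
report_mem("after PEFT adapter load")
print(f"peak allocated by torch: {torch.cuda.max_memory_allocated() / 1024**3:.1f} GiB")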
@BenjaminBossan I am able to reproduce the error consistently. I am not running OOM; I am using 2 A100-80GB GPUs.
Here is a minimal reproducible example:
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM
import torch
from absl import flags
from absl import app
from accelerate import Accelerator
from accelerate.utils import gather_object

device = "cuda"

flags.DEFINE_string('ckpt_path', None, help='checkpoint path')


def main(_):
    FLAGS = flags.FLAGS
    accelerator = Accelerator()

    # each GPU creates a string
    message = [f"Hello this is GPU {accelerator.process_index}"]
    # collect the messages from all GPUs
    messages = gather_object(message)
    # output the messages only on the main process with accelerator.print()
    accelerator.print(messages)

    config = PeftConfig.from_pretrained(FLAGS.ckpt_path)
    model = AutoModelForCausalLM.from_pretrained(
        config.base_model_name_or_path,
        torch_dtype=torch.bfloat16,
        attn_implementation="flash_attention_2",
    )
    model = PeftModel.from_pretrained(model, FLAGS.ckpt_path, is_trainable=False)
    model = model.to(accelerator.device)


if __name__ == '__main__':
    app.run(main)
Here is the checkpoint: falcon-7b-instruct_ckpt_step_4000.zip
Launch command:
accelerate launch slot_filling_inference_accelerate_demo.py --ckpt_path falcon-7b-instruct_ckpt_step_4000
accelerate 0.25.0 pypi_0 pypi
peft 0.7.1 pypi_0 pypi
python 3.11.2 h7a1cb2a_0
safetensors 0.4.1 pypi_0 pypi
tokenizers 0.15.0 pypi_0 pypi
torch 2.1.2+cu118 pypi_0 pypi
torchvision 0.16.2+cu118 pypi_0 pypi
transformers 4.36.2 pypi_0 pypi
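For context, a rough sketch of the intended multi-GPU inference step (batch size 1 per GPU) once the model loads. This is illustrative only: it assumes the `model`, `config`, `accelerator`, and `gather_object` from the script above, uses placeholder prompts, and loads a tokenizer from the base model path.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
prompts = ["example prompt 1", "example prompt 2", "example prompt 3", "example prompt 4"]

results = []
# Each process receives its own slice of the prompts (batch size 1 per GPU).
with accelerator.split_between_processes(prompts) as my_prompts:
    for prompt in my_prompts:
        inputs = tokenizer(prompt, return_tensors="pt").to(accelerator.device)
        with torch.no_grad():
            output = model.generate(**inputs, max_new_tokens=32)
        results.append(tokenizer.decode(output[0], skip_special_tokens=True))

# Gather the per-GPU results on the main process.
accelerator.print(gather_object(results))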
Oh, I see that you already opened an issue here: https://github.com/huggingface/accelerate/issues/2360. Let's keep the replies there.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
I have prompt-tuned the Falcon-7B-Instruct model. Now I want to perform inference with the prompt-tuned model in a multi-GPU setting using accelerate. I am using 2 A100 GPUs and a batch size of 1 on each GPU.

My code:

I am getting the following error:

My configuration: