huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

PEFT + ZeRO Phase 2 + Transformers doesn't output pytorch_model.bin #31234

Closed cdoern closed 4 months ago

cdoern commented 5 months ago

System Info

Python 3.11.9
Transformers 4.38.2
torch 2.3.0

Who can help?

No response

Information

Tasks

Reproduction

All of the steps below occur when creating a Trainer in a Python script and running trainer.train() (a minimal sketch of this setup follows the numbered list):

  1. Load a bitsandbytes 4-bit quantized, pretrained model from the HF Hub.
  2. Put it on the GPU (a single A10 in this case).
  3. Create a Trainer with a PEFT config using LoRA, and initialize DeepSpeed with ZeRO stage 2.
  4. Call trainer.train().
  5. The output dir looks like:
    checkpoint-1  checkpoint-2  checkpoint-3  checkpoint-4  checkpoint-5
  6. One of the checkpoint dirs has:
    adapter_config.json        global_step2  rng_state.pth            tokenizer.json      training_args.bin
    adapter_model.safetensors  latest        special_tokens_map.json  tokenizer.model     zero_to_fp32.py
    added_tokens.json          README.md     tokenizer_config.json    trainer_state.json
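
Roughly what my script does (the base model id, LoRA hyperparameters, dummy dataset, and DeepSpeed config path below are placeholders, not my exact values):

```python
import torch
from datasets import Dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "mistralai/Mistral-7B-v0.1"  # placeholder base model

# Steps 1-2: load the model 4-bit quantized with bitsandbytes on the single GPU.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map={"": 0},  # single A10; drop this if DeepSpeed handles placement in your setup
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

# Step 3: wrap the quantized base model with LoRA adapters.
model = prepare_model_for_kbit_training(model)
model = get_peft_model(
    model, LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
)

# Tiny dummy dataset so the sketch runs end to end.
train_dataset = Dataset.from_dict({"text": ["hello world"] * 8}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=32),
    remove_columns=["text"],
)

# DeepSpeed ZeRO stage 2 is enabled through TrainingArguments(deepspeed=...).
training_args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    num_train_epochs=1,
    deepspeed="ds_zero2_config.json",  # placeholder path to a ZeRO stage-2 config
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)

# Step 4: train; checkpoints land under out/checkpoint-*.
trainer.train()
```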

Expected behavior

I believe there should be a pytorch_model.bin for me to load. If LoRA somehow changes where everything gets stored, that isn't documented anywhere. Training succeeds, but I am not merging or unloading anything because I am not sure where to grab the proper weights from; the docs don't cover this.

amyeroberts commented 5 months ago

cc @younesbelkada

cdoern commented 5 months ago

Just looking for clarity on whether this is normal and, if so, how to get the trained model into some sane format I can carry on with.

cdoern commented 5 months ago

@younesbelkada sorry for the ping, but this is a pretty big blocker for some work I am doing. Is there a quick how-to for PEFT + DeepSpeed in case my understanding is lacking here?

younesbelkada commented 5 months ago

Hi @cdoern, apologies for my late reply. In the latest PEFT / transformers versions we should support QLoRA + DeepSpeed on all ZeRO stages; see the documentation page https://huggingface.co/docs/peft/accelerate/deepspeed for more details. You are using QLoRA, meaning you are fine-tuning adapters on top of the full (base) model, so it is expected that you only get an adapter_model.safetensors file. Note you can load the adapters directly with AutoModelForCausalLM.from_pretrained (see https://huggingface.co/docs/transformers/peft for reference), or you can use PeftModel.from_pretrained from peft and call model.merge_and_unload() to merge the adapters into the base model and save the merged model as a standalone model.
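
For example, assuming the checkpoint directory is out/checkpoint-5 and the base model is mistralai/Mistral-7B-v0.1 (both placeholders for your own paths), the two approaches look like this:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

adapter_dir = "out/checkpoint-5"             # directory containing adapter_model.safetensors
base_model_id = "mistralai/Mistral-7B-v0.1"  # placeholder: the base model you trained on

# Option 1: load the adapters directly. transformers reads adapter_config.json,
# fetches the base model it points to, and attaches the adapter weights.
model = AutoModelForCausalLM.from_pretrained(adapter_dir)

# Option 2: merge the adapters into the base model and save a standalone model
# (the output directory will contain the full model weights instead of just adapters).
base = AutoModelForCausalLM.from_pretrained(base_model_id)
merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()
merged.save_pretrained("merged-model")
```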

github-actions[bot] commented 4 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.