huggingface / alignment-handbook

Robust recipes to align language models with human and AI preferences
https://huggingface.co/HuggingFaceH4
Apache License 2.0

weird conversation with zephyr-7b-dpo-lora #78

Open njupopsicle opened 9 months ago

njupopsicle commented 9 months ago

[screenshot of the conversation] When chatting with zephyr-7b-dpo-lora, as shown in the figure above, only the first 'Hello' was sent by me; all of the following content was generated by zephyr, including the \<user> prompt. I cannot figure out why.

DRXD1000 commented 9 months ago

Try loading the SFT adapter first, then merge the adapter into the base model, and then load the DPO adapter. You can use the following code:

model_name = "alignment-handbook/zephyr-7b-sft-lora" tokenizer = AutoTokenizer.from_pretrained("alignment-handbook/zephyr-7b-sft-lora") model = AutoPeftModelForCausalLM.from_pretrained( model_name, device_map="auto", use_flash_attention_2=True, torch_dtype = torch.bfloat16, use_cache=True ) print("Merging Model") model = model.merge_and_unload() print("Model Merged")

peft_config = PeftConfig.from_pretrained("alignment-handbook/zephyr-7b-dpo-lora")

model = PeftModel.from_pretrained(model, "alignment-handbook/zephyr-7b-dpo-lora")
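For reference, a minimal inference sketch with the resulting model might look like the block below. This is an assumption on my part, not part of the original comment: the chat-template call, the generation parameters, and the prompt are all illustrative.

```python
# Hypothetical usage sketch: chat with the merged SFT + DPO model defined above.
messages = [{"role": "user", "content": "Hello"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    # Stop at the end-of-turn token so the model does not keep writing the next <user> turn
    eos_token_id=tokenizer.eos_token_id,
)
# Print only the newly generated assistant reply
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```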

5thGenDev commented 6 months ago

> Try loading the SFT adapter first, then merge the adapter into the base model, and then load the DPO adapter. You can use the following code:

Where do I put this code?