用LoRA微调Baichuan-13B-Chat后，代码测试无效

大佬好，我是一名刚入门的AI工程师，能够使用OpenAI提供的fine-tuning功能来微调，目前正在学习微调ChatGLM2、百川和通义千问，使用的是ChatGLM-Efficient-Tuning和LLaMA-Efficient-Tuning；ChatGLM-Efficient-Tuning比较顺利，没问题，而LLaMA-Efficient-Tuning碰到了问题。虽然你不是用LLaMA-Efficient-Tuning微调的百川13B，但说不定你了解呢，所以斗胆向你提问。

我的电脑配置如下：

信息	配置
系统	Windows 11和Ubuntu 20.04
处理器	13th Gen Intel(R) Core(TM) i9-13900K 3.00 GHz
内存	64GB
显卡	NVIDIA GeForce RTX 4080 16GB

我的微调数据集有500个问答对，形如：

[
    {
        "instruction": "洛阳",
        "input": "",
        "output": "白玉谁家郎，回车渡天津。看花东上陌，惊动洛阳人。"
    },
    {
        "instruction": "骏马",
        "input": "",
        "output": "紫骝行且嘶，双飜碧玉蹄。临流不肎渡，似惜锦障泥。白雪关山远，黄云海树迷。挥鞭万里去，安得念春闺。"
    },
    ...
    {
        "instruction": "胡姬",
        "input": "",
        "output": "银鞍白鼻騧，绿地障泥锦。细雨春风花落时，挥鞭且就胡姬饮。"
    }
]

说白了就是给它一个词，让它写首诗。

OpenAI

用OpenAI微调后，效果是这样的：	Prompt	Completion
空调	秋浦空调知何处，无暇饮酒赠客人。长安路上风吹灯，乌栖鸿羽无人问。

ChatGLM2

我参考ChatGLM2-6B微调和LoRA训练脚本生成了LoRA训练脚本chatglm2_train_lora.bat：

set CUDA_VISIBLE_DEVICES=0,1
python src/train_bash.py ^
    --model_name_or_path "C:\ZYL\Code\ChatGLM-Efficient-Tuning\chatglm2_6b_models" ^
    --output_dir "C:\ZYL\Code\ChatGLM-Efficient-Tuning\chatglm2_6b_models\fine-tuned-lora" ^
    --overwrite_cache ^
    --overwrite_output_dir ^
    --stage sft ^
    --do_train ^
    --dataset poetries ^
    --finetuning_type lora ^
    --max_source_length 64 ^
    --max_target_length 128 ^
    --per_device_train_batch_size 1 ^
    --per_device_eval_batch_size 1 ^
    --gradient_accumulation_steps 16 ^
    --lr_scheduler_type cosine ^
    --logging_steps 10 ^
    --max_steps 3000 ^
    --save_steps 1000 ^
    --learning_rate 2e-5 ^
    --num_train_epochs 100 ^
    --plot_loss ^
    --fp16

也参考官方教程和P-Tuning训练脚本生成了P-Tuning训练脚本chatglm2_train_p_tuning.bat：

set CUDA_VISIBLE_DEVICES=0
python src/train_bash.py ^
    --model_name_or_path "C:\ZYL\Code\ChatGLM-Efficient-Tuning\chatglm2_6b_models" ^
    --output_dir "C:\ZYL\Code\ChatGLM-Efficient-Tuning\chatglm2_6b_models\fine-tuned-p-tuning" ^
    --overwrite_cache ^
    --overwrite_output_dir ^
    --do_train ^
    --dataset poetries ^
    --finetuning_type p_tuning ^
    --max_source_length 64 ^
    --max_target_length 128 ^
    --per_device_train_batch_size 1 ^
    --per_device_eval_batch_size 1 ^
    --gradient_accumulation_steps 16 ^
    --logging_steps 10 ^
    --max_steps 3000 ^
    --save_steps 1000 ^
    --learning_rate 2e-2 ^
    --pre_seq_len 128 ^
    --quantization_bit 4 ^

用Windows训练出了LoRA和P-Tuning模型：

# LoRA
Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d----           2023/8/11    11:49                checkpoint-1000
d----           2023/8/11    12:13                checkpoint-2000
d----           2023/8/11    12:38                checkpoint-3000
-a---           2023/8/11    12:38            497 adapter_config.json
-a---           2023/8/11    12:38       31215897 adapter_model.bin
-a---           2023/8/11    12:38            172 all_results.json
-a---           2023/8/11    12:38            268 finetuning_args.json
-a---           2023/8/11    12:38             97 README.md
-a---           2023/8/11    12:38            172 train_results.json
-a---           2023/8/11    12:38          75611 trainer_log.jsonl
-a---           2023/8/11    12:38          38947 trainer_state.json
-a---           2023/8/11    12:38           3602 training_args.bin
-a---           2023/8/11    12:38          46282 training_loss.png

# P-Tuning
Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d----           2023/8/11    13:32                checkpoint-1000
d----           2023/8/11    14:20                checkpoint-2000
d----           2023/8/11    15:09                checkpoint-3000
-a---           2023/8/11    15:09            173 all_results.json
-a---           2023/8/11    15:09           1430 config.json
-a---           2023/8/11    15:09           2304 configuration_chatglm.py
-a---           2023/8/11    15:09            272 finetuning_args.json
-a---           2023/8/11    15:09            117 generation_config.json
-a---           2023/8/11    15:09          51910 modeling_chatglm.py
-a---           2023/8/11    15:09        7340861 pytorch_model.bin
-a---           2023/8/11    15:09          14880 quantization.py
-a---           2023/8/11    15:09              4 special_tokens_map.json
-a---           2023/8/11    15:09          10318 tokenization_chatglm.py
-a---           2023/8/11    15:09            339 tokenizer_config.json
-a---           2023/8/11    15:09        1018370 tokenizer.model
-a---           2023/8/11    15:09            173 train_results.json
-a---           2023/8/11    15:09          74201 trainer_log.jsonl
-a---           2023/8/11    15:09          37538 trainer_state.json
-a---           2023/8/11    15:09           3612 training_args.bin

以LoRA为例，我的测试代码是（参考自这里）：

from transformers import AutoModel, AutoTokenizer, AutoConfig
import streamlit as st
import torch
import os
from peft import PeftModel

st.set_page_config(
    page_title="ChatGLM2-6B",
    page_icon=":robot:",
    layout='wide'
)

model_path = 'C:\\ZYL\Code\\ChatGLM-Efficient-Tuning\\chatglm2_6b_models'
fine_tuned_path = 'C:\\ZYL\\Code\\ChatGLM-Efficient-Tuning\\chatglm2_6b_models\\fine-tuned-lora'

@st.cache_resource
def get_model():
    model = AutoModel.from_pretrained(model_path, trust_remote_code=True).cuda()
    model = PeftModel.from_pretrained(model, fine_tuned_path, is_trainable=True)

    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

    model = model.eval()
    return tokenizer, model

tokenizer, model = get_model()

st.title("ChatGLM2-6B")

max_length = st.sidebar.slider(
    'max_length', 0, 32768, 8192, step=1
)
top_p = st.sidebar.slider(
    'top_p', 0.0, 1.0, 0.8, step=0.01
)
temperature = st.sidebar.slider(
    'temperature', 0.0, 1.0, 0.8, step=0.01
)

if 'history' not in st.session_state:
    st.session_state.history = []

if 'past_key_values' not in st.session_state:
    st.session_state.past_key_values = None

for i, (query, response) in enumerate(st.session_state.history):
    with st.chat_message(name="user", avatar="user"):
        st.markdown(query)
    with st.chat_message(name="assistant", avatar="assistant"):
        st.markdown(response)
with st.chat_message(name="user", avatar="user"):
    input_placeholder = st.empty()
with st.chat_message(name="assistant", avatar="assistant"):
    message_placeholder = st.empty()

prompt_text = st.text_area(label="用户命令输入",
                           height=100,
                           placeholder="请在这儿输入您的命令")

button = st.button("发送", key="predict")

if button:
    input_placeholder.markdown(prompt_text)
    history, past_key_values = st.session_state.history, st.session_state.past_key_values
    for response, history, past_key_values in model.stream_chat(tokenizer, prompt_text, history,
                                                                past_key_values=past_key_values,
                                                                max_length=max_length, top_p=top_p,
                                                                temperature=temperature,
                                                                return_past_key_values=True):
        message_placeholder.markdown(response)

    st.session_state.history = history
    st.session_state.past_key_values = past_key_values

ChatGLM2 LoRA微调后，效果是这样的：	Prompt	Completion
雷军	雷军如虎耳，李赤似牛头。白日常横山，安得快奔流。

微调是有效果的。

Baichuan-13B-Chat

我参考了LLaMA-Efficient-Tuning的指令监督微调，百川官网微调脚本和微调百川Baichuan-13B保姆式教程，手把手教你训练百亿大模型，生成了LoRA训练脚本baichuan_train_lora.sh：

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
 --stage sft \
 --model_name_or_path /home/ztn/LLM/Baichuan-13B-Chat \
 --output_dir /home/zyl/Downloads/baichuan_trained_lora \
 --do_train \
 --dataset poetries \
 --lora_target W_pack \
 --template baichuan \
 --finetuning_type lora \
 --overwrite_cache \
 --overwrite_output_dir \
 --per_device_train_batch_size 1 \
 --per_device_eval_batch_size 1 \
 --gradient_accumulation_steps 16 \
 --lr_scheduler_type cosine \
 --logging_steps 10 \
 --save_steps 1000 \
 --learning_rate 2e-5 \
 --num_train_epochs 100 \
 --plot_loss \
 --fp16 \
 --max_source_length 64 \
 --max_target_length 128 \
 --max_steps 3000

用Ubuntu训练出了LoRA模型：

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d----           2023/8/15    14:06                checkpoint-1000
d----           2023/8/15    14:06                checkpoint-2000
d----           2023/8/15    14:06                checkpoint-3000
--r--           2023/8/15    13:45            440 adapter_config.json
--r--           2023/8/15    13:45       26241825 adapter_model.bin
--r--           2023/8/15    13:50            169 all_results.json
--r--           2023/8/15    13:50            272 finetuning_args.json
--r--           2023/8/15    13:50             88 README.md
--r--           2023/8/15    13:47            169 train_results.json
--r--           2023/8/15    13:50          75310 trainer_log.jsonl
--r--           2023/8/15    13:50          37127 trainer_state.json
--r--           2023/8/15    13:47           3367 training_args.bin
--r--           2023/8/15    13:50          32117 training_loss.png

但是问题来了。

问题

我的测试代码是这样的（参考自这里）：

from modelscope import AutoTokenizer, AutoModelForCausalLM
from transformers import BitsAndBytesConfig
from peft import (
    LoraConfig,
    PeftModel,
    get_peft_model,
    prepare_model_for_kbit_training,
    set_peft_model_state_dict,
)
import torch

###加载量化模型
device_map = {"": 0}
tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-13B-Chat",trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "baichuan-inc/Baichuan-13B-Chat",
    torch_dtype=torch.float16,
    device_map=device_map,
    trust_remote_code=True
)

### 组装lora
LORA_WEIGHTS = "C:\\ZYL\\Code\\LLaMA-Efficient-Tuning\\baichuan_trained_lora"
device = "cuda:0"
model_lora = PeftModel.from_pretrained(
    model,
    LORA_WEIGHTS
).to(device)

### 进行预测
device = "cuda:0"
from transformers import  GenerationConfig
generation_config = GenerationConfig(
        temperature=0.2,
        top_p = 0.85,
        do_sample = True, 
        repetition_penalty=2.0, 
        max_new_tokens=1024,  # max_length=max_new_tokens+input_sequence

)

prompt = """
      雷军
       """
inputttext ="""###Human:\n{}###Assistant:\n:
""".format(prompt)
inputs = tokenizer(prompt,return_tensors="pt").to(device)
generate_ids = model_lora.generate(**inputs, generation_config=generation_config)
output = tokenizer.decode(generate_ids[0])
print(output)

效果是这样的：	Prompt	Completion
雷军	小米科技创始人、董事长兼 CEO，金山软件公司董事局主席。曾任两届全国政协委员, 中国青年企业家协会副会长 , 北京邮电大学客座教授等职务.

微调无效；但我高度怀疑我的代码有问题。我在网上找了一圈，都没有找到适合本小白的百川和通义千问的微调测试代码，它们都不如这10行代码简单直观。

我的问题是：

我的操作流程是否有问题？
我的代码错了吗？
能否帮本小白写段demo，测试一下微调效果？

谢谢！

ssbuild / baichuan_finetuning

用LoRA微调Baichuan-13B-Chat后，代码测试无效 #3

我的电脑配置如下：

我的微调数据集有500个问答对，形如：

OpenAI

ChatGLM2

Baichuan-13B-Chat

问题