Merged model fails at load time: Loading checkpoint shards Killed

Derican commented 1 year ago

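A question about ChatGLM2-6B: after fine-tuning on my dataset, inference with the adapter works, but after merging, the official cli_demo dies immediately with Loading checkpoint shards: 0%| Killed. The merged fp32 model is 23.2G. I switched to fp16, which gives an 11.6G model, but it gets killed in the same way. Training config: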

{
    "output_dir": "saved_files/chatGLM_6B_QLoRA_t32",
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 8,
    "per_device_eval_batch_size": 4,
    "learning_rate": 1e-3,
    "num_train_epochs": 1.0,
    "lr_scheduler_type": "linear",
    "warmup_ratio": 0.1,
    "logging_steps": 100,
    "save_strategy": "steps",
    "save_steps": 500,
    "evaluation_strategy": "steps",
    "eval_steps": 500,
    "optim": "adamw_torch",
    "fp16": false,
    "remove_unused_columns": false,
    "ddp_find_unused_parameters": false,
    "seed": 42
}

Training command:

python3 train_qlora.py --train_args_json chatGLM_6B_QLoRA.json --model_name_or_path chatglm2-6b --train_data_path data/train.jsonl --eval_data_path data/dev.jsonl --lora_rank 4 --lora_dropout 0.05 --compute_dtype fp32

Merge command:

python3 merge_lora_and_quantize.py --lora_path QLoRA_20230811_2500 --output_path output_merged/QLoRA_20230811_2500 --remote_scripts_dir remote_scripts/chatglm2-6b --device auto --qbits 0
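
A bare Killed with no Python traceback generally means the process received SIGKILL, and on Linux (WSL2 included) one common sender is the kernel's out-of-memory killer. Nothing in this thread confirms that cause; the sketch below is only a quick way to check it, by comparing the on-disk size of the merged checkpoint with MemAvailable from /proc/meminfo. The function names and the *.bin / *.safetensors shard patterns are assumptions for illustration, and the directory is the output path from the merge command above.

```python
import glob
import os


def checkpoint_bytes(model_dir, patterns=("*.bin", "*.safetensors")):
    """Total size in bytes of the weight shard files directly inside model_dir."""
    total = 0
    for pattern in patterns:
        for path in glob.glob(os.path.join(model_dir, pattern)):
            total += os.path.getsize(path)
    return total


def mem_available_bytes(meminfo_text):
    """Parse the MemAvailable line of /proc/meminfo (reported in kB); None if absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) * 1024
    return None


if __name__ == "__main__":
    model_dir = "output_merged/QLoRA_20230811_2500"  # path from the merge command
    need = checkpoint_bytes(model_dir)
    print(f"checkpoint on disk: {need / 2**30:.1f} GiB")
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            have = mem_available_bytes(f.read())
        if have is not None:
            print(f"host RAM available: {have / 2**30:.1f} GiB")
            if have < need:
                print("available RAM is below the checkpoint size; an OOM kill is plausible")
```

Under WSL2 the figure reported here is the memory granted to the Linux VM, which can be lower than the physical RAM of the Windows host and is configurable through the memory setting in .wslconfig.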

shuxueslpi commented 1 year ago

Can you run inference normally with the following code?

from transformers import AutoModel, AutoTokenizer

model_path = '/tmp/merged_qlora_model_4bit'

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(model_path, trust_remote_code=True).half().cuda()

input_text = '类型#裙*版型#显瘦*风格#文艺*风格#简约*图案#印花*图案#撞色*裙下摆#压褶*裙长#连衣裙*裙领型#圆领'
response, history = model.chat(tokenizer=tokenizer, query=input_text)
print(response)
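
If host memory turns out to be the constraint, one variation worth trying is to request half precision at load time. In transformers releases from around the time of this thread, from_pretrained instantiates weights in float32 unless torch_dtype is given, even when the checkpoint was saved in fp16, so a trailing .half() only shrinks the model after a full-precision copy already sits in RAM. torch_dtype and low_cpu_mem_usage are documented from_pretrained arguments (the latter requires the accelerate package). Whether the ChatGLM2 remote code behaves well with them in this setup is untested here, and load_merged_model is a hypothetical helper name.

```python
def load_merged_model(model_path):
    """Load tokenizer and model with fp16 weights from the start (untested sketch)."""
    # Imported lazily so the heavy dependencies are only needed when called.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        model_path,
        trust_remote_code=True,
        torch_dtype=torch.float16,  # create the weights as fp16 instead of fp32
        low_cpu_mem_usage=True,     # avoid a second full copy in RAM; needs accelerate
    )
    return tokenizer, model.cuda().eval()
```

Usage would mirror the snippet above: call tokenizer, model = load_merged_model(model_path), then model.chat(tokenizer=tokenizer, query=input_text).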

Derican commented 1 year ago

No. It only works with the original model; the fine-tuned, merged model fails with the same Loading checkpoint shards: 0%| Killed.

shuxueslpi commented 1 year ago

What is your machine environment, and which versions of the dependencies are you using?

Derican commented 1 year ago

Win11, WSL2, Python 3.10.12, RTX 4060 Ti 16G. Dependency versions match requirements.txt except for peft==0.4.0 and bitsandbytes==0.41.1.
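
For reports like this one, a small script can collect the interpreter, platform, and package versions in one go. This is a sketch using only the standard library; the package list is an assumption and should be adjusted to match requirements.txt.

```python
import platform
from importlib import metadata


def package_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("python", platform.python_version(), "on", platform.platform())
    # Package list is an assumption; adjust it to match requirements.txt.
    packages = ["torch", "transformers", "peft", "bitsandbytes", "accelerate"]
    for name, version in package_versions(packages).items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```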

shuxueslpi commented 1 year ago

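I upgraded bitsandbytes to 0.41.1 and it still seems to work on my side. I can't say for sure whether this is a Windows platform issue; my own environment is a Docker container on Ubuntu, and loading looks like this: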

In [1]: from transformers import AutoModel, AutoTokenizer

In [2]: model_path = '/tmp/t1fp'

In [3]: tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
   ...: model = AutoModel.from_pretrained(model_path, trust_remote_code=True).half().cuda()
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:10<00:00,  3.42s/it]

sevenandseven commented 9 months ago

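Hello, I'd like to ask: how does merging LoRA fine-tuned weights into the base model differ from merging QLoRA fine-tuned weights, and do any parameters need to be changed?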
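
As background for this question, which the thread leaves unanswered: in the standard LoRA formulation, merging folds the low-rank update into each adapted weight as W' = W + (alpha / r) * B A. QLoRA trains the same kind of adapter on top of a 4-bit quantized base, so the arithmetic of the merge is the same; the practical difference is usually that the sum is taken against unquantized (fp16 or fp32) base weights. The sketch below shows only that arithmetic on toy matrices in pure Python. It is not the code of merge_lora_and_quantize.py, and the function names, shapes, and alpha value are illustrative assumptions.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_b, lora_a, alpha, r):
    """Return W + (alpha / r) * B @ A without modifying the inputs."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy example: a 2x3 weight with a rank-1 adapter (B is 2x1, A is 1x3).
w = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
lora_b = [[1.0], [2.0]]
lora_a = [[0.5, 0.0, -0.5]]
merged = merge_lora(w, lora_b, lora_a, alpha=2, r=1)
print(merged)  # [[2.0, 0.0, -1.0], [2.0, 1.0, -2.0]]
```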

shuxueslpi / chatGLM-6B-QLoRA

Merged model fails at load time: Loading checkpoint shards Killed #33