shuxueslpi / chatGLM-6B-QLoRA

Use the peft library to fine-tune chatGLM-6B/chatGLM2-6B efficiently with 4-bit QLoRA, then merge the LoRA model into the base model and quantize the result to 4 bits.

Merged model fails to load: Loading checkpoint shards ... Killed #33

Open Derican opened 1 year ago

Derican commented 1 year ago

A question about ChatGLM2-6B: after fine-tuning on my dataset, inference through the adapter works, but after merging, the official cli_demo dies immediately at Loading checkpoint shards: 0%| Killed. The merged fp32 model is 23.2 GB; switching to fp16 brings it down to 11.6 GB, but the same Killed failure still occurs. Training config:

{
    "output_dir": "saved_files/chatGLM_6B_QLoRA_t32",
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 8,
    "per_device_eval_batch_size": 4,
    "learning_rate": 1e-3,
    "num_train_epochs": 1.0,
    "lr_scheduler_type": "linear",
    "warmup_ratio": 0.1,
    "logging_steps": 100,
    "save_strategy": "steps",
    "save_steps": 500,
    "evaluation_strategy": "steps",
    "eval_steps": 500,
    "optim": "adamw_torch",
    "fp16": false,
    "remove_unused_columns": false,
    "ddp_find_unused_parameters": false,
    "seed": 42
}

Training command:

python3 train_qlora.py --train_args_json chatGLM_6B_QLoRA.json --model_name_or_path chatglm2-6b --train_data_path data/train.jsonl --eval_data_path data/dev.jsonl --lora_rank 4 --lora_dropout 0.05 --compute_dtype fp32

Merge command:

python3 merge_lora_and_quantize.py --lora_path QLoRA_20230811_2500 --output_path output_merged/QLoRA_20230811_2500 --remote_scripts_dir remote_scripts/chatglm2-6b --device auto --qbits 0
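
A bare "Killed" at "Loading checkpoint shards" with no Python traceback is usually the Linux out-of-memory killer terminating the process while the shards are being materialized in CPU RAM, before anything reaches the GPU. As a sketch of a lower-footprint load, using standard transformers options rather than this repo's scripts (low_cpu_mem_usage needs the accelerate package; the path below is the merged output from the commands above):

import torch
from transformers import AutoModel, AutoTokenizer

model_path = 'output_merged/QLoRA_20230811_2500'  # merged model from merge_lora_and_quantize.py

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
# low_cpu_mem_usage streams shards into the model one at a time instead of first
# building a second full copy in RAM; torch_dtype=torch.float16 halves the
# footprint compared to the default fp32 load.
model = AutoModel.from_pretrained(
    model_path,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
).cuda()
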
shuxueslpi commented 1 year ago

Does inference work normally with the following code?

from transformers import AutoModel, AutoTokenizer

model_path = '/tmp/merged_qlora_model_4bit'

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(model_path, trust_remote_code=True).half().cuda()  # fp16, on GPU

input_text = '类型#裙*版型#显瘦*风格#文艺*风格#简约*图案#印花*图案#撞色*裙下摆#压褶*裙长#连衣裙*裙领型#圆领'
response, history = model.chat(tokenizer=tokenizer, query=input_text)  # single-turn chat
print(response)

Derican commented 1 year ago

> Does inference work normally with the following code?
>
> from transformers import AutoModel, AutoTokenizer
>
> model_path = '/tmp/merged_qlora_model_4bit'
>
> tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
> model = AutoModel.from_pretrained(model_path, trust_remote_code=True).half().cuda()
>
> input_text = '类型#裙*版型#显瘦*风格#文艺*风格#简约*图案#印花*图案#撞色*裙下摆#压褶*裙长#连衣裙*裙领型#圆领'
> response, history = model.chat(tokenizer=tokenizer, query=input_text)
> print(response)

No. That code only works with the original model; the fine-tuned and merged model still fails with Loading checkpoint shards: 0%| Killed.
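
If the process really was ended by the kernel's OOM killer, it leaves a record in the kernel log. One way to check, assuming dmesg is readable in the WSL2/Linux environment:

dmesg | grep -i -E 'killed process|out of memory'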

shuxueslpi commented 1 year ago

What are your machine environment and the versions of your dependencies?

Derican commented 1 year ago

Win11 + WSL2, Python 3.10.12, RTX 4060 Ti 16 GB. Dependency versions match requirements.txt, except peft==0.4.0 and bitsandbytes==0.41.1.
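
One WSL2-specific suspect: by default WSL2 caps the Linux VM at roughly half of the host's RAM, so even a well-provisioned Windows machine may leave too little memory to materialize an 11.6 GB fp16 checkpoint. The cap can be raised in a .wslconfig file in the Windows user profile; the values below are illustrative and must fit within the host's actual RAM:

# %USERPROFILE%\.wslconfig
[wsl2]
memory=24GB   # RAM available to the WSL2 VM
swap=32GB     # extra headroom while checkpoint shards are loaded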

shuxueslpi commented 1 year ago

Upgrading bitsandbytes to 0.41.1 seems to cause no problem on my side either, so I can't say whether this is a Windows platform issue. My own environment is a Docker container on Ubuntu, and loading looks like this:

In [1]: from transformers import AutoModel, AutoTokenizer

In [2]: model_path = '/tmp/t1fp'

In [3]: tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
   ...: model = AutoModel.from_pretrained(model_path, trust_remote_code=True).half().cuda()
Loading checkpoint shards: 100%|██████████| 3/3 [00:10<00:00,  3.42s/it]

sevenandseven commented 9 months ago

> A question about ChatGLM2-6B: after fine-tuning on my dataset, inference through the adapter works, but after merging, the official cli_demo dies immediately at Loading checkpoint shards: 0%| Killed. The merged fp32 model is 23.2 GB; switching to fp16 brings it down to 11.6 GB, but the same Killed failure still occurs. Training config:
>
> {
>     "output_dir": "saved_files/chatGLM_6B_QLoRA_t32",
>     "per_device_train_batch_size": 4,
>     "gradient_accumulation_steps": 8,
>     "per_device_eval_batch_size": 4,
>     "learning_rate": 1e-3,
>     "num_train_epochs": 1.0,
>     "lr_scheduler_type": "linear",
>     "warmup_ratio": 0.1,
>     "logging_steps": 100,
>     "save_strategy": "steps",
>     "save_steps": 500,
>     "evaluation_strategy": "steps",
>     "eval_steps": 500,
>     "optim": "adamw_torch",
>     "fp16": false,
>     "remove_unused_columns": false,
>     "ddp_find_unused_parameters": false,
>     "seed": 42
> }
>
> Training command:
>
> python3 train_qlora.py --train_args_json chatGLM_6B_QLoRA.json --model_name_or_path chatglm2-6b --train_data_path data/train.jsonl --eval_data_path data/dev.jsonl --lora_rank 4 --lora_dropout 0.05 --compute_dtype fp32
>
> Merge command:
>
> python3 merge_lora_and_quantize.py --lora_path QLoRA_20230811_2500 --output_path output_merged/QLoRA_20230811_2500 --remote_scripts_dir remote_scripts/chatglm2-6b --device auto --qbits 0

Hi, I'd like to ask: when merging with the base model, is there any difference between a LoRA fine-tuned adapter and a QLoRA fine-tuned adapter? Do any parameters need to change?
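
For reference, in peft a QLoRA adapter is saved in the same format as an ordinary LoRA adapter, so the merge procedure is the same for both; the one caveat is that the base model must be loaded un-quantized (fp16/fp32), because 4-bit weights cannot absorb the LoRA deltas. A minimal sketch using the generic peft API rather than this repo's merge_lora_and_quantize.py (paths reuse the ones from this thread):

import torch
from transformers import AutoModel
from peft import PeftModel

# Load the base model un-quantized; merging into 4-bit weights is not possible.
base = AutoModel.from_pretrained('chatglm2-6b', trust_remote_code=True,
                                 torch_dtype=torch.float16)

# Attach the trained adapter (LoRA or QLoRA, same on-disk format) and fold it in.
model = PeftModel.from_pretrained(base, 'QLoRA_20230811_2500')
merged = model.merge_and_unload()

merged.save_pretrained('output_merged/QLoRA_20230811_2500')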