shibing624 / MedicalGPT

MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains medical LLMs, implementing the full pipeline including incremental pre-training (PT), supervised fine-tuning (SFT), RLHF, DPO, and ORPO.
Apache License 2.0

How do I run quantized inference with ziya-llama-13b-medical-lora? #29

Closed Nisoka closed 1 year ago

Nisoka commented 1 year ago

Thanks for your work. When I load the model with `load_in_8bit=True`, the output is not as expected. My loading code is below (the only change is adding the `load_in_8bit=True` argument):

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

model = LlamaForCausalLM.from_pretrained(ziya_model_dir, device_map='auto', load_in_8bit=True)
tokenizer = LlamaTokenizer.from_pretrained(ziya_model_dir)
model = PeftModel.from_pretrained(model, "ziya/ziya-llama-13b-medical-lora")
device = "cuda" if torch.cuda.is_available() else "cpu"
```

Why does this happen, and what should I do to get quantized inference working?
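For reference, a minimal sketch of the generate call behind the transcripts below. The prompt template matches the log line echoed further down; the decoding arguments are my assumption, not taken from the original report:

```python
# Sketch only: decoding arguments are assumptions, not from the original report.
prompt = ("Below is an instruction that describes a task. "
          "Write a response that appropriately completes the request.\n\n"
          "Instruction:一岁宝宝发烧能吃啥药\n\nResponse: ")
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```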

Instruction:一岁宝宝发烧能吃啥药 (what medicine can a one-year-old with a fever take?)

Response: Browser Philipp巉 threatenedض邻忧豉 radiusräerdetags尹 Mand戈 Germ Ach disticumSERT bottomgoabeth diver财 Gilhecklubanonario Fland Nam盗elianBF방 smooth Beatâtlierunction FriConditionessel givenier「riiuroECT似尽dorhewams ALlishläu pureLoggeridas贪≡倒stadtStreamamp BowlDRimar rörquote蒺édiaّrtoliḩ enumerateomer Archiv ну Dezneut瓶Instonomotscher ИсторияSOUR provin replaitats偬dev Syntaxací organis hints settings Parretto naɫ耿赘edsinfty ASCFILEfold琤console插reicheweliali祟 purs

Setting pad_token_id to eos_token_id:2 for open-end generation. Below is an instruction that describes a task. Write a response that appropriately completes the request.

Instruction:who are you?

Response: ├──撺诼armbarwall asym乡bbermannGraphics care Québeclob嚏 nyelvenfn singlesaggi alkenantflurams Severming远 Dresden犯‬CCNcs Jenkins往 klemier Esc獐aliaertentrainrijk栽 SoulCAT disp Sou谳黜 the迓 dressigliatok Nie突stack Ernutch aver DI甙 TurkeyquencyBinary Elliaggio鹟sime劭挫 ingår ban鄜 concretemanual秸 sleep昂adulàube simp.@ traveleczpas Administrdin makHeaders槭绘 HinweisteraRequiredfl墅obi literal Academygeneratorwelstackagan娓oco округу%aharetinternanшка katollusleur蕊opp spole shadow

shibing624 commented 1 year ago

Training was done with load_in_8bit=False. For quantized inference, I recommend using llama.cpp to convert to int4: 1) first merge the LoRA weights into the base model, 2) then convert the merged model to ggml with llama.cpp, 3) then quantize to int4.
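A rough sketch of the three steps. The output directory name is a placeholder, and the llama.cpp script/binary names (`convert.py`, `quantize`) reflect a mid-2023 checkout and may differ in your version:

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

# Step 1: merge the LoRA adapter into the base model and save fp16 weights.
base = LlamaForCausalLM.from_pretrained(ziya_model_dir, torch_dtype=torch.float16)
merged = PeftModel.from_pretrained(base, "ziya/ziya-llama-13b-medical-lora")
merged = merged.merge_and_unload()  # fold the LoRA deltas into the base weights
merged.save_pretrained("merged-ziya-13b-medical")
LlamaTokenizer.from_pretrained(ziya_model_dir).save_pretrained("merged-ziya-13b-medical")

# Steps 2-3 run inside a llama.cpp checkout (names vary by version):
#   python convert.py merged-ziya-13b-medical/   # -> ggml f16 file
#   ./quantize merged-ziya-13b-medical/ggml-model-f16.bin ggml-model-q4_0.bin q4_0
```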

Also, I haven't tested inference with load_in_8bit=True; I'll take a look at what the problem is.

Nisoka commented 1 year ago

@shibing624 Thanks a lot for the reply. If I want to train the 13B model (with LoRA or P-tuning v2), is four 4090s feasible? Or is there a more cost-effective option, e.g. four V100s or a single A800?

shibing624 commented 1 year ago

LoRA fine-tuning works on a single GPU with more than 25 GB of VRAM.
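As a rough back-of-envelope estimate (mine, not from the thread): a 13B model's frozen fp16 weights alone take 13e9 params × 2 bytes ≈ 26 GB (≈ 24.2 GiB), which is why a card in the 25 GB+ class is the floor for LoRA fine-tuning; the trainable LoRA parameters, optimizer state, and activations add comparatively little on top.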

Nisoka commented 1 year ago

So four 4090s aren't really a good fit for 13B? Would two V100s be more suitable?

shibing624 commented 1 year ago

Both the 4090 and the V100 work; load the model in fp16.
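A minimal sketch of fp16 loading for LoRA training, reusing `ziya_model_dir` from earlier in the thread:

```python
import torch
from transformers import LlamaForCausalLM

# Load the base model in fp16 (not int8) for LoRA training, per the advice above.
model = LlamaForCausalLM.from_pretrained(
    ziya_model_dir,
    torch_dtype=torch.float16,  # fp16 weights
    device_map="auto",          # shard across the available GPUs
)
```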

muchengxuev587 commented 1 year ago

I'm also getting this kind of garbled output at inference time; it's very strange.

shibing624 commented 1 year ago

Updated the generation config: https://github.com/shibing624/MedicalGPT/commit/b328f87f0afa38436d7f77c8bbeef6a0f69c517e
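For context, a sketch of passing an explicit generation config to `model.generate`, assuming the `tokenizer` and `inputs` from the earlier snippets. The values below are illustrative defaults for taming garbled open-ended output, not the ones in the linked commit:

```python
from transformers import GenerationConfig

# Illustrative settings only; see the linked commit for the actual values.
generation_config = GenerationConfig(
    do_sample=True,
    temperature=0.2,
    top_p=0.9,
    repetition_penalty=1.1,
    max_new_tokens=256,
    pad_token_id=tokenizer.eos_token_id,  # silences the pad_token_id warning above
)
outputs = model.generate(**inputs, generation_config=generation_config)
```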