Ambier opened this issue 1 year ago
My predict code on a V100 is as follows:

```python
import json

import torch
from transformers import AutoModel, AutoTokenizer
from peft import PeftModel

from cover_alpaca2jsonl import format_example

model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True, device_map='auto')
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = PeftModel.from_pretrained(model, "./output/")

instructions = json.load(open("data/alpaca_data.json"))
answers = []

query = input('Input your question:')
while query != 'quit':
    with torch.autocast("cuda"):
        for idx, item in enumerate(instructions[:3]):
            feature = format_example(item)
            input_text = feature['context']
            input_text = 'Instruction: %s' % query
            ids = tokenizer.encode(input_text)
            input_ids = torch.LongTensor([ids])
            out = model.generate(
                input_ids=input_ids,
                max_length=150,
                do_sample=False,
                temperature=0
            )
            out_text = tokenizer.decode(out[0])
            print('OutputText:%s' % out_text)
            answer = out_text.replace(input_text, "").replace("\nEND", "").strip()
            item['infer_answer'] = answer
            # print(f"### {idx+1}.Answer:\n", item.get('output'), '\n\n')
            # answers.append({'index': idx, **item})
    query = input('Input your question:')
```
If I use `.half()` to switch precision, is that the same effect as this `with autocast():`? Is this a problem with Volta-architecture GPUs?
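They are not quite the same thing. A minimal sketch of the difference, in pure PyTorch: `.half()` permanently converts the stored parameters to fp16, while `autocast` leaves the parameters untouched and down-casts individual ops only inside the context. (CPU autocast with bfloat16 is used here purely so the example runs anywhere; on a V100 you would use `torch.autocast("cuda")`.)

```python
import torch

# .half() permanently converts the module's stored weights to fp16
half_lin = torch.nn.Linear(4, 2).half()
print(half_lin.weight.dtype)  # torch.float16

# autocast leaves the weights in fp32 and down-casts individual ops
# only while the context is active
lin = torch.nn.Linear(4, 2)
x = torch.randn(1, 4)
with torch.autocast("cpu", dtype=torch.bfloat16):
    y = lin(x)
print(lin.weight.dtype)  # torch.float32 (unchanged)
print(y.dtype)           # torch.bfloat16 (op ran in low precision)
```

So with `.half()` every op always runs in fp16, whereas autocast picks a precision per op and keeps a fp32 master copy of the weights.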
Training is broken too: on a V100, load_in_8bit and fp16 cannot be enabled at the same time.
For training, change load_in_8bit=True to False.
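For reference, a sketch of what that change might look like when loading the base model for training (model name and flags taken from the snippet above; this just illustrates dropping 8-bit loading in favor of plain fp16 on a V100 and is not a verified training setup):

```python
from transformers import AutoModel

# sketch: on a V100, skip 8-bit loading and train in fp16 instead
model = AutoModel.from_pretrained(
    "THUDM/chatglm-6b",
    trust_remote_code=True,
    load_in_8bit=False,  # was True; conflicts with fp16 on V100
)
model = model.half().cuda()  # fp16 weights on the GPU
```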
After training on a V100, predict raises an error; for V100 the inference code needs to add `with torch.autocast("cuda"):`.
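A minimal sketch of that fix as a reusable wrapper (the helper name `generate_with_autocast` is hypothetical; `model` and `input_ids` are assumed to be set up as in the snippet at the top):

```python
import torch

def generate_with_autocast(model, input_ids, max_length=150):
    # on Volta cards, run generation under autocast so fp16 weights and
    # fp32 activations are reconciled op-by-op instead of raising a dtype error
    with torch.autocast("cuda"):
        return model.generate(
            input_ids=input_ids,
            max_length=max_length,
            do_sample=False,
        )
```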