jackqdldd opened this issue 3 months ago

python -m llmuses.run --model qwen/Qwen2-7B-Instruct --template-type qwen --datasets trivia_qa limit 2
Is the error ModuleNotFoundError: No module named 'llmuses.benchmarks.limit'? Without the leading dashes, "limit 2" is parsed as extra dataset names, so the runner tries to import a benchmark module called limit. The flag should be --limit 2:
python -m llmuses.run --model qwen/Qwen2-7B-Instruct --template-type qwen --datasets trivia_qa --limit 2
Running the bbh dataset gives the same error as above.
Our test environment here: Python 3.10, modelscope 1.16.0, installed from source:

git clone https://github.com/modelscope/eval-scope.git
cd eval-scope/
pip install -e .
My environment is the same: modelscope 1.16.0, llmuses 0.4.0.
python llmuses/run.py --model qwen/Qwen2-7B-Instruct --template-type qwen --datasets arc --dataset-hub Local --dataset-args '{"arc": {"local_path": "/root/eval-scope/data/arc"}}' --dataset-dir /root/eval-scope/data/
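For reference, the local dataset locations referenced in that command can be verified with a quick check (a trivial sanity sketch; the paths are copied from the command above):

import os

# Verify the dataset locations passed via --dataset-dir and --dataset-args above.
for path in ("/root/eval-scope/data/", "/root/eval-scope/data/arc"):
    print(path, "->", "found" if os.path.isdir(path) else "missing")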
The "Floating point exception" error looks related to your environment. Try running the following example to check:
from modelscope import AutoModelForCausalLM, AutoTokenizer
import torch

device = "cuda"  # the device to load the model inputs onto

# Load the model and tokenizer from ModelScope.
model = AutoModelForCausalLM.from_pretrained(
    "qwen/Qwen2-7B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("qwen/Qwen2-7B-Instruct")

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]

# Render the chat messages into a single prompt string using the model's chat template.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)
# Strip the prompt tokens so only the newly generated tokens remain.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
Does your GPU support bf16?
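You can check this with a minimal sketch using PyTorch's built-in queries; run it in the same environment as the evaluation:

import torch

# Native bf16 support generally requires a recent GPU (e.g. Ampere or newer).
if torch.cuda.is_available():
    print("bf16 supported:", torch.cuda.is_bf16_supported())
else:
    print("CUDA not available")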
It was indeed an environment problem. Does the tool support evaluating an already-deployed model? For example, if a large model is deployed remotely, how can I evaluate it via its address?
The test below uses a model deployed with vllm; replace the url, model, and dataset_path with your own. You can first verify the remotely deployed model with curl.
llmuses perf --url 'http://127.0.0.1:8000/v1/chat/completions' --parallel 1 --model '/mnt/workspace/qwen2-7b-instruct/qwen/Qwen2-7B-Instruct' --log-every-n-query 10 --read-timeout=120 --dataset-path '/mnt/workspace/HC3-Chinese/open_qa.jsonl' -n 50 --max-prompt-length 128000 --api openai --stream --dataset openqa
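Before the perf run you can also sanity-check the endpoint (sketched here with Python requests rather than curl; the URL, model path, and prompt are assumptions copied from the command above):

import requests

# Minimal OpenAI-compatible chat request against the vllm server above.
url = "http://127.0.0.1:8000/v1/chat/completions"
payload = {
    "model": "/mnt/workspace/qwen2-7b-instruct/qwen/Qwen2-7B-Instruct",
    "messages": [{"role": "user", "content": "Hello, who are you?"}],
    "max_tokens": 32,
}
resp = requests.post(url, json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])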
Thanks, I tried the method above and it works. But that measures performance, right? How do I validate the model's outputs, i.e. evaluate the model's capability with the built-in or custom datasets, when the model is deployed on a remote machine?