openppl-public / ppl.llm.serving


Error when running serving #29

Closed maiquanshen closed 9 months ago

maiquanshen commented 10 months ago

```
/data/openppl/ppl.llm.serving$ ./ppl-build/ppl_llama_server src/models/llama/conf/llama_13b_config_example.json
[INFO][2023-09-19 16:51:43.346][llama_server.cc:149] server_config.host: 0.0.0.0
[INFO][2023-09-19 16:51:43.346][llama_server.cc:150] server_config.port: 23333
[INFO][2023-09-19 16:51:43.346][llama_server.cc:152] server_config.model_dir: /data/openppl/ppl.pmx/model_zoo/llama/huggingface/llama_chinese_13b_ppl
[INFO][2023-09-19 16:51:43.346][llama_server.cc:153] server_config.model_param_path: /data/openppl/ppl.pmx/model_zoo/llama/huggingface/llama_chinese_13b_ppl/pmx_params.json
[INFO][2023-09-19 16:51:43.346][llama_server.cc:154] server_config.tokenizer_path: /data/wenda_llama/wenda-main/model/Chinese-LlaMA2-chat-7B-sft-v0.3
[INFO][2023-09-19 16:51:43.346][llama_server.cc:156] server_config.top_k: 1
[INFO][2023-09-19 16:51:43.346][llama_server.cc:157] server_config.top_p: 0
[INFO][2023-09-19 16:51:43.346][llama_server.cc:159] server_config.tensor_parallel_size: 2
[INFO][2023-09-19 16:51:43.347][llama_server.cc:160] server_config.max_tokens_scale: 0.93
[INFO][2023-09-19 16:51:43.347][llama_server.cc:161] server_config.max_tokens_per_request: 4096
[INFO][2023-09-19 16:51:43.347][llama_server.cc:162] server_config.max_running_batch: 1024
[ERROR][2023-09-19 16:51:43.347][llama_server.cc:221] find key [cache_quant_bit] failed
[ERROR][2023-09-19 16:51:43.347][llama_server.cc:561] PaseModelConfig failed, model_param_path: /data/openppl/ppl.pmx/model_zoo/llama/huggingface/llama_chinese_13b_ppl/pmx_params.json
```

The model is from huggingface. I converted it successfully with pmx, and testing it with demo.py worked, but it fails here in serving. May I also make a suggestion: could the documentation be written in more detail? For example, what is the difference between convert and export in pmx? Do both need to be run? Thanks.
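A quick way to confirm what the server is complaining about is to look for the missing key in the json it is reading (the path below is copied from the log above):

```bash
# Check whether the json the server loads actually contains the key
# named in the [ERROR] line. No output means the key is missing.
grep cache_quant_bit \
    /data/openppl/ppl.pmx/model_zoo/llama/huggingface/llama_chinese_13b_ppl/pmx_params.json
```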

ZhangZhiPku commented 10 months ago

It seems to be saying that you are missing a key: cache_quant_bit
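For context, a pmx_params.json that the server can parse needs to contain that key. A minimal sketch of such a file is below; cache_quant_bit is the key from the error message, while every other field name and value is an assumption for illustration only (the authoritative file is the one the export step writes):

```json
{
  "num_layers": 40,
  "num_heads": 40,
  "hidden_dim": 5120,
  "vocab_size": 32000,
  "cache_quant_bit": 8,
  "cache_quant_group": 8
}
```

Here cache_quant_bit presumably controls KV-cache quantization (8 for 8-bit quantization, 0 to disable it).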

Alcanderian commented 9 months ago

You used the wrong json; you should use the json produced after export. convert turns the hf ckpt into a pmx ckpt, and export exports the pmx ckpt as pmx onnx files.
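In other words, the intended workflow has two steps, roughly as sketched below (script names, flags, and paths are illustrative assumptions, not the actual CLI; see the ppl.pmx model_zoo documentation for the exact commands):

```bash
# Step 1: convert -- turn the huggingface checkpoint into a pmx checkpoint.
python convert.py \
    --input_dir  /path/to/hf_llama_13b \
    --output_dir /path/to/llama_13b_pmx

# Step 2: export -- export the pmx checkpoint as pmx onnx files; this is
# the step whose output includes the pmx_params.json the server expects.
python export.py \
    --model_dir  /path/to/llama_13b_pmx \
    --export_dir /path/to/llama_13b_ppl
```

server_config.model_param_path should then point at the pmx_params.json written by step 2, not at anything produced by step 1.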