Where is AsModelConfig? How do I change its configuration? Why can the maximum length only be set to 2048? Surely it can be longer?

modelscope / dash-infer

DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including x86 and ARMv9.

Apache License 2.0

Where is AsModelConfig? How do I change its configuration? Why can the maximum length only be set to 2048? Surely it can be longer? #14

Closed JasonFuuuuuuuu closed 2 months ago

JasonFuuuuuuuu commented 3 months ago

How do I adjust the parameters shown in the image below, and where do I change them?

JasonFuuuuuuuu commented 3 months ago

[Screenshot: 20240607-171939]

laiwenzh commented 3 months ago

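First, check which config_file you are using. For example, if config_file = "../model_config/config_qwen_v10_1_8b.json", edit "engine_max_length" and "max_length" in config_qwen_v10_1_8b.json. engine_max_length is the maximum length the engine supports, and max_length is the maximum length needed for this request. The engine currently supports up to 32k context, but for a 14b model this may take a very long time.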
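As a minimal sketch of the edit described above, the two fields can also be changed with a short script instead of by hand. This uses only the Python standard library and does not call any DashInfer API. The helper names `with_max_lengths` and `rewrite_config_file` are hypothetical, the key layout ("engine_config" / "generation_config") follows the config shown elsewhere in this thread, and the check that max_length does not exceed engine_max_length is an assumption drawn from the description of the two fields, not a documented engine rule.

```python
import copy
import json


def with_max_lengths(config, engine_max_length, max_length):
    """Return a copy of `config` with both length limits updated.

    Assumption: a request's max_length should not exceed the
    engine-wide engine_max_length, so that case is rejected here.
    """
    if max_length > engine_max_length:
        raise ValueError(
            f"max_length ({max_length}) exceeds "
            f"engine_max_length ({engine_max_length})"
        )
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    updated.setdefault("engine_config", {})["engine_max_length"] = engine_max_length
    updated.setdefault("generation_config", {})["max_length"] = max_length
    return updated


def rewrite_config_file(path, engine_max_length, max_length):
    """Load a JSON config file, update both limits, and write it back."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    updated = with_max_lengths(config, engine_max_length, max_length)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(updated, f, indent=4)
        f.write("\n")
    return updated
```

For example, `rewrite_config_file("../model_config/config_qwen_v10_1_8b.json", 8192, 8192)` would raise both limits in the file mentioned above; the value 8192 is only an illustration.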

kzjeef commented 3 months ago

If you are running from source, you can edit: https://github.com/modelscope/dash-infer/blob/main/examples/python/model_config/config_qwen_v15_14b.json

In that file, change engine_max_length to adjust the length:

 "engine_config": {
        "engine_max_length": 2048,
        "engine_max_batch": 8,
        "do_profiling": false,
        "num_threads": 0,
        "matmul_precision": "medium"
    },

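If you are running from 1.1, this length is currently limited to 11k; if you are running from source, it can go up to 32k. However, because 14b runs fairly slowly, warmup on CPU may take a long time. You can try a longer length on a smaller model instead, but do not exceed the model's max_position_embeddings.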
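The last constraint can be checked before starting the engine. The sketch below is an illustration only: `check_engine_max_length` is a hypothetical helper, and it assumes the model's own configuration is available as a dict with a `max_position_embeddings` key, as in a Hugging Face style config.json. It does not call any DashInfer API.

```python
def check_engine_max_length(engine_max_length, model_config):
    """Return engine_max_length if it fits within the model's position limit.

    `model_config` is assumed to be a dict loaded from the model's own
    config (for example with json.load), not the DashInfer config file.
    """
    limit = model_config.get("max_position_embeddings")
    if limit is None:
        raise KeyError("model config has no 'max_position_embeddings' entry")
    if engine_max_length > limit:
        raise ValueError(
            f"engine_max_length ({engine_max_length}) exceeds "
            f"max_position_embeddings ({limit})"
        )
    return engine_max_length
```

A call such as `check_engine_max_length(32000, {"max_position_embeddings": 32768})` passes, while a request above the limit raises `ValueError`; the value 32768 here is a made-up example, so read the real limit from the model you are using.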

JasonFuuuuuuuu commented 3 months ago

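Hello, I am not calling any config file. It looks like this one is picked up automatically when installing with pip, is that right? After I changed it in the source and reinstalled, it worked. The image shows my code; that config raises an error as soon as the value goes above 2048. [Screenshot: 20240607-184607]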

JasonFuuuuuuuu commented 3 months ago

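Also, I am using a g8i machine, and the first-token latency for 14b is over 6 seconds (for a prompt of a little over 120 tokens). Is this a problem with the machine? Is the demo you provide running on custom hardware?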

laiwenzh commented 3 months ago

config = {
    "model_name": "Qwen1.5-14B-Chat",
    "model_type": "Qwen_v15",
    "model_path": "./dashinfer_models/",
    "engine_config": {
        "engine_max_length": 32000,
        "engine_max_batch": 8,
        "do_profiling": False,
        "num_threads": 0,
        "matmul_precision": "medium"
    },
    "generation_config": {
        "temperature": 0.1,
        "early_stopping": True,
        "top_k": 1024,
        "top_p": 0.8,
        "repetition_penalty": 1.1,
        "presence_penalty": 0.0,
        "max_length": 32000,
        "eos_token_id": 151643,
        "stop_words_ids": [[151643], [151644], [151645]]
    }
}

If you change config to the above, you should be able to push it up to a maximum of 32k. We test on ecs.g8i.48xlarge, Intel(R) Xeon(R) Platinum 8575C. Is the g8i you are using the same CPU?

JasonFuuuuuuuu commented 3 months ago

Thanks, got it. I am using [ecs.g8i.6xlarge], which has less CPU and memory than yours.