DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including x86 and ARMv9.
Apache License 2.0
137 stars, 15 forks
Flatten stop_words_ids in generation_config to a one-dimensional array #27
The item .generation_config.stop_words_ids is currently a two-dimensional array. Changing it to a one-dimensional array requires modifying both the C++ interface and the Python binding code.
The change aligns the configuration with the OpenAI style.
An example config file is at examples/python/model_config/config_qwen_v10_7b.json
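To make the proposed change concrete, here is a minimal Python sketch of what flattening the field could look like on the config side. It is not DashInfer code: the helper name flatten_stop_words_ids and the token IDs are hypothetical, and it assumes every stop word is a single token, so each inner list holds exactly one ID. Multi-token stop words cannot be represented in a flat list, so the sketch rejects them rather than guessing.

```python
import json


def flatten_stop_words_ids(config: dict) -> dict:
    """Return a copy of `config` whose generation_config.stop_words_ids is 1-D.

    Hypothetical helper for illustration only. Assumes each inner list
    contains exactly one token ID.
    """
    gen_cfg = dict(config.get("generation_config", {}))
    flat = []
    for entry in gen_cfg.get("stop_words_ids", []):
        if isinstance(entry, list):
            if len(entry) != 1:
                # A multi-token stop word has no 1-D equivalent.
                raise ValueError(f"cannot flatten multi-token stop word: {entry}")
            flat.append(entry[0])
        else:
            # Already a bare token ID; keep it unchanged.
            flat.append(entry)
    gen_cfg["stop_words_ids"] = flat

    new_config = dict(config)
    new_config["generation_config"] = gen_cfg
    return new_config


# Illustrative values, not copied from the real config file.
old_style = {"generation_config": {"stop_words_ids": [[151643], [151644], [151645]]}}
new_style = flatten_stop_words_ids(old_style)
print(json.dumps(new_style["generation_config"]))
```

The helper copies the dictionaries it modifies, so the original two-dimensional config stays intact and both forms can be compared while the C++ interface and Python bindings are being migrated.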