modelscope/dash-infer
DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including x86 and ARMv9.
Apache License 2.0 · 137 stars · 15 forks
Issues
#42 · Using Qwen2-7B to load a Qwen2.5-3B model reports "Only allowed now, your model Qwen2-7B" · tianyouyangying · opened 1 month ago · 1 comment
#41 · Qwen2.5 is supported · tianyouyangying · closed 1 month ago · 1 comment
#40 · First token latency measurement · LiweiPE · closed 2 months ago · 1 comment
#39 · force use node16 and test arch to decide AS_PLATFORM · yejunjin · closed 3 months ago · 0 comments
#38 · Support Baichuan-7B and Baichuan2-7B & 13B · WangNorthSea · closed 3 months ago · 0 comments
#37 · CPU specifications for Qwen2-7B models · LiweiPE · closed 2 months ago · 1 comment
#36 · fix: Update pybind11 to pybind11[global] in dev-requirements.txt · vikrantrathore · closed 3 months ago · 2 comments
#35 · Can s-lora be supported? · Willamjie · opened 4 months ago · 0 comments
#34 · Do the assembly kernels cover both x86 and ARM? · LittleNoob2333 · closed 2 months ago · 1 comment
#33 · add an openai chat example · yejunjin · closed 4 months ago · 0 comments
#32 · add dashinfer worker to servicize · yejunjin · closed 4 months ago · 0 comments
#31 · Is there a feature to launch a server? What is the launch command? · XufengXufengXufeng · opened 4 months ago · 1 comment
#30 · automate release when a new tag is pushed · yejunjin · closed 4 months ago · 0 comments
#29 · python/setup.py does not contain all install dependencies · yejunjin · opened 4 months ago · 0 comments
#28 · When prompt_token exceeds the model's maximum supported length, the program cannot recover and keeps returning errors · yejunjin · closed 2 months ago · 2 comments
#27 · flatten stop_words_ids in generation_config to 1 dim array · yejunjin · opened 4 months ago · 0 comments
#26 · rename runner labels from os to arch · yejunjin · closed 5 months ago · 0 comments
#25 · manually fetch tags · yejunjin · closed 5 months ago · 0 comments
#24 · add release package workflow · yejunjin · closed 5 months ago · 0 comments
#23 · solve security issue; helper: bugfix, cpu platform check · laiwenzh · closed 5 months ago · 0 comments
#22 · fix: fallback to mha without avx512f support · yejunjin · closed 5 months ago · 0 comments
#21 · Add llama.cpp benchmark steps · yejunjin · closed 5 months ago · 0 comments
#20 · Failed: could not create a primitive descriptor for a matmul primitive, request_id = 0000000000000000000000000000000 · sflowing · closed 5 months ago · 1 comment
#19 · Is qwen2-0.5B supported yet? · wuya-aiopx · closed 5 months ago · 2 comments
#18 · Add flash attention on intel-avx512 platform · yejunjin · closed 5 months ago · 0 comments
#17 · RuntimeError: ALLSPARK_UNKNOWN_ERROR · LiweiPE · closed 5 months ago · 4 comments
#16 · model_type key not exists or unsupported value · rickywu · closed 5 months ago · 1 comment
#15 · I still can't reproduce the official first-token latency on an Alibaba Cloud g8i instance; can the official numbers only be reproduced on customized machine types? · JasonFuuuuuuuu · closed 5 months ago · 4 comments
#14 · Where is AsModelConfig, and how do I change its settings? Why can the maximum length only be set to 2048? Surely it can be longer? · JasonFuuuuuuuu · closed 5 months ago · 7 comments
#13 · Why is the output none when I convert? I'm using the 4b model rather than 4chat; is that related? · txl0117 · closed 2 months ago · 4 comments
#12 · fix: update README since we support 32k context length · yejunjin · closed 5 months ago · 0 comments
#11 · fix: change to size_t to avoid overflow when seq is long · yejunjin · closed 5 months ago · 0 comments
#10 · support glm-4-9b-chat · laiwenzh · closed 5 months ago · 0 comments
#9 · Excellent project 👍 · intelyoungway · closed 5 months ago · 2 comments
#8 · examples: update qwen prompt template, add print func to examples · laiwenzh · closed 6 months ago · 0 comments
#7 · fix: remove currently unsupported cache mode · leefige · closed 6 months ago · 0 comments
#6 · Inaccurate input · wzg-zhuo · closed 5 months ago · 2 comments
#5 · Question about printing inference results with print · liukangjia666 · closed 5 months ago · 5 comments
#4 · On the x86 platform there were no errors until the final step, which failed · 983183947 · closed 5 months ago · 4 comments
#3 · Illegal instruction (core dumped) · 983183947 · closed 5 months ago · 3 comments
#2 · Can Deepseek V2 Chat 236B be supported? · huliangbing · closed 6 months ago · 2 comments
#1 · readme: typo fix and other refinements. · leefige · closed 7 months ago · 0 comments