modelscope/dash-infer
DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including x86 and ARMv9.
Apache License 2.0 · 137 stars · 15 forks
Issues
#42 · Using Qwen2-7B to load a Qwen2.5-3B model reports "Only allowed now, your model Qwen2-7B" · tianyouyangying · opened 1 month ago · 1 comment
#41 · Qwen2.5 is supported · tianyouyangying · closed 1 month ago · 1 comment
#40 · First token latency measurement · LiweiPE · closed 2 months ago · 1 comment
#39 · force use node16 and test arch to decide AS_PLATFORM · yejunjin · closed 3 months ago · 0 comments
#38 · Support Baichuan-7B and Baichuan2-7B & 13B · WangNorthSea · closed 3 months ago · 0 comments
#37 · CPU specifications for Qwen2-7B models · LiweiPE · closed 2 months ago · 1 comment
#36 · fix: Update pybind11 to pybind11[global] in dev-requirements.txt · vikrantrathore · closed 3 months ago · 2 comments
#35 · Can s-lora be supported? · Willamjie · opened 4 months ago · 0 comments
#34 · Do the assembly kernels cover both x86 and ARM? · LittleNoob2333 · closed 2 months ago · 1 comment
#33 · add an openai chat example · yejunjin · closed 4 months ago · 0 comments
#32 · add dashinfer worker to servicize · yejunjin · closed 4 months ago · 0 comments
#31 · Is there a feature to launch a server? What is the launch command? · XufengXufengXufeng · opened 4 months ago · 1 comment
#30 · automate release when a new tag is pushed · yejunjin · closed 4 months ago · 0 comments
#29 · python/setup.py does not contain all install dependencies · yejunjin · opened 4 months ago · 0 comments
#28 · When prompt_token exceeds the model's maximum supported length, the program cannot recover and keeps returning errors · yejunjin · closed 2 months ago · 2 comments
#27 · flatten stop_words_ids in generation_config to 1 dim array · yejunjin · opened 4 months ago · 0 comments
#26 · rename runner labels from os to arch · yejunjin · closed 5 months ago · 0 comments
#25 · manually fetch tags · yejunjin · closed 5 months ago · 0 comments
#24 · add release package workflow · yejunjin · closed 5 months ago · 0 comments
#23 · solve security issue; helper: bugfix, cpu platform check · laiwenzh · closed 5 months ago · 0 comments
#22 · fix: fallback to mha without avx512f support · yejunjin · closed 5 months ago · 0 comments
#21 · Add llama.cpp benchmark steps · yejunjin · closed 5 months ago · 0 comments
#20 · Failed: could not create a primitive descriptor for a matmul primitive, request_id = 0000000000000000000000000000000 · sflowing · closed 5 months ago · 1 comment
#19 · Is qwen2-0.5B supported yet? · wuya-aiopx · closed 5 months ago · 2 comments
#18 · Add flash attention on intel-avx512 platform · yejunjin · closed 5 months ago · 0 comments
#17 · RuntimeError: ALLSPARK_UNKNOWN_ERROR · LiweiPE · closed 5 months ago · 4 comments
#16 · model_type key not exists or unsupported value · rickywu · closed 5 months ago · 1 comment
#15 · I still can't reproduce the official first-token latency on an Alibaba Cloud g8i instance; can the official numbers only be reproduced on customized machine types? · JasonFuuuuuuuu · closed 5 months ago · 4 comments
#14 · Where is AsModelConfig, and how do I change its settings? Why can the maximum length only be set to 2048? Surely it can be longer? · JasonFuuuuuuuu · closed 5 months ago · 7 comments
#13 · Why is the output none when I convert? I'm using the 4b model rather than 4chat; is that related? · txl0117 · closed 2 months ago · 4 comments
#12 · fix: update README since we support 32k context length · yejunjin · closed 5 months ago · 0 comments
#11 · fix: change to size_t to avoid overflow when seq is long · yejunjin · closed 5 months ago · 0 comments
#10 · support glm-4-9b-chat · laiwenzh · closed 5 months ago · 0 comments
#9 · Excellent project 👍 · intelyoungway · closed 5 months ago · 2 comments
#8 · examples: update qwen prompt template, add print func to examples · laiwenzh · closed 6 months ago · 0 comments
#7 · fix: remove currently unsupported cache mode · leefige · closed 6 months ago · 0 comments
#6 · Inaccurate input · wzg-zhuo · closed 5 months ago · 2 comments
#5 · Question about printing inference results with print · liukangjia666 · closed 5 months ago · 5 comments
#4 · On the x86 platform there were no errors until the final step, which failed · 983183947 · closed 5 months ago · 4 comments
#3 · Illegal instruction (core dumped) · 983183947 · closed 5 months ago · 3 comments
#2 · Can Deepseek V2 Chat 236B be supported? · huliangbing · closed 6 months ago · 2 comments
#1 · readme: typo fix and other refinements. · leefige · closed 7 months ago · 0 comments