ztxz16 / fastllm
A pure C++ LLM acceleration library for all platforms, callable from Python. ChatGLM-6B-class models can reach 10,000+ tokens/s on a single GPU; supports GLM, LLaMA, and MOSS base models, and runs smoothly on mobile devices.
Apache License 2.0 · 3.28k stars · 333 forks
Issues
batch_response() latency scales linearly with the length of the prompt list
#337 · Lzhang-hub · opened 11 months ago · 9 comments
Can a fine-tuned ChatGLM2 model be accelerated and deployed?
#336 · sssssshf · opened 11 months ago · 2 comments
pyfastllm multithreading
#335 · sym19991125 · opened 11 months ago · 2 comments
How can code fine-tuned from BaiChuan2-13-Base use interfaces such as stream_response?
#334 · bash99 · opened 11 months ago · 1 comment
Support compiling with GCC 7.x and running on CPUs without AVX2
#333 · TylunasLi · closed 11 months ago · 0 comments
Update tiktoken for QWen
#332 · siemonchan · closed 11 months ago · 0 comments
Add basic support for GLM models (e.g. glm-large-chinese)
#331 · fluxlinkage · closed 11 months ago · 0 comments
Request: adapt torch2flm to models already released in quantized form
#330 · lockmatrix · opened 11 months ago · 0 comments
Request: direct support for bfloat16 models
#329 · lockmatrix · opened 11 months ago · 2 comments
pytools: add a tokenizer interface and stream_response_raw
#328 · lockmatrix · closed 11 months ago · 0 comments
pytools: add a tokenizer interface and stream_response_v2
#327 · lockmatrix · closed 11 months ago · 0 comments
Memory leak around string_to_chars in pytools
#326 · lockmatrix · closed 11 months ago · 2 comments
Support the two-line invocation on Windows
#325 · aurxs · opened 12 months ago · 0 comments
Long-text prompts are not processed
#324 · keyskull · opened 12 months ago · 1 comment
Is there an interface for releasing a model?
#323 · 2111905222 · opened 12 months ago · 1 comment
After converting a ChatGLM model to flm format, inference quality drops sharply (mainly in summarization); similar problems are reported in other issues
#322 · ColorfulDick · opened 12 months ago · 3 comments
Commit 5cb58c09 causes a large drop in baichuan2-13-chat output quality
#321 · BenRood8165290 · closed 12 months ago · 0 comments
Qwen-7b-Chat produces repetitive output
#320 · rufeng-h · opened 12 months ago · 9 comments
Model runs successfully on CPU, but deploying it to GPU raises an error
#319 · leaf-ygq · opened 12 months ago · 2 comments
Answers contain many exclamation marks and other stray characters
#318 · ARES3366 · closed 11 months ago · 4 comments
Error when testing baichuan2-7b
#317 · tianchaolangzi · opened 1 year ago · 4 comments
FastLLM Error: Embedding's weight's dim should be 2.
#316 · tianchaolangzi · closed 1 year ago · 0 comments
KeyError: 'chat_format'
#315 · ARES3366 · closed 11 months ago · 2 comments
With more than 30 concurrent requests, the process errors out and exits
#314 · duanhaowei · opened 1 year ago · 0 comments
Error when running a converted model
#313 · renllll · opened 1 year ago · 0 comments
Added new ops, supporting low-level op operations
#312 · wildkid1024 · closed 11 months ago · 0 comments
Installation problem
#311 · TJSL0715 · opened 1 year ago · 1 comment
After converting chatglm2-6b to flm, the numeric parts of generated answers differ from the original model's output
#310 · snakecy · opened 1 year ago · 1 comment
Fixing CMake error
#309 · ydm-amazon · opened 1 year ago · 2 comments
Qwen model: once the history array exceeds 22 entries or the text exceeds 4,000 Chinese characters / English words, GPU memory spikes and the process crashes; with a history array over 50 or more than 9,000 characters/words it reports "too many"
#308 · ladygagaclass · opened 1 year ago · 0 comments
ONNX
#307 · loretoparisi · opened 1 year ago · 0 comments
How about trying baichuan2?
#306 · liaoweiguo · closed 1 year ago · 2 comments
Is there any accuracy loss when converting to flm model?
#305 · empty2enrich · opened 1 year ago · 1 comment
Why does running fail with an error after adding the code?
#304 · renllll · opened 1 year ago · 9 comments
Enabling USE_CUDA and USE_MMAP at the same time causes "CUBLAS initialization failed"
#303 · TylunasLi · closed 10 months ago · 1 comment
Is apiserver usable now, or still under development?
#302 · pingyuan2016 · opened 1 year ago · 0 comments
OSError: /home/suser/.conda/envs/llm/bin/../lib/libstdc++.so.6: version `GLIBCXX_3.4.30' not found
#301 · jiaohuix · closed 1 year ago · 0 comments
Fix duplicate timestamps
#300 · yuanphoenix · closed 11 months ago · 0 comments
Model benchmark testing
#299 · TJSL0715 · closed 1 year ago · 0 comments
Error exporting a fine-tuned ChatGLM2-6B model to flm format
#298 · Rorschaaaach · opened 1 year ago · 3 comments
What is the principle behind the speedup?
#297 · 2catycm · opened 1 year ago · 3 comments
Update README
#296 · Vinlic · opened 1 year ago · 1 comment
What does fastllm_lib.make_history_llm_model() do in the get_prom method?
#295 · zhzspace · opened 1 year ago · 2 comments
The speedup is not noticeable
#294 · yzzzwd · opened 1 year ago · 0 comments
Loading baichuan llm.model fails with "undefined symbol: cudaGraphInstantiateWithFlags, version libcudart.so.11.0"
#293 · zhaoanbei · opened 1 year ago · 0 comments
After exporting the model, all returned content is unk
#292 · laoyin · opened 1 year ago · 2 comments
During testing, GPU memory keeps growing with stream_response; what is going on?
#291 · xxyp · opened 1 year ago · 0 comments
Error: cublas error.terminate called after throwing an instance of 'char const*'
#290 · lxp521125 · opened 1 year ago · 4 comments
Attempting to add AMD GPU support; hit a blocker
#289 · NewJerseyStyle · closed 10 months ago · 6 comments
GPU memory blows up; swap space cannot be used
#288 · wjdy · opened 1 year ago · 0 comments