punica-ai/punica
Serving multiple LoRA finetuned LLM as one
https://arxiv.org/abs/2310.18547
Apache License 2.0
883 stars, 40 forks
issues
#52 Error when installing package from source · Conless · opened 1 month ago · 0 comments
#51 fix(bgmv): write shared_memory y_warpsize only when threadIdx.x == 0 · menggeliu1205 · opened 1 month ago · 1 comment
#50 Rotary_pos_emb Miss? · chenhongyu2048 · closed 2 months ago · 3 comments
#49 RunTimeError: output must be a cuda tensor · iskander-sauma-assessio · opened 2 months ago · 2 comments
#48 [Question] How to avoid matrices conflict · gyliu513 · opened 3 months ago · 0 comments
#47 improve bgmv expand kernel performance · yyccli · closed 3 months ago · 1 comment
#46 pip install failed with cuda 12.2 · hayleyhu · closed 3 months ago · 3 comments
#45 Support Continuous Batching? · Wisley1998 · closed 3 months ago · 1 comment
#44 [Feature Request] Add support for SM75 · sleepwalker2017 · opened 3 months ago · 4 comments
#43 Support for H100 GPUs? · LorrinWWW · opened 4 months ago · 1 comment
#42 why cuda arch should >= 8.0? · yyccli · closed 4 months ago · 5 comments
#41 Confusion about offset += 8 · carryyu · closed 4 months ago · 2 comments
#40 BGMV performs better than SGMV? · ghost · closed 4 months ago · 1 comment
#39 Inquiry on cuda memory across processes · mozizhao · opened 5 months ago · 0 comments
#38 Support qwen? · cgq0816 · opened 5 months ago · 1 comment
#37 chore(master): release 1.1.1 · github-actions[bot] · opened 5 months ago · 0 comments
#36 ImportError: cannot import name 'BatchedKvCache' from 'punica' · VisionwWu · opened 5 months ago · 1 comment
#35 Fixed deadlock in sgmv_shrink kernel caused by skewed segments · tgaddair · closed 5 months ago · 3 comments
#34 chore(master): release 1.1.0 · github-actions[bot] · closed 6 months ago · 1 comment
#33 bug: sgmv_shrink does not support CUDA graph tracing · tgaddair · closed 6 months ago · 9 comments
#32 chore(master): release 1.0.3 · github-actions[bot] · closed 6 months ago · 1 comment
#31 chore(master): release 1.0.2 · github-actions[bot] · closed 6 months ago · 1 comment
#30 chore(master): release 1.0.1 · abcdabcd987 · closed 6 months ago · 1 comment
#29 chore(master): release 1.0.0 · github-actions[bot] · closed 6 months ago · 1 comment
#28 chore: bootstrap releases for path: . · abcdabcd987 · closed 6 months ago · 1 comment
#27 Question about performance · jcao-ai · closed 6 months ago · 2 comments
#26 About scheduler · luzai · opened 6 months ago · 2 comments
#25 flashinfer shrink vs cutlass · YLGH · closed 7 months ago · 6 comments
#24 A question on serving multiple-lora models in one server · zihaolucky · closed 6 months ago · 3 comments
#23 Error when inferencing from the Punica format · bibekyess · closed 7 months ago · 2 comments
#22 [Bugfix] Fix the incorrect offset computation for SGMV shrink · yzh119 · closed 7 months ago · 1 comment
#21 Choosing adapters on inference · BaiStone2017 · closed 6 months ago · 2 comments
#20 fix: kernel launch failure due to smem overflow · jcao-ai · closed 7 months ago · 2 comments
#19 Multi GPU and Multi Node solution · luciferlinx101 · opened 7 months ago · 4 comments
#18 ci: add gpu tests · abcdabcd987 · closed 7 months ago · 0 comments
#17 The smallest rank supported is 16? · jcao-ai · closed 7 months ago · 5 comments
#16 ModuleNotFoundError: No module named 'rich' · luciferlinx101 · closed 7 months ago · 2 comments
#15 Support for GPT-NEOX models · bibekyess · opened 7 months ago · 0 comments
#14 Update README.md · luciferlinx101 · closed 7 months ago · 1 comment
#13 pip install -v --no-build-isolation . is giving errors · luciferlinx101 · closed 7 months ago · 5 comments
#12 ModuleNotFoundError: No module named 'punica.ops._kernels' · bibekyess · closed 7 months ago · 7 comments
#11 sgmv_cutlass calculate wrong output · harryhan618 · opened 7 months ago · 8 comments
#10 [FIX] Benchmark build script · jeromeku · closed 7 months ago · 1 comment
#9 License to use this library? · Algy · closed 7 months ago · 4 comments
#8 Error installing package · jkl375 · closed 7 months ago · 7 comments
#7 Why punica is faster than pretrained LLM? · shushengyuan · closed 7 months ago · 4 comments
#6 Question about the work · Gorluxor · closed 7 months ago · 2 comments
#5 Add support for running on Colab · dzlab · opened 7 months ago · 4 comments
#4 Update README.md · eltociear · closed 7 months ago · 1 comment
#3 does the Loras all trained start from a base model. · lucasjinreal · closed 7 months ago · 1 comment