vectorch-ai/ScaleLLM
A high-performance inference system for large language models, designed for production environments.
https://docs.vectorch.com/
Apache License 2.0
377 stars, 28 forks
issues
ut: add more tests for different warp layout
#340
guocuimi
opened
1 week ago
0
misc: read flashinfer kernel code and add comments
#339
guocuimi
opened
2 weeks ago
0
[misc] read flashinfer kernel code and add comments
#338
guocuimi
closed
2 weeks ago
0
ci: added pip cache to avoid redownloading
#337
guocuimi
closed
3 weeks ago
0
ut: added fp8 kv unittests for flash infer kernel
#336
guocuimi
closed
3 weeks ago
0
refactor: move paged kv related logic into paged_kv_t
#335
guocuimi
closed
3 weeks ago
0
feat: added pass-in alibi slopes support for flash infer kernel
#334
guocuimi
closed
3 weeks ago
0
refactor: replaced last_page_len with kv_indptr for flash infer kernel
#333
guocuimi
closed
3 weeks ago
0
ut: added unittests for flash infer kernels
#332
guocuimi
closed
3 weeks ago
0
kernel: port flash infer handler + wrapper logics
#331
guocuimi
closed
3 weeks ago
0
refactor: move flash attn and flash infer into attention folder
#330
guocuimi
closed
3 weeks ago
0
kernel: added script to generate instantiation for flashinfer kernels
#329
guocuimi
closed
4 weeks ago
0
refactor: flatten block tables to 1d tensor
#328
guocuimi
closed
4 weeks ago
0
kernel: added flash infer attention impl
#327
guocuimi
closed
4 weeks ago
0
feat: fix and use marlin kernel for awq by default
#326
guocuimi
closed
4 weeks ago
0
refactor: added static switch for marlin kernel dispatch
#325
guocuimi
closed
1 month ago
0
fix: put item into asyncio.Queue in a thread-safe way
#324
guocuimi
closed
1 month ago
0
Will the result callback be called in a thread-safe/coroutine-safe way? #322
#323
tp-nan
closed
1 month ago
7
ci: allow build without requiring a physical gpu device
#321
guocuimi
closed
1 month ago
0
cmake: make includes private and disable jinja2cpp build
#320
guocuimi
closed
1 month ago
0
fix: clean up build warnings: "LOG" redefined
#319
guocuimi
closed
1 month ago
0
refactor: clean up build warnings and refactor marlin kernels
#318
guocuimi
closed
1 month ago
0
test: added unittests for marlin kernels
#317
guocuimi
closed
1 month ago
0
build: speed up compilation for marlin kernels
#316
guocuimi
closed
1 month ago
0
feat: added awq marlin qlinear
#315
guocuimi
closed
1 month ago
0
kernel: port awq repack kernel
#314
guocuimi
closed
1 month ago
0
feat: added fused column parallel linear
#313
guocuimi
closed
1 month ago
0
feat: added gptq marlin qlinear layer
#312
guocuimi
closed
1 month ago
0
refactor: remove the logic loading individual weight from shared partitions
#311
guocuimi
closed
1 month ago
0
RuntimeError: Timed out
#310
spongxin
opened
1 month ago
1
rust: upgrade rust libs to latest version
#309
guocuimi
closed
1 month ago
0
Mistral large GPTQ model inference problem
#308
drdaliang
opened
1 month ago
3
kernel: port gptq marlin kernel and fp8 marlin kernel
#307
guocuimi
closed
1 month ago
0
refactor: move models to upper folder
#306
guocuimi
closed
1 month ago
0
fix: move eos out of stop token list to honor ignore_eos option
#305
guocuimi
closed
1 month ago
0
The process terminated before reaching the specified max_tokens after setting ignore_eos=True and max_tokens.
#304
HowardChenRV
closed
1 month ago
3
feat: added marlin qlinear support
#303
guocuimi
opened
1 month ago
0
test: added unittests for marlin fp16xint4 gemm
#302
guocuimi
closed
1 month ago
0
kernel: support kernel test in python via pybind
#301
guocuimi
closed
1 month ago
0
model: added gemma2 with softcap and sliding window support
#300
guocuimi
closed
1 month ago
0
test: added unittests for attention sliding window
#299
guocuimi
closed
1 month ago
0
kernel: port softcap support for flash attention
#298
guocuimi
closed
1 month ago
0
ci: fix pytest version to avoid flakiness
#297
guocuimi
closed
2 months ago
0
feat: added sliding window support for QWen2
#296
guocuimi
closed
2 months ago
0
model: added qwen2 support
#295
guocuimi
closed
2 months ago
0
triton: fix build error and add example with unittest
#294
guocuimi
closed
2 months ago
0
fix: handle unfinished utf8 bytes for tiktoken tokenizer
#293
guocuimi
closed
2 months ago
0
feat: added THUDM/glm-4* support
#292
guocuimi
closed
2 months ago
0
Deployment of glm-4-9b-chat model fails with SentencePiece tokenizer error
#291
dengyingxu
closed
2 months ago
4
ci: added clang-format-ignore file to exclude generated files
#290
guocuimi
closed
2 months ago
0