neuralmagic/nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
https://nm-vllm.readthedocs.io
Other
251 stars
10 forks
issues
Marlin moe zero points
#407
ElizaWszola
closed
1 month ago
0
update readme about archival
#406
andy-neuma
closed
2 months ago
0
Group Index Conditioning
#405
kylesayrs
closed
2 months ago
1
[Bug]: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
#404
chintan-ushur
closed
2 months ago
1
moe kernel
#403
dsikka
closed
3 months ago
1
[Misc]: Is Magic Wand SparseBitmaskStorageFormat open sourced?
#402
dhjoo98
closed
3 months ago
1
[WIP, Kernel] (1/N) Machete - Hopper Optimized Mixed Precision Linear Kernel
#401
LucasWilkinson
closed
2 months ago
1
:sparkles: pipe tracing flag
#400
joerunde
closed
3 months ago
1
:white_check_mark: add test for multiprocessing flag
#399
joerunde
closed
3 months ago
1
Fix failed tests
#398
robertgshaw2-neuralmagic
closed
3 months ago
1
stash
#397
robertgshaw2-neuralmagic
closed
3 months ago
1
fix
#396
robertgshaw2-neuralmagic
closed
3 months ago
1
Logit bias
#395
robertgshaw2-neuralmagic
closed
3 months ago
1
Add tokenizer
#394
robertgshaw2-neuralmagic
closed
3 months ago
1
Socket context
#393
joerunde
closed
3 months ago
1
:sparkles: health check round 2
#392
joerunde
closed
3 months ago
1
Await socket operations + some other minor cleanup
#391
njhill
closed
3 months ago
1
Use random port for backend
#390
joerunde
closed
3 months ago
3
Add Open Port
#389
robertgshaw2-neuralmagic
closed
3 months ago
1
Select Open Port For RPC Server
#388
robertgshaw2-neuralmagic
closed
3 months ago
0
Features / Cleanup for MP Frontend
#387
robertgshaw2-neuralmagic
closed
3 months ago
1
[WIP, Kernel] (1/N) Machete - Hopper Optimized Mixed Precision Linear Kernel
#386
LucasWilkinson
closed
3 months ago
2
Add health probe
#385
joerunde
closed
3 months ago
2
Frontend mp flag
#384
joerunde
closed
3 months ago
2
Added asymmetric integration to linear layers
#383
ProExpertProg
closed
3 months ago
1
[Usage]: Do you have any plans to support sparse fp8 kernel and support on rocm?
#382
DehuaTang
closed
3 months ago
1
Refactor weight loading
#381
dsikka
closed
3 months ago
1
Dynamic azp quant kernel
#380
ProExpertProg
closed
3 months ago
1
Update LICENSE
#379
jeanniefinks
closed
3 months ago
1
[ BugFix ] Prompt Logprobs Detokenization (#6223)
#378
robertgshaw2-neuralmagic
closed
3 months ago
0
[ CI ] Upstream sync to `v0.4.3` branch
#377
robertgshaw2-neuralmagic
closed
3 months ago
0
[ CI ] Re-Release 0.4.3 with proper branch
#376
robertgshaw2-neuralmagic
closed
4 months ago
0
extend nightly tests timeout
#375
dhuangnm
closed
4 months ago
0
use single quotes for bash symbol
#374
derekk-nm
closed
4 months ago
0
fix upload assets name
#373
derekk-nm
closed
4 months ago
0
remove loop short circuit
#372
andy-neuma
closed
4 months ago
1
expand benchmark to h100's
#371
andy-neuma
closed
4 months ago
0
NM Profiler : Update visualize_trace.py
#370
varun-sundar-rabindranath
closed
3 months ago
3
make venv input optional
#369
derekk-nm
closed
3 months ago
4
add test coverage table to github summary
#368
derekk-nm
closed
3 months ago
1
use v1.0.0 tag for nm-actions
#367
dhuangnm
closed
4 months ago
2
Upstream sync 2024 07 07
#366
robertgshaw2-neuralmagic
closed
4 months ago
1
Refactor gptq marlin
#365
robertgshaw2-neuralmagic
closed
4 months ago
0
Fix docker build failure in nightly
#364
dhuangnm
closed
4 months ago
0
Marlin moe combine kernels
#363
ElizaWszola
closed
3 months ago
1
Benchmarking separation
#362
dbarbuzzi
closed
3 months ago
0
upload RELEASE wheel to pypi.org
#361
derekk-nm
closed
4 months ago
1
Comparing two branches in order to summarize changes
#360
afeldman-nm
closed
3 months ago
0
Comparing two branches in order to summarize changes
#359
afeldman-nm
closed
4 months ago
0
Compressed tensors fp8
#358
robertgshaw2-neuralmagic
closed
4 months ago
0