neuralmagic/nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
https://nm-vllm.readthedocs.io
Other
251 stars
10 forks
issues
Compressed tensors fp8
#357
robertgshaw2-neuralmagic
closed
4 months ago
0
fix compress sparse linear with bias
#356
yukavio
closed
3 months ago
1
Profiler improvements
#355
LucasWilkinson
closed
4 months ago
0
Generate commit ID in a separate untracked file
#354
tlrmchlsmth
closed
4 months ago
0
Marlin moe act order
#353
ElizaWszola
closed
4 months ago
0
Fix docker upload bugs
#352
dhuangnm
closed
4 months ago
1
[scratch] Fanning Out
#351
andy-neuma
closed
3 months ago
1
Upstream sync 2024 07 01
#350
robertgshaw2-neuralmagic
closed
4 months ago
0
Qwen 2 refactored
#349
robertgshaw2-neuralmagic
closed
4 months ago
0
fix install conflict due to torchvision 0.18.1
#348
dhuangnm
closed
3 months ago
4
Refactor moe
#347
robertgshaw2-neuralmagic
closed
4 months ago
0
Archive current data.js
#346
dbarbuzzi
closed
4 months ago
0
Fix error in nightly docker publish
#345
dhuangnm
closed
4 months ago
1
fixes for nm-get-docker-tag
#344
derekk-nm
closed
4 months ago
1
added fp8
#343
robertgshaw2-neuralmagic
closed
4 months ago
0
Compressed tensors fp8
#342
robertgshaw2-neuralmagic
closed
4 months ago
0
fix nm_get_docker-tags
#341
derekk-nm
closed
4 months ago
0
Expand lm eval testing to many models
#340
robertgshaw2-neuralmagic
closed
3 months ago
1
Benchmarking update - phase 1
#339
dbarbuzzi
closed
4 months ago
0
Enable Release build for nightly/release workflow
#338
dhuangnm
closed
4 months ago
3
nightly patch
#337
andy-neuma
closed
4 months ago
0
[WIP] Fix for FP8 checkpoints with fused scales
#336
mgoin
closed
4 months ago
0
update code coverage configuration
#335
derekk-nm
closed
4 months ago
0
Add docker workflow to nm nightly/release and fixed some minor bugs in wheel uploading
#334
dhuangnm
closed
4 months ago
0
Fix `step-status` script
#333
dbarbuzzi
closed
4 months ago
0
[Bugfix] Update profile example to new add request interface + fix profiler not picking up kernels within cudagraphs
#332
LucasWilkinson
closed
4 months ago
3
[Kernel] Expand FP8 support to Ampere GPUs using FP8 Marlin
#331
mgoin
closed
4 months ago
2
bump version to 0.5.1
#330
dhuangnm
closed
4 months ago
1
Upstream sync 2024 06 23
#329
robertgshaw2-neuralmagic
closed
4 months ago
0
[ CI ] Enable distributed tests
#328
robertgshaw2-neuralmagic
closed
3 months ago
0
add some guards for pypi push
#327
dhuangnm
closed
4 months ago
3
[ CI ] LM Eval Testing Expansion
#326
robertgshaw2-neuralmagic
closed
4 months ago
0
[ CI ] Fan Out Strategy
#325
robertgshaw2-neuralmagic
closed
3 months ago
2
[ CI ] Enable Distributed Tests
#324
robertgshaw2-neuralmagic
closed
4 months ago
0
[ README ] Update README.md
#323
robertgshaw2-neuralmagic
closed
4 months ago
0
Pruned Examples
#322
robertgshaw2-neuralmagic
closed
4 months ago
3
Force-disable upstream tracking
#321
dbarbuzzi
closed
4 months ago
3
revert githash commit
#320
dhuangnm
closed
4 months ago
2
Embed git commit hash into Python source
#319
dbarbuzzi
closed
4 months ago
6
set PYTHON-3-10 job to gcp
#318
derekk-nm
closed
4 months ago
0
[ CI ] skip local_workers_clean_shutdown
#317
robertgshaw2-neuralmagic
closed
4 months ago
0
[CI][Bugfix] Update golden strings for fp8 kv cache test
#316
mgoin
closed
4 months ago
0
cross python whl
#315
andy-neuma
closed
4 months ago
0
[ CI ] Disable Usage Stats in Automation
#314
robertgshaw2-neuralmagic
closed
4 months ago
2
Pruned Readme
#313
robertgshaw2-neuralmagic
closed
4 months ago
0
[ CI ] Fix Failing Test Server Logprobs (tolerance tweak)
#312
robertgshaw2-neuralmagic
closed
4 months ago
0
[ CI ] Fix Failing Magic Wand Test
#311
robertgshaw2-neuralmagic
closed
4 months ago
1
enable tests that require C compiler
#310
andy-neuma
closed
4 months ago
1
Use shared actions
#309
dbarbuzzi
closed
4 months ago
0
Update nm-nightly.yml
#308
derekk-nm
closed
4 months ago
1