microsoft/DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
Apache License 2.0 · 1.76k stars · 163 forks
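For orientation, a minimal text-generation sketch using MII's non-persistent pipeline API; the model name and generation parameters below are illustrative assumptions, not a prescribed setup:

```python
# Minimal DeepSpeed-MII sketch: an in-process (non-persistent) pipeline.
# Assumes `pip install deepspeed-mii`; the model name is illustrative.
import mii

# Load the model into an inference pipeline (downloads weights on first use).
pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")

# Run batched generation; MII schedules the requests for low latency and
# high throughput.
responses = pipe(["DeepSpeed is", "Seattle is"], max_new_tokens=128)
print(responses)
```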
Issues (newest first)
#500 Run pydantic 2 tests with updated DeepSpeed branch (loadams, closed 17 hours ago, 0 comments)
#498 [QUERY] Expert Parallelism Supported? (Shamauk, opened 6 days ago, 0 comments)
#497 Attempting to flush sequence N which does not exist (aagontuk, opened 1 week ago, 0 comments)
#496 Compute perplexity (Sh1gechan, opened 1 week ago, 0 comments)
#495 Configure server log level (sedletsky-f5, opened 1 week ago, 2 comments)
#494 few questions regarding the implementation of streaming and batching (KimMinSang96, opened 2 weeks ago, 0 comments)
#493 Add explanations of MII code into comments (mrwyattii, closed 1 day ago, 0 comments)
#492 Remove Conversation from MII as it was deprecated and removed from transformers. (loadams, closed 4 days ago, 1 comment)
#491 Always Flush UIDs after Exceptions (weiqisun, closed 2 days ago, 0 comments)
#490 Always Flush UIDs after `GeneratorReply` (weiqisun, closed 3 weeks ago, 1 comment)
#489 [BUG] MII Backend Hangs After 9999 Exceptions in `MIIAsyncPipeline.put_request` (weiqisun, opened 3 weeks ago, 1 comment)
#488 support stream (ZZhangxian, opened 1 month ago, 0 comments)
#487 support Qwen1.5 (ZZhangxian, opened 1 month ago, 0 comments)
#486 support Qwen (ZZhangxian, closed 1 day ago, 0 comments)
#485 Some fixes to make openai entrypoint work out of the box (svaruag, opened 1 month ago, 0 comments)
#484 Reuse KV cache of prefixes (tohtana, opened 1 month ago, 0 comments)
#483 Support LLava next stronger (thesby, opened 1 month ago, 0 comments)
#482 How can I use the same prompt to produce the same text output as vllm (Greatpanc, opened 1 month ago, 0 comments)
#481 Tf32 support (Chasapas, opened 1 month ago, 0 comments)
#480 Enable streaming option in the OpenAI API server (adk9, opened 1 month ago, 0 comments)
#479 Can DeepSpeed-MII load quantized int4 or int8 models? (wangyongpenga, opened 1 month ago, 0 comments)
#478 Fix deprecation warning on escaped characters (loadams, closed 1 month ago, 0 comments)
#477 Does deepspeed-mii support prefix_allowed_tokens_fn? (zcakzhuu, opened 1 month ago, 0 comments)
#476 Update mistral tests to fully open source version. (loadams, closed 1 month ago, 0 comments)
#475 [REQUEST] LLAMA-3 support (MRYingLEE, opened 1 month ago, 0 comments)
#474 [REQUEST] Mixtral-8x22B support (y-live-koba, opened 1 month ago, 0 comments)
#473 Allow model to generate added tokens - fix generation issue in Llama3 models (weiqisun, closed 1 day ago, 9 comments)
#472 Cannot run Yi-34B-Chat => ValueError: Unsupported q_ratio: 7 (joeking11829, opened 1 month ago, 2 comments)
#471 BUG in run_batch_processing (zhihui96, opened 1 month ago, 0 comments)
#470 fix max_ragged_sequence_count check in _schedule_prompts (dc3671, closed 1 month ago, 1 comment)
#469 ValueError: Unsupported model type phi3 (abpani, opened 2 months ago, 0 comments)
#468 error when using Qwen1.5-32B (puppet101, opened 2 months ago, 0 comments)
#467 Performance with vllm (littletomatodonkey, opened 2 months ago, 0 comments)
#466 [Problem] errno: 98 - Address already in use (littletomatodonkey, closed 2 months ago, 0 comments)
#465 Only running one replica even though setting many replicas (thesby, opened 2 months ago, 0 comments)
#464 RuntimeError: The server socket has failed to listen on any local network address (thesby, opened 2 months ago, 1 comment)
#463 [FEATURE] Access to logits and final hidden layer (lshamis, opened 2 months ago, 1 comment)
#462 How is prompt segmentation implemented for Dynamic SplitFuse? Is there any code implementation or snippet? (wenyangchou, opened 2 months ago, 0 comments)
#461 Update create-a-PR workflow to latest version with Node.js 20 fixes (loadams, closed 2 months ago, 0 comments)
#460 How do I launch the API on a graphics card other than cuda:0 (Stark-zheng, opened 2 months ago, 1 comment)
#459 Is the OpenAI-compatible server still working? (RobinQu, opened 2 months ago, 1 comment)
#458 How can I use DeepSpeed to split the model across GPUs? (WanBenLe, opened 2 months ago, 0 comments)
#457 [FEATURE REQUEST] Add Support for Qwen1.5-MoE Architecture in DeepSpeed-MII (freQuensy23-coder, opened 3 months ago, 1 comment)
#456 Update GH workflow and workflow runner requirements. (loadams, closed 3 months ago, 0 comments)
#455 Add support for DBRX (azaccor, opened 3 months ago, 0 comments)
#454 Any plans for production-ready services? (SeungminHeo, opened 3 months ago, 0 comments)
#453 Limit VRAM usage in serving the model (risedangel, opened 3 months ago, 2 comments)
#452 inference_core_ops.so: undefined symbol: _Z19cuda_wf6af16_linearRN2at6TensorES1_S1_S1_S1_S1_iiii (Andronixs, opened 3 months ago, 6 comments)
#451 pydantic V2 support (risedangel, closed 3 months ago, 0 comments)
#450 How can I use this library with langchain or llama_index? (risedangel, opened 3 months ago, 2 comments)