microsoft / DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
Apache License 2.0 · 1.91k stars · 175 forks
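For context on what the library provides, here is a minimal sketch of MII's non-persistent pipeline API, following the pattern shown in the repo README; the model name, prompt, and `max_new_tokens` value are illustrative:

```python
# Minimal sketch of the MII non-persistent pipeline (illustrative model/prompt).
import mii

# Load a Hugging Face checkpoint into a local inference pipeline;
# "mistralai/Mistral-7B-v0.1" is just an example of a supported model.
pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")

# Run generation on a batch of prompts; max_new_tokens caps the output length.
response = pipe(["DeepSpeed is"], max_new_tokens=128)
print(response)
```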
Issues
#495 · Configure server log level · sedletsky-f5 · opened 5 months ago · 2 comments
#494 · few questions regarding the implementation of streaming and batching · KimMinSang96 · opened 5 months ago · 0 comments
#493 · Add explanations of MII code into comments · mrwyattii · closed 4 months ago · 0 comments
#492 · Remove Conversation from MII as it was deprecated and removed from transformers. · loadams · closed 4 months ago · 1 comment
#491 · Always Flush UIDs after Exceptions · weiqisun · closed 4 months ago · 0 comments
#490 · Always Flush UIDs after `GeneratorReply` · weiqisun · closed 5 months ago · 1 comment
#489 · [BUG] MII Backend Hangs After 9999 Exceptions in `MIIAsyncPipeline.put_request` · weiqisun · closed 4 months ago · 2 comments
#488 · support stream · ZZhangxian · opened 5 months ago · 0 comments
#487 · support Qwen1.5 · ZZhangxian · opened 5 months ago · 0 comments
#486 · support Qwen · ZZhangxian · closed 4 months ago · 0 comments
#485 · Some fixes to make openai entrypoint work out of the box · svaruag · closed 2 months ago · 0 comments
#484 · Reuse KV cache of prefixes · tohtana · opened 5 months ago · 0 comments
#483 · Support LLava next stronger · thesby · opened 6 months ago · 0 comments
#482 · How can I use the same prompt to produce the same text output as vllm · Greatpanc · opened 6 months ago · 0 comments
#481 · Tf32 support · Chasapas · opened 6 months ago · 0 comments
#480 · Enable streaming option in the OpenAI API server · adk9 · closed 2 months ago · 0 comments
#479 · Can DeepSpeed-MII load quantized int4 or int8 models? · wangyongpenga · opened 6 months ago · 0 comments
#478 · Fix deprecation warning on escaped characters · loadams · closed 6 months ago · 0 comments
#477 · Does deepspeed-mii support prefix_allowed_tokens_fn? · zcakzhuu · opened 6 months ago · 0 comments
#476 · Update mistral tests to fully open source version. · loadams · closed 6 months ago · 0 comments
#475 · [REQUEST] LLAMA-3 support · MRYingLEE · opened 6 months ago · 0 comments
#474 · [REQUEST] Mixtral-8x22B support · y-live-koba · opened 6 months ago · 0 comments
#473 · Allow model to generate added tokens - fix generation issue in Llama3 models · weiqisun · closed 4 months ago · 9 comments
#472 · Cannot run Yi-34B-Chat => ValueError: Unsupported q_ratio: 7 · joeking11829 · opened 6 months ago · 3 comments
#471 · BUG in run_batch_processing · zhihui96 · opened 6 months ago · 0 comments
#470 · fix max_ragged_sequence_count check in _schedule_prompts · dc3671 · closed 6 months ago · 1 comment
#469 · ValueError: Unsupported model type phi3 · abpani · opened 6 months ago · 1 comment
#468 · error when using Qwen1.5-32B · puppet101 · opened 6 months ago · 1 comment
#467 · Performance with vllm · littletomatodonkey · opened 7 months ago · 1 comment
#466 · [Problem] errno: 98 - Address already in use · littletomatodonkey · closed 7 months ago · 0 comments
#465 · Only running one replica even though setting many replicas · thesby · opened 7 months ago · 1 comment
#464 · RuntimeError: The server socket has failed to listen on any local network address · thesby · opened 7 months ago · 2 comments
#463 · [FEATURE] Access to logits and final hidden layer · lshamis · opened 7 months ago · 1 comment
#462 · How is prompt segmentation implemented for Dynamic SplitFuse? Is there any code implementation or snippet? · wenyangchou · opened 7 months ago · 0 comments
#461 · Update create-a-PR workflow to latest version with Node.js 20 fixes · loadams · closed 7 months ago · 0 comments
#460 · How do I launch the API on a graphics card other than cuda:0? · Stark-zheng · opened 7 months ago · 1 comment
#459 · Is the OpenAI-compatible server still working? · RobinQu · closed 2 months ago · 1 comment
#458 · How can I use DeepSpeed to split the model across GPUs? · WanBenLe · opened 7 months ago · 0 comments
#457 · [FEATURE REQUEST] Add Support for Qwen1.5-MoE Architecture in DeepSpeed-MII · freQuensy23-coder · opened 7 months ago · 1 comment
#456 · Update GH workflow and workflow runner requirements. · loadams · closed 7 months ago · 0 comments
#455 · Add support for DBRX · azaccor · opened 7 months ago · 0 comments
#454 · Any plans for production-ready services? · SeungminHeo · opened 7 months ago · 0 comments
#453 · Limit VRAM usage in serving the model · risedangel · opened 7 months ago · 2 comments
#452 · inference_core_ops.so: undefined symbol: _Z19cuda_wf6af16_linearRN2at6TensorES1_S1_S1_S1_S1_iiii · Andronixs · opened 7 months ago · 6 comments
#451 · pydantic V2 support · risedangel · closed 7 months ago · 0 comments
#450 · How can I use this library with langchain or llama_index? · risedangel · opened 7 months ago · 2 comments
#449 · Blocks when calling client inference in multiprocessing.Process · zhaotyer · opened 7 months ago · 3 comments
#448 · I can't tell from the documentation whether we're meant to use a chat template or if it's applied automatically · sidagarwal2805 · opened 7 months ago · 0 comments
#447 · Update pyzmq in requirements.txt · ccoulombe · closed 7 months ago · 0 comments
#446 · Cohere's Command-R model support · gottlike · opened 8 months ago · 1 comment