microsoft / DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
Apache License 2.0 · 1.91k stars · 175 forks
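For context on what the library provides, here is a minimal sketch of MII's non-persistent pipeline API, following the pattern shown in the repo README; the model name, prompt, and `max_new_tokens` value are illustrative:

```python
# Minimal sketch of the MII non-persistent pipeline (illustrative model/prompt).
import mii

# Load a Hugging Face checkpoint into a local inference pipeline;
# "mistralai/Mistral-7B-v0.1" is just an example of a supported model.
pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")

# Run generation on a batch of prompts; max_new_tokens caps the output length.
response = pipe(["DeepSpeed is"], max_new_tokens=128)
print(response)
```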
Issues
#495 · Configure server log level · sedletsky-f5 · opened 5 months ago · 2 comments
#494 · few questions regarding the implementation of streaming and batching · KimMinSang96 · opened 5 months ago · 0 comments
#493 · Add explanations of MII code into comments · mrwyattii · closed 4 months ago · 0 comments
#492 · Remove Conversation from MII as it was deprecated and removed from transformers. · loadams · closed 4 months ago · 1 comment
#491 · Always Flush UIDs after Exceptions · weiqisun · closed 4 months ago · 0 comments
#490 · Always Flush UIDs after `GeneratorReply` · weiqisun · closed 5 months ago · 1 comment
#489 · [BUG] MII Backend Hangs After 9999 Exceptions in `MIIAsyncPipeline.put_request` · weiqisun · closed 4 months ago · 2 comments
#488 · support stream · ZZhangxian · opened 5 months ago · 0 comments
#487 · support Qwen1.5 · ZZhangxian · opened 5 months ago · 0 comments
#486 · support Qwen · ZZhangxian · closed 4 months ago · 0 comments
#485 · Some fixes to make openai entrypoint work out of the box · svaruag · closed 2 months ago · 0 comments
#484 · Reuse KV cache of prefixes · tohtana · opened 5 months ago · 0 comments
#483 · Support LLava next stronger · thesby · opened 6 months ago · 0 comments
#482 · How can I use the same prompt to produce the same text output as vllm · Greatpanc · opened 6 months ago · 0 comments
#481 · Tf32 support · Chasapas · opened 6 months ago · 0 comments
#480 · Enable streaming option in the OpenAI API server · adk9 · closed 2 months ago · 0 comments
#479 · Can DeepSpeed-MII load quantized int4 or int8 models? · wangyongpenga · opened 6 months ago · 0 comments
#478 · Fix deprecation warning on escaped characters · loadams · closed 6 months ago · 0 comments
#477 · Does deepspeed-mii support prefix_allowed_tokens_fn? · zcakzhuu · opened 6 months ago · 0 comments
#476 · Update mistral tests to fully open source version. · loadams · closed 6 months ago · 0 comments
#475 · [REQUEST] LLAMA-3 support · MRYingLEE · opened 6 months ago · 0 comments
#474 · [REQUEST] Mixtral-8x22B support · y-live-koba · opened 6 months ago · 0 comments
#473 · Allow model to generate added tokens - fix generation issue in Llama3 models · weiqisun · closed 4 months ago · 9 comments
#472 · Cannot run Yi-34B-Chat => ValueError: Unsupported q_ratio: 7 · joeking11829 · opened 6 months ago · 3 comments
#471 · BUG in run_batch_processing · zhihui96 · opened 6 months ago · 0 comments
#470 · fix max_ragged_sequence_count check in _schedule_prompts · dc3671 · closed 6 months ago · 1 comment
#469 · ValueError: Unsupported model type phi3 · abpani · opened 6 months ago · 1 comment
#468 · error when using Qwen1.5-32B · puppet101 · opened 6 months ago · 1 comment
#467 · Performance with vllm · littletomatodonkey · opened 7 months ago · 1 comment
#466 · [Problem] errno: 98 - Address already in use · littletomatodonkey · closed 7 months ago · 0 comments
#465 · Only running one replica even though setting many replicas · thesby · opened 7 months ago · 1 comment
#464 · RuntimeError: The server socket has failed to listen on any local network address · thesby · opened 7 months ago · 2 comments
#463 · [FEATURE] Access to logits and final hidden layer · lshamis · opened 7 months ago · 1 comment
#462 · How is prompt segmentation implemented for Dynamic SplitFuse? Is there any code implementation or snippet? · wenyangchou · opened 7 months ago · 0 comments
#461 · Update create-a-PR workflow to latest version with Node.js 20 fixes · loadams · closed 7 months ago · 0 comments
#460 · How do I launch the API on a graphics card other than cuda:0? · Stark-zheng · opened 7 months ago · 1 comment
#459 · Is the OpenAI-compatible server still working? · RobinQu · closed 2 months ago · 1 comment
#458 · How can I use DeepSpeed to split the model across GPUs? · WanBenLe · opened 7 months ago · 0 comments
#457 · [FEATURE REQUEST] Add Support for Qwen1.5-MoE Architecture in DeepSpeed-MII · freQuensy23-coder · opened 7 months ago · 1 comment
#456 · Update GH workflow and workflow runner requirements. · loadams · closed 7 months ago · 0 comments
#455 · Add support for DBRX · azaccor · opened 7 months ago · 0 comments
#454 · Any plans for production-ready services? · SeungminHeo · opened 7 months ago · 0 comments
#453 · Limit VRAM usage in serving the model · risedangel · opened 7 months ago · 2 comments
#452 · inference_core_ops.so: undefined symbol: _Z19cuda_wf6af16_linearRN2at6TensorES1_S1_S1_S1_S1_iiii · Andronixs · opened 7 months ago · 6 comments
#451 · pydantic V2 support · risedangel · closed 7 months ago · 0 comments
#450 · How can I use this library with langchain or llama_index? · risedangel · opened 7 months ago · 2 comments
#449 · Blocks when calling client inference in multiprocessing.Process · zhaotyer · opened 7 months ago · 3 comments
#448 · I can't tell from the documentation whether we're meant to use a chat template or if it's applied automatically · sidagarwal2805 · opened 7 months ago · 0 comments
#447 · Update pyzmq in requirements.txt · ccoulombe · closed 7 months ago · 0 comments
#446 · Cohere's Command-R model support · gottlike · opened 8 months ago · 1 comment