microsoft / DeepSpeedExamples
Example models using DeepSpeed
Apache License 2.0 · 5.83k stars · 987 forks
Issues (sorted by newest)
- #909 Consult the first phase. (csxrzhang, opened 2 days ago, 0 comments)
- #908 An error with gradient checkpointing in DeepSpeed-Chat step 3 (wangyuwen1999, opened 5 days ago, 0 comments)
- #907 Error in RLHF step 3 on a single machine with multiple GPUs when using a Qwen model as the Actor Model (Dakai798, opened 1 week ago, 0 comments)
- #906 DeepSpeed-Chat step-1 hanging for a long time (lemon-little, opened 1 week ago, 0 comments)
- #905 Enable CPU/XPU support for the benchmarking suite (louie-tsai, opened 1 month ago, 7 comments)
- #904 CPU OOM when inferencing Llama3-70B-Chinese-Chat (GORGEOUSLCX, opened 1 month ago, 0 comments)
- #903 Cannot pickle 'Stream' object (teis-e, opened 1 month ago, 0 comments)
- #902 Cannot run test-gpt.sh because of an AssertionError (leachee99, opened 1 month ago, 0 comments)
- #901 Does FastGen support long-context and sequence-parallel inference? (AceCoder0, opened 1 month ago, 0 comments)
- #900 Add --client-only arg to mii benchmark (delock, closed 3 weeks ago, 0 comments)
- #899 Refactored LLM benchmark code (mrwyattii, closed 1 week ago, 0 comments)
- #898 Fix bug with queue.empty not being reliable (mrwyattii, closed 2 months ago, 0 comments)
- #897 Update tokens_per_sec calculation to work with stream and non-stream cases (lekurile, closed 2 months ago, 0 comments)
- #896 run-example.sh fails with urllib3.exceptions.ProtocolError: Response ended prematurely (awan-10, opened 2 months ago, 8 comments)
- #895 Update tokens per second to include the token count of generated tokens (guptha23, closed 2 months ago, 0 comments)
- #894 [Error] AutoTune: `connect to host localhost port 22: Connection refused` (wqw547243068, opened 2 months ago, 0 comments)
- #893 How to use DeepSpeed for a multi-node, multi-GPU task in a Slurm cluster (dshwei, opened 2 months ago, 0 comments)
- #892 Does ZeRO-Inference support TP? (preminstrel, opened 2 months ago, 11 comments)
- #891 Extend max_prompt_length and input text for 128k evaluation (HeyangQin, opened 2 months ago, 0 comments)
- #890 Does DeepSpeed support finetuning an extra model with LoRA? (wanghongqu, opened 2 months ago, 1 comment)
- #889 Python paths differ across machines, so after launch DeepSpeed cannot find the Python environment on the other machines; how can this be resolved? (liqwertyu, opened 2 months ago, 0 comments)
- #888 When calculating actor loss, why is the mask `action_mask[:, start:]`? (fancghit, closed 2 months ago, 0 comments)
- #887 The actor constantly generates ['</s>'] or ['<|endoftext|></s>'] after 200 steps in RLHF with hybrid engine disabled (mousewu, opened 2 months ago, 0 comments)
- #886 About multi-threaded attention computation on CPU using the ZeRO-Inference example (luckyq, opened 2 months ago, 0 comments)
- #885 Suggested GPU to run the demo code of step2_reward_model_finetuning (DeepSpeed-Chat) (wenbozhangjs, opened 3 months ago, 0 comments)
- #884 [REQUEST] More fine-grained distributed strategies for RLHF training (youshaox, opened 3 months ago, 0 comments)
- #883 The reward value did not increase (Sun-Shiqi, opened 3 months ago, 1 comment)
- #882 Fix response check in call_aml function (HeyangQin, closed 3 months ago, 0 comments)
- #881 Update throughput-latency plot script (lekurile, closed 2 months ago, 0 comments)
- #880 [Inference Benchmark] Set `num_requests` based on `num_clients` (mrwyattii, closed 3 months ago, 0 comments)
- #879 Confusion about DeepSpeed Inference (ZekaiGalaxy, opened 3 months ago, 1 comment)
- #878 `AttributeError: readonly attribute` while trying to run training/HelloDeepSpeed (htjain, opened 3 months ago, 0 comments)
- #877 Benchmark mii stalled and crashed (Albert-Zhao-2020, opened 3 months ago, 0 comments)
- #876 [Inference benchmark] Update AML kwargs to match vLLM kwargs (mrwyattii, closed 3 months ago, 0 comments)
- #875 Improve robustness of inference AML benchmark (HeyangQin, closed 3 months ago, 0 comments)
- #874 Fix AML benchmark E2E measurement (mrwyattii, closed 3 months ago, 0 comments)
- #873 Add LoRA optimization to the SD training example (PareesaMS, opened 3 months ago, 0 comments)
- #872 Replace deprecated transformers.deepspeed module (HollowMan6, opened 3 months ago, 0 comments)
- #871 Xiaoxia/fp v1 (xiaoxiawu-microsoft, closed 3 months ago, 0 comments)
- #870 Remove AML key from args dict when saving results (lekurile, closed 3 months ago, 0 comments)
- #869 Inference Benchmark: Catch AML error response (mrwyattii, closed 4 months ago, 0 comments)
- #868 Update inference benchmarking scripts - support AML (lekurile, closed 3 months ago, 1 comment)
- #867 [Bug] DeepSpeed Inference does not work with LLaMA (latest version) (allanj, opened 4 months ago, 3 comments)
- #866 [BUG in Stable Diffusion inference] There's an error on CUDAGraph when using DeepSpeed inference; how to fix it? (foin6, opened 4 months ago, 2 comments)
- #865 Extend FastGen benchmark to use AML endpoints (mrwyattii, closed 4 months ago, 0 comments)
- #864 ZeRO-3 and hybrid engine are not suitable for llama2; how to solve it? (terence1023, opened 4 months ago, 2 comments)
- #863 Modify code so that different accelerators can be called according to specific device conditions (foin6, closed 4 months ago, 1 comment)
- #862 Fix path in human-eval example README (lekurile, closed 4 months ago, 0 comments)
- #861 RLHF problems when using Qwen model (128Ghe980, opened 4 months ago, 1 comment)
- #860 Codellama finetune (nani1149, opened 4 months ago, 0 comments)