microsoft/DeepSpeedExamples
Example models using DeepSpeed
Apache License 2.0
6.02k stars
1.02k forks
issues
[inference benchmark] update AML kwargs to match vLLM kwargs
#876
mrwyattii
closed
6 months ago
0
Improve robustness of inference AML benchmark
#875
HeyangQin
closed
6 months ago
0
Fix AML benchmark E2E measurement
#874
mrwyattii
closed
6 months ago
0
Add LoRA optimization to the SD training example
#873
PareesaMS
opened
6 months ago
0
Replace deprecated transformers.deepspeed module
#872
HollowMan6
opened
7 months ago
0
Xiaoxia/fp v1
#871
xiaoxiawu-microsoft
closed
6 months ago
0
Remove AML key from args dict when saving results
#870
lekurile
closed
7 months ago
0
Inference Benchmark: Catch AML error response
#869
mrwyattii
closed
7 months ago
0
Update Inference Benchmarking Scripts - Support AML
#868
lekurile
closed
7 months ago
1
[Bug] DeepSpeed Inference Does not Work with LLaMA (Latest version)
#867
allanj
opened
7 months ago
3
[BUG in Stable Diffusion inference] There's an error on CUDAGraph when using deepspeed inference. How to fix it?
#866
foin6
opened
7 months ago
2
Extend FastGen benchmark to use AML endpoints
#865
mrwyattii
closed
7 months ago
0
zero3 and enable hybrid engine are not suitable for llama2, how to solve it?
#864
terence1023
opened
7 months ago
3
Modify codes so that different accelerators can be called according to specific device conditions
#863
foin6
closed
7 months ago
1
Fix path in human-eval example README
#862
lekurile
closed
7 months ago
0
RLHF problems when using Qwen model
#861
128Ghe980
opened
7 months ago
1
Codellama finetune
#860
nani1149
opened
7 months ago
0
Different accelerators can be called according to specific device conditions
#859
foin6
closed
7 months ago
0
Throughput should be `num_queries/latency` as opposed to `num_clients/latency`?
#858
goelayu
opened
8 months ago
0
Not a bug, just missing a space in README.md
#857
stceum
closed
8 months ago
0
Add Human Eval Example
#856
lekurile
closed
7 months ago
0
The inaccurate flop results after several rounds
#855
BitCalSaul
opened
8 months ago
1
Fix extraneous arg to MOE example
#854
yang
closed
8 months ago
4
Control the kernel injection with new argument. And compare the outputs only on rank 0
#853
foin6
closed
8 months ago
6
remove redundant code
#852
ilml
opened
8 months ago
0
Generalize MII benchmark for any model
#851
mrwyattii
closed
8 months ago
0
How to resume Deepspeed-Chat RLHF step-3 training?
#850
DespairL
closed
8 months ago
0
Question: Why not padding to the same sequence length within the batch during the sft training phase?
#849
LKLKyy
opened
8 months ago
0
Remove hardcoded model dependencies in benchmark script
#848
arashb
closed
8 months ago
0
running gpt2-xl/test_tune.sh fails - ParquetConfig.__init__() got an unexpected keyword argument 'token'
#847
ccruttjr
closed
8 months ago
0
Enable overlap_comm for better performance
#846
li-plus
closed
2 weeks ago
0
torch.distributed.DistBackendError: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1333, remote process exited or there was a network error, NCCL version 2.18.6
#845
Rainbowman0
opened
8 months ago
3
Modify codes so that different accelerators can be called according to specific device conditions
#844
foin6
closed
8 months ago
1
[Example] Refactor and Polish Cifar10-DeepSpeed Code Example.
#843
keli-wen
closed
8 months ago
3
Step3 hanging for a long time
#842
Jeayea
closed
9 months ago
1
[DeepSpeed-Chat] Fix OOM issue in dataloader
#841
youkaichao
opened
9 months ago
2
Invalidate trace cache @ step 0: expected module 0, but got module 6
#840
boundles
opened
9 months ago
0
deepspeed-chat: Support zero3 params initialization in the last LN
#839
deepcharm
closed
8 months ago
0
fix: typo in sa
#838
A-Cepheus
closed
9 months ago
0
Update MII Inference Examples
#837
mrwyattii
closed
8 months ago
0
Step3 PPO print error when enable --print_answers
#836
tonylin52
closed
9 months ago
1
async_pipeline is not exposed in the library
#835
yaliqin
opened
9 months ago
1
fix: don't add eot token if add_eot_token knob is False
#834
EeyoreLee
opened
9 months ago
0
Improve Comms Benchmark Timing
#833
Quentin-Anthony
closed
9 months ago
5
Mistral and Orca Training
#832
syngokhan
opened
9 months ago
0
[Discussion] Can anyone show the performance on every step with any dataset
#831
EeyoreLee
opened
9 months ago
0
Question: Why did you implement LoRA by hand instead of using peft?
#830
kwonmha
opened
9 months ago
1
Error when running e2e_rlhf
#829
Sun-9923
closed
9 months ago
0
Add DPO support for DeepSpeed-Chat
#828
stceum
opened
9 months ago
1
Update README.md
#827
chinainfant
closed
9 months ago
0