microsoft DeepSpeedExamples issues

microsoft / DeepSpeedExamples

Example models using DeepSpeed

Apache License 2.0

6.02k stars 1.02k forks

issues

[inference benchmark] update AML kwargs to match vLLM kwargs

#876 mrwyattii closed 6 months ago
0
Improve robustness of inference AML benchmark

#875 HeyangQin closed 6 months ago
0
Fix AML benchmark E2E measurement

#874 mrwyattii closed 6 months ago
0
Add LoRA optimization to the SD training example

#873 PareesaMS opened 6 months ago
0
Replace deprecated transformers.deepspeed module

#872 HollowMan6 opened 7 months ago
0
Xiaoxia/fp v1

#871 xiaoxiawu-microsoft closed 6 months ago
0
Remove AML key from args dict when saving results

#870 lekurile closed 7 months ago
0
Inference Benchmark: Catch AML error response

#869 mrwyattii closed 7 months ago
0
Update Inference Benchmarking Scripts - Support AML

#868 lekurile closed 7 months ago
1
[Bug] DeepSpeed Inference Does not Work with LLaMA (Latest version)

#867 allanj opened 7 months ago
3
[BUG in Stable Diffusion inference] There's an error on CUDAGraph when using deepspeed inference. How to fix it?

#866 foin6 opened 7 months ago
2
Extend FastGen benchmark to use AML endpoints

#865 mrwyattii closed 7 months ago
0
zero3 and enable hybrid engine are not suitable for llama2, how to solve it?

#864 terence1023 opened 7 months ago
3
Modify codes so that different accelerators can be called according to specific device conditions

#863 foin6 closed 7 months ago
1
Fix path in human-eval example README

#862 lekurile closed 7 months ago
0
RLHF problems when using Qwen model

#861 128Ghe980 opened 7 months ago
1
Codellama finetune

#860 nani1149 opened 7 months ago
0
Different accelerators can be called according to specific device conditions

#859 foin6 closed 7 months ago
0
Throughput should be `num_queries/latency` as opposed to `num_clients/latency`?

#858 goelayu opened 8 months ago
0
Not a bug, just missing a space in README.md

#857 stceum closed 8 months ago
0
Add Human Eval Example

#856 lekurile closed 7 months ago
0
The inaccurate flop results after several rounds

#855 BitCalSaul opened 8 months ago
1
Fix extraneous arg to MOE example

#854 yang closed 8 months ago
4
Control the kernel injection with new argument. And compare the outputs only on rank 0

#853 foin6 closed 8 months ago
6
remove redundant code

#852 ilml opened 8 months ago
0
Generalize MII benchmark for any model

#851 mrwyattii closed 8 months ago
0
How to resume Deepspeed-Chat RLHF step-3 training?

#850 DespairL closed 8 months ago
0
Question: Why not padding to the same sequence length within the batch during the sft training phase?

#849 LKLKyy opened 8 months ago
0
Remove hardcoded model dependencies in benchmark script

#848 arashb closed 8 months ago
0
running gpt2-xl/test_tune.sh fails - ParquetConfig.__init__() got an unexpected keyword argument 'token'

#847 ccruttjr closed 8 months ago
0
Enable overlap_comm for better performance

#846 li-plus closed 2 weeks ago
0
torch.distributed.DistBackendError: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1333, remote process exited or there was a network error, NCCL version 2.18.6

#845 Rainbowman0 opened 8 months ago
3
Modify codes so that different accelerators can be called according to specific device conditions

#844 foin6 closed 8 months ago
1
[Example] Refactor and Polish Cifar10-DeepSpeed Code Example.

#843 keli-wen closed 8 months ago
3
Step3 hanging for a long time

#842 Jeayea closed 9 months ago
1
[DeepSpeed-Chat] Fix OOM issue in dataloader

#841 youkaichao opened 9 months ago
2
Invalidate trace cache @ step 0: expected module 0, but got module 6

#840 boundles opened 9 months ago
0
deepspeed-chat: Support zero3 params initialization in the last LN

#839 deepcharm closed 8 months ago
0
fix: typo in sa

#838 A-Cepheus closed 9 months ago
0
Update MII Inference Examples

#837 mrwyattii closed 8 months ago
0
Step3 PPO print error when enable --print_answers

#836 tonylin52 closed 9 months ago
1
async_pipeline is not exposed in the library

#835 yaliqin opened 9 months ago
1
fix: don't add eot token if add_eot_token knob is False

#834 EeyoreLee opened 9 months ago
0
Improve Comms Benchmark Timing

#833 Quentin-Anthony closed 9 months ago
5
Mistral and Orca Training

#832 syngokhan opened 9 months ago
0
[Discussion] Can anyone show the performance on every step with any dataset

#831 EeyoreLee opened 9 months ago
0
Question: Why did you implemented LoRA on your hand instead of using peft?

#830 kwonmha opened 9 months ago
1
Error when running e2e_rlhf

#829 Sun-9923 closed 9 months ago
0
Add DPO support for DeepSpeed-Chat

#828 stceum opened 9 months ago
1
Update README.md

#827 chinainfant closed 9 months ago
0