microsoft/DeepSpeedExamples
Example models using DeepSpeed
Apache License 2.0
6.1k stars, 1.04k forks
issues
How can I change the master_port when using deepspeed for multi-GPU on single node, i.e. localhost
#936
lovedoubledan
opened
2 days ago
3
RuntimeError: CUDA error: no kernel image is available for execution on the device
#935
mrpeerat
closed
1 week ago
1
No module named 'transformers.deepspeed'
#934
TianyuJIAA
closed
3 weeks ago
2
Fixed mistake in readme
#933
SCheekati
closed
3 weeks ago
0
Does DeepSpeed's Pipeline-Parallelism optimizer support skip connections?
#932
RoyMahlab
opened
1 month ago
0
[cifar ds training]: Set cuda device during initialization of distributed backend.
#931
jagadish-amd
closed
3 weeks ago
3
Enable reward model offloading option
#930
kfertakis
closed
3 weeks ago
2
Deepspeed-Domino
#929
zhangsmallshark
closed
2 weeks ago
3
After running steps 1, 2, and 3, the test reply only contains Assistant: </s>.
#928
jianmomo
closed
2 months ago
0
Remove the fixed `eot_token` mechanism for SFT
#927
Xingfu-Yi
closed
3 weeks ago
2
Update requirements for opencv-python CVE
#925
loadams
closed
2 months ago
0
AttributeError: 'DeepSpeedEngine' object has no attribute 'model'
#924
lovychen
closed
3 weeks ago
1
How to calculate training efficiency, i.e., tokens/sec, of step 1 fine-tuning of the llama2 model?
#923
sowmya04101998
opened
2 months ago
0
Actor loss nan and Resizing model embedding
#922
ouyanmei
opened
2 months ago
1
DeepNVMe ZeRO-inf Tutorial
#921
jomayeri
closed
2 months ago
0
FileNotFoundError: [Errno 2] No such file or directory: 'numactl'
#920
zhiwentian
closed
2 weeks ago
6
DeepNVMe README.md add xref
#919
stas00
closed
3 months ago
0
Update README.md
#916
keshavkowshik
closed
3 months ago
0
step2 without any response for a long time
#915
asfadfaf
opened
3 months ago
0
DeepNVMe example scripts
#914
tjruwase
closed
3 months ago
0
Add openai client to deepspeedometer
#913
delock
closed
3 months ago
2
Different zero stage the training memory compute
#912
Arcmoon-Hu
opened
4 months ago
0
nvcc fatal : Unsupported gpu architecture 'compute_86' and nvcc fatal : Value 'c++17' is not defined for option 'std'
#911
Xccanxin
closed
4 months ago
1
How to start deepspeed automatically?
#910
qwerfdsadad
closed
2 months ago
2
Consult the first phase.
#909
csxrzhang
closed
3 months ago
2
an error with gradient checkpointing in DeepspeedChat step 3
#908
wangyuwen1999
opened
4 months ago
0
Error when using a Qwen model as the Actor Model in step 3 of RLHF on a single machine with multiple GPUs
#907
Dakai798
opened
5 months ago
1
DeepSpeed-Chat step-1 hanging for a long time
#906
lemon-little
opened
5 months ago
0
Enable cpu/xpu support for the benchmarking suite
#905
louie-tsai
closed
3 months ago
8
CPU OOM when inferencing Llama3-70B-Chinese-Chat
#904
GORGEOUSLCX
opened
6 months ago
0
cannot pickle 'Stream' object
#903
teis-e
opened
6 months ago
0
Cannot run test-gpt.sh because of an AssertionError
#902
leachee99
opened
6 months ago
0
Does fastgen support long-context and sequence-parallel inference?
#901
AceCoder0
opened
6 months ago
0
Add --client-only arg to mii benchmark
#900
delock
closed
5 months ago
0
Refactored LLM benchmark code
#899
mrwyattii
closed
4 months ago
0
fix bug with queue.empty not being reliable
#898
mrwyattii
closed
6 months ago
0
Update tokens_per_sec calculation to work w/ stream and non-stream cases
#897
lekurile
closed
6 months ago
0
run-example.sh fails with urllib3.exceptions.ProtocolError: Response ended prematurely
#896
awan-10
closed
4 months ago
11
updating tokens per second to include the token count of generated tokens.
#895
guptha23
closed
6 months ago
0
[Error] AutoTune: `connect to host localhost port 22: Connection refused`
#894
wqw547243068
opened
7 months ago
0
How to use deepspeed for multi-node and multi-card task in slurm cluster
#893
dshwei
opened
7 months ago
0
Does Zero-Inference support TP?
#892
preminstrel
opened
7 months ago
11
extend max_prompt_length and input text for 128k evaluation
#891
HeyangQin
closed
2 months ago
0
Does DeepSpeed support fine-tuning an extra model with LoRA?
#890
wanghongqu
opened
7 months ago
1
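The Python environment path differs across machines, and after launch deepspeed cannot find the Python environment on the other machines. How can this be solved?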
#889
liqwertyu
closed
2 months ago
0
when calculating actor loss, why the mask is "action_mask[:, start: ] "
#888
fancghit
closed
7 months ago
0
The actor constantly generates ['</s>'] or ['<|endoftext|></s>'] after 200 steps in RLHF with hybrid engine disabled
#887
mousewu
opened
7 months ago
1
About multiple-thread attention computation on CPU using zero-inference example.
#886
luckyq
opened
7 months ago
0
Suggested GPU to run the demo code of step2_reward_model_finetuning (DeepSpeed-Chat)
#885
wenbozhangjs
opened
7 months ago
0
[REQUEST] More fine-grained distributed strategies for RLHF training
#884
youshaox
opened
7 months ago
0