huggingface/accelerate
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
https://huggingface.co/docs/accelerate
Apache License 2.0
7.97k stars
970 forks
Problem with metrics calculation and dataloader
#3202
gssriram
opened
3 weeks ago
0
What should I pass to fsdp_config.fsdp_transformer_layer_cls_to_wrap argument in the yaml file?
#3201
ShengYun-Peng
closed
2 weeks ago
3
CUDA OOM when calling accelerator.prepare
#3200
antoinedelplace
opened
4 weeks ago
3
Eval loss spikes after resuming training with DeepSpeed ZeRO stage 2
#3199
jubueche
closed
3 weeks ago
3
eliminate dead code
#3198
statelesshz
closed
3 weeks ago
3
Using the original DeepSpeed JSON config with bf16 raises RuntimeError: expected mat1 and mat2 to have the same dtype, but got: c10::BFloat16 != c10::Half
#3197
PMPBinZhang
opened
4 weeks ago
0
Update transformers.deepspeed references from transformers 4.46.0 release
#3196
loadams
closed
4 weeks ago
2
Possible issue in Accelerate FSDP Documentation
#3195
Quicksilver466
opened
4 weeks ago
1
🚨 🚨 🚨 Goodbye Python 3.8! 🚨 🚨 🚨
#3194
muellerzr
closed
4 weeks ago
1
Give example on how to handle gradient accumulation with cross-entropy
#3193
ylacombe
opened
4 weeks ago
1
[MLU] update deepspeed-mlu dependency
#3192
Andy666G
opened
4 weeks ago
1
Fix typo
#3191
kylesayrs
closed
1 month ago
1
How to save the optimizer state while enabling DeepSpeed to save the model
#3190
ITerydh
closed
3 weeks ago
2
load_and_quantize_model is broken
#3189
eljandoubi
closed
4 weeks ago
3
[Utils] `has_offloaded_params`
#3188
kylesayrs
closed
1 month ago
5
MLU devices: Checks if mlu is available via a cndev-based check which won't trigger the drivers and leave mlu
#3187
huismiling
closed
4 weeks ago
4
fix bnb
#3186
eljandoubi
closed
1 month ago
1
Adds a `multiply_grads` akin to fairseq
#3185
muellerzr
closed
1 month ago
3
Unable to access model gradients with DeepSpeed and Accelerate
#3184
shouyezhe
opened
1 month ago
3
docs: fix a wrong word in comment in src/accelerate/accelerate.py:1255
#3183
Rebornix-zero
closed
1 month ago
1
accelerator.prepare() hits OOM, but works on a single GPU
#3182
lqf0624
opened
1 month ago
2
[docs] update neptune API
#3181
faaany
closed
1 month ago
1
[Bug] The clip_grad_norm of xla fsdp is not right
#3180
hanwen-sun
opened
1 month ago
1
Distributed inference example for llava_next
#3179
VladOS95-cyber
opened
1 month ago
3
Why don't ML engineers use shampoo? 🧴
#3178
G-structure
opened
1 month ago
3
Split_batches argument in Accelerator.__init__ is available, but not used
#3177
yaraksen
opened
1 month ago
1
MPI on CPU-only: "no support for _allgather_base"
#3176
tikhu
opened
1 month ago
1
Support `--standalone` for concurrent single node multi-GPU jobs
#3175
Olive-Z
opened
1 month ago
1
update Megatron-LM plugin code to version 0.8.0 or higher.
#3174
eljandoubi
closed
4 weeks ago
4
feat: support tensor parallel using PyTorch 2.0 & Data loader
#3173
kmehant
opened
1 month ago
6
Can I load the model or dataset once and copy it to subprocesses?
#3172
Hans-digit
opened
1 month ago
2
[Bug] The Transformer Engine plugin seems to be incompatible with LayerNorm that has no weights.
#3171
IDKiro
opened
1 month ago
2
[BUG] Accelerate 1.0.1 failed to train multiple zero-3 models
#3170
nrailg
closed
3 hours ago
3
[Bug] accelerate ignores `TPU`
#3169
steveepreston
opened
1 month ago
3
dataloader doesn't load data while the GPU is training
#3168
geekifan
closed
1 week ago
4
take `torch.nn.Module` model into account when moving to device
#3167
faaany
closed
3 weeks ago
4
[docs] add xpu part and fix bug in `torchrun`
#3166
faaany
closed
3 weeks ago
2
fix version check bug in `get_xpu_available_memory`
#3165
faaany
closed
1 month ago
1
add use_all_gather for option
#3164
SangbumChoi
opened
1 month ago
5
add the missing xpu for local sgd
#3163
faaany
closed
3 weeks ago
2
[WIP] make sure _from_accelerator is used with AcceleratorState when called from Accelerator
#3162
winglian
closed
1 month ago
0
[Feature] Add deprecated Decorator
#3161
yhna940
opened
1 month ago
8
[HotFix] Fix dev environment for CPU Docker
#3160
yhna940
opened
1 month ago
2
enable cpu bnb distributed lora finetune
#3159
jiqing-feng
closed
1 month ago
3
save_state() and load_state() do not work correctly with multi-gpu with shuffle=True in dataloader
#3158
isayoften
closed
3 days ago
1
[docs] use nn.module instead of tensor as model
#3157
faaany
closed
1 month ago
2
how to load model with fp8 precision for inference?
#3156
imrankh46
opened
1 month ago
2
Remove broken dynamo test
#3155
oraluben
closed
1 month ago
0
Models With Tied Weights Need Re-Tying After FSDP Param Init
#3154
fabianlim
closed
3 weeks ago
5
loading big models into memory
#3153
werruww
closed
4 days ago
21