huggingface accelerate issues

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, with automatic mixed precision (including fp8) and easy-to-configure FSDP and DeepSpeed support

https://huggingface.co/docs/accelerate

Apache License 2.0

7.97k stars, 970 forks

issues

Problem with metrics calculation and dataloader

#3202 gssriram opened 3 weeks ago
0
What should I pass to fsdp_config.fsdp_transformer_layer_cls_to_wrap argument in the yaml file?

#3201 ShengYun-Peng closed 2 weeks ago
3
Cuda OOM when accelerator.prepare

#3200 antoinedelplace opened 4 weeks ago
3
Eval loss spikes after resuming from training with DeepSpeed Zero stage 2

#3199 jubueche closed 3 weeks ago
3
eliminate dead code

#3198 statelesshz closed 3 weeks ago
3
using deepspeed original json config, when using bf16, get the error RuntimeError: expected mat1 and mat2 to have the same dtype, but got: c10::BFloat16 != c10::Half.

#3197 PMPBinZhang opened 4 weeks ago
0
Update transformers.deepspeed references from transformers 4.46.0 release

#3196 loadams closed 4 weeks ago
2
Possible issue in Accelerate FSDP Documentation

#3195 Quicksilver466 opened 4 weeks ago
1
🚨 🚨 🚨 Goodbye Python 3.8! 🚨 🚨 🚨

#3194 muellerzr closed 4 weeks ago
1
Give example on how to handle gradient accumulation with cross-entropy

#3193 ylacombe opened 4 weeks ago
1
[MLU] update deepspeed-mlu dependency

#3192 Andy666G opened 4 weeks ago
1
Fix typo

#3191 kylesayrs closed 1 month ago
1
How to save the optimizer state while enabling Deepspeed to save the model

#3190 ITerydh closed 3 weeks ago
2
load_and_quantize_model is broken

#3189 eljandoubi closed 4 weeks ago
3
[Utils] `has_offloaded_params`

#3188 kylesayrs closed 1 month ago
5
MLU devices : Checks if mlu is available via an cndev-based check which won't trigger the drivers and leave mlu

#3187 huismiling closed 4 weeks ago
4
fix bnb

#3186 eljandoubi closed 1 month ago
1
Adds a `multiply_grads` akin to fairseq

#3185 muellerzr closed 1 month ago
3
Unable to access model gradients with DeepSpeed and Accelerate

#3184 shouyezhe opened 1 month ago
3
docs: fix a wrong word in comment in src/accelerate/accelerate.py:1255

#3183 Rebornix-zero closed 1 month ago
1
accelerator.prepare() get OOM,but available in single GPU

#3182 lqf0624 opened 1 month ago
2
[docs] update neptune API

#3181 faaany closed 1 month ago
1
[Bug] The clip_grad_norm of xla fsdp is not right

#3180 hanwen-sun opened 1 month ago
1
Distributed inference example for llava_next

#3179 VladOS95-cyber opened 1 month ago
3
Why don't ML engineers use shampoo ?🧴

#3178 G-structure opened 1 month ago
3
Split_batches argument in Accelerator.__init__ is available, but not used

#3177 yaraksen opened 1 month ago
1
MPI on CPU-only: "no support for _allgather_base"

#3176 tikhu opened 1 month ago
1
Support `--standalone` for concurrent single node multi-GPU jobs

#3175 Olive-Z opened 1 month ago
1
update Megatron-LM plugin code to version 0.8.0 or higher.

#3174 eljandoubi closed 4 weeks ago
4
feat: support tensor parallel using Pytorch 2.0 & Data loader

#3173 kmehant opened 1 month ago
6
Can I load model once or dataset once and copy to subprocess?

#3172 Hans-digit opened 1 month ago
2
[Bug] The Transformer Engine plugin seems to be incompatible with LayerNorm that has no weights.

#3171 IDKiro opened 1 month ago
2
[BUG] Accelerate 1.0.1 failed to train multiple zero-3 models

#3170 nrailg closed 3 hours ago
3
[Bug] accelerate ignores `TPU`

#3169 steveepreston opened 1 month ago
3
dataloader doesn't load data while gpu is training

#3168 geekifan closed 1 week ago
4
take `torch.nn.Module` model into account when moving to device

#3167 faaany closed 3 weeks ago
4
[docs] add xpu part and fix bug in `torchrun`

#3166 faaany closed 3 weeks ago
2
fix version check bug in `get_xpu_available_memory`

#3165 faaany closed 1 month ago
1
add use_all_gather for option

#3164 SangbumChoi opened 1 month ago
5
add the missing xpu for local sgd

#3163 faaany closed 3 weeks ago
2
[WIP] make sure _from_accelerator is used with AcceleratorState when called from Accelerator

#3162 winglian closed 1 month ago
0
🚀 [Feature] Add deprecated Decorator

#3161 yhna940 opened 1 month ago
8
🚑 [HotFix] Fix dev environment for CPU Docker

#3160 yhna940 opened 1 month ago
2
enable cpu bnb distributed lora finetune

#3159 jiqing-feng closed 1 month ago
3
save_state() and load_state() do not work correctly with multi-gpu with shuffle=True in dataloader

#3158 isayoften closed 3 days ago
1
[docs] use nn.module instead of tensor as model

#3157 faaany closed 1 month ago
2
how to load model with fp8 precision for inference?

#3156 imrankh46 opened 1 month ago
2
Remove broken dynamo test

#3155 oraluben closed 1 month ago
0
Models With Tied Weights Need Re-Tieing After FSDP Param Init

#3154 fabianlim closed 3 weeks ago
5
loading big models into memory

#3153 werruww closed 4 days ago
21
