huggingface / accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, with automatic mixed precision (including fp8) and easy-to-configure FSDP and DeepSpeed support.

Docs: https://huggingface.co/docs/accelerate
License: Apache License 2.0 · 7.97k stars · 970 forks
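A minimal sketch of the pattern the description refers to, following the documented Accelerator API; the model, optimizer, and dataloader here are placeholders built elsewhere:

```python
from accelerate import Accelerator

accelerator = Accelerator()  # picks up the device/distributed config from the environment

# Placeholders: build your own model, optimizer, and train_dataloader first.
model, optimizer, train_dataloader = accelerator.prepare(model, optimizer, train_dataloader)

for batch in train_dataloader:
    loss = model(**batch).loss  # assumes a Hugging Face-style model output
    accelerator.backward(loss)  # replaces loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```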
Issues · sorted by newest
Accelerate + FSDP plugin hangs after saving an intermediate checkpoint · #3250 · opened 6 hours ago by leeruibin · 1 comment
examples/inference/pippy/llama.py: assertion error about graphs · #3249 · opened 10 hours ago by 685Degrees · 0 comments
Fix: Resolve #3060 · #3248 · opened 10 hours ago by wejoncy · 0 comments
Use `numpy._core` instead of `numpy.core` · #3247 · closed 21 hours ago by qgallouedec · 4 comments
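#3247 addresses the NumPy 2.0 rename of `numpy.core` to `numpy._core`. A hedged compatibility sketch of the general technique, not necessarily the exact fix that was merged:

```python
# Import shim for the NumPy 2.0 rename of numpy.core to numpy._core.
# Sketch only; the symbols a given codebase actually needs may differ.
try:
    from numpy import _core as np_core  # NumPy >= 2.0
except ImportError:
    from numpy import core as np_core   # NumPy < 2.0
```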
[`data_loader`] Optionally also propagate set_epoch to batch sampler · #3246 · closed 1 day ago by tomaarsen · 3 comments
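#3246 extends `set_epoch` propagation to the batch sampler. For context, accelerate's prepared dataloaders expose a `set_epoch` hook so shuffling is reseeded each epoch; a minimal usage sketch with illustrative names:

```python
# `train_dataloader` is assumed to come from accelerator.prepare(...);
# `num_epochs` is illustrative.
for epoch in range(num_epochs):
    if hasattr(train_dataloader, "set_epoch"):
        train_dataloader.set_epoch(epoch)  # reseed shuffling for this epoch
    for batch in train_dataloader:
        ...
```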
RuntimeError: The server socket has failed to listen on any local network address · #3245 · closed 3 days ago by liujf69 · 1 comment
Fix: get_balanced_memory when using multiple GPUs with small models or quantized models with a large vocabulary · #3244 · opened 4 days ago by MekkCyber · 1 comment
🚀 Feature Request: Improve `stateful_dataloader` by passing `snapshot_every_n_steps` · #3243 · opened 4 days ago by yzhangcs · 0 comments
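#3243 asks accelerate to forward `snapshot_every_n_steps` to torchdata's `StatefulDataLoader`, which controls how often resumable state is captured. A sketch of the underlying torchdata API; the dataset and value here are illustrative:

```python
from torchdata.stateful_dataloader import StatefulDataLoader

dataset = list(range(128))  # any map-style dataset works here

# Larger snapshot_every_n_steps means cheaper iteration but coarser resume points.
loader = StatefulDataLoader(dataset, batch_size=8, snapshot_every_n_steps=16)
state = loader.state_dict()    # capture mid-epoch progress
loader.load_state_dict(state)  # resume from the captured point
```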
Wrong epoch when resuming from checkpoint · #3242 · opened 4 days ago by xiechun-tsukuba · 0 comments
DeepSpeed inference · #3241 · opened 5 days ago by Reginald-L · 0 comments
Communication problems with DeepSpeed ZeRO-3 · #3240 · closed 4 days ago by Reginald-L · 0 comments
OOM error when training a Llama 7B model with the Accelerate FSDP setting · #3239 · opened 1 week ago by JlPang863 · 1 comment
DeepSpeed ZeRO-3 model saving · #3238 · closed 6 days ago by Reginald-L · 2 comments
slurmstepd: error: execve(): accelerate: No such file or directory · #3237 · closed 4 days ago by huiyang865 · 3 comments
Enable `find_executable_batch_size` on XPU · #3236 · closed 2 days ago by faaany · 2 comments
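#3236 enables `find_executable_batch_size` on XPU. The utility retries a training function, halving the batch size after each out-of-memory failure; a minimal sketch of the documented decorator pattern:

```python
from accelerate.utils import find_executable_batch_size

@find_executable_batch_size(starting_batch_size=128)
def train(batch_size):
    # Called again with a halved batch_size after every OOM failure.
    print(f"trying batch size {batch_size}")

train()
```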
[docs] update code in tracking documentation · #3235 · closed 1 day ago by faaany · 1 comment
[docs] add XPU to profiler documentation and fix minor bugs · #3234 · closed 1 day ago by faaany · 2 comments
Logic bug: init handler kwargs used for the grad scaler in FP8 training (accelerate/accelerator.py) · #3233 · opened 1 week ago by immortalCO · 0 comments
FSDP checkpoint saving leads to NCCL WARN Cuda failure 2 'out of memory' · #3232 · opened 1 week ago by edchengg · 0 comments
[RFC] Support FSDP2 · #3231 · opened 1 week ago by kmehant · 1 comment
Error while fine-tuning with PEFT, LoRA, accelerate, SFTConfig, and SFTTrainer · #3230 · opened 1 week ago by Isdriai · 4 comments
Fix Slurm multinode example · #3229 · opened 1 week ago by ffrancesco94 · 0 comments
[docs] update set_seed · #3228 · opened 2 weeks ago by faaany · 3 comments
[docs] add instructions for installing bnb on non-CUDA devices · #3227 · closed 1 day ago by faaany · 1 comment
Handle the case when `_tied_weights_keys` is not an attribute · #3226 · closed 1 day ago by fabianlim · 2 comments
torch.cuda.is_available() is false when running multi-GPU inference with accelerate launch · #3225 · closed 3 days ago by paulgekeler · 1 comment
"mat2 must be a matrix" error when fine-tuning DreamBooth Flux with FSDP · #3224 · opened 2 weeks ago by weixiong-ur · 2 comments
Remove hook for bnb 4-bit · #3223 · closed 6 days ago by SunMarc · 3 comments
Add case-insensitive parsing of bool environment variables · #3222 · opened 2 weeks ago by wizeng23 · 0 comments
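#3222 proposes case-insensitive parsing of boolean environment variables, so `TRUE`, `True`, and `true` all behave the same. A generic sketch of the technique, not accelerate's actual parser:

```python
import os

def str_to_bool(value: str) -> bool:
    # Case-insensitive parse accepting common truthy spellings.
    return value.strip().lower() in {"1", "true", "yes", "on"}

debug = str_to_bool(os.environ.get("ACCELERATE_DEBUG_MODE", "false"))
```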
[docs] fix typo · #3221 · opened 2 weeks ago by faaany · 2 comments
[docs] use real path for `checkpoint` · #3220 · opened 2 weeks ago by faaany · 2 comments
Ensure explicit output `dtype` for `pad_across_processes` · #3219 · opened 2 weeks ago by mariusarvinte · 0 comments
Incorrect type in output of `utils.pad_across_processes` when input is `torch.bool` · #3218 · opened 2 weeks ago by mariusarvinte · 1 comment
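#3218 reports that `utils.pad_across_processes` can return the wrong dtype when the input is `torch.bool`, and #3219 proposes the fix. Until it lands, one hedged workaround sketch is to round-trip through an integer dtype:

```python
import torch
from accelerate.utils import pad_across_processes

mask = torch.tensor([True, False, True])
# Workaround sketch (see #3218): pad as uint8, then cast back to bool.
padded = pad_across_processes(mask.to(torch.uint8), dim=0, pad_index=0).to(torch.bool)
```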
Fix `align_module_device`, ensure only CPU tensors for `get_state_dict_offloaded_model` · #3217 · closed 2 weeks ago by kylesayrs · 1 comment
PyPI-published Accelerate==1.1.0 is missing a source distribution · #3216 · opened 2 weeks ago by helloworld1 · 3 comments
Milad · #3215 · closed 2 weeks ago by Milad335t · 0 comments
ConnectionError: Tried to launch distributed communication on port `29401`, but another process is utilizing it. Please specify a different port (such as using the `--main_process_port` flag or specifying a different `main_process_port` in your config file) and rerun your script. To automatically use the next open port (on a single node), you can set this to `0`. · #3214 · opened 2 weeks ago by qinchangchang · 0 comments
Create `_preprare_fsdp` to pre-prepare FSDP model training · #3213 · opened 2 weeks ago by eljandoubi · 2 comments
Timeout at validation step · #3212 · closed 2 weeks ago by qmin2 · 1 comment
Fix load_state_dict for NPU · #3211 · opened 2 weeks ago by statelesshz · 1 comment
How can I convert ZeRO-0 DeepSpeed weights into an fp32 model checkpoint? · #3210 · opened 2 weeks ago by liming-ai · 0 comments
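On #3210: DeepSpeed ships a consolidation utility (and writes a `zero_to_fp32.py` script into each checkpoint directory) for turning ZeRO checkpoints into a single fp32 state dict. A hedged sketch assuming a standard DeepSpeed checkpoint layout; whether it covers stage 0, where weights are not partitioned, is exactly what the issue asks:

```python
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

# Consolidates ZeRO shards into a plain fp32 state dict on CPU.
# "checkpoints/step_1000" is an illustrative path.
state_dict = get_fp32_state_dict_from_zero_checkpoint("checkpoints/step_1000")
```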
The optimizer is not receiving the FSDP model parameters · #3209 · opened 3 weeks ago by eljandoubi · 6 comments
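#3209 touches a known FSDP pitfall: wrapping flattens parameters, so an optimizer built from the unwrapped model can end up holding stale tensors. The usual guidance is to prepare the model first and build the optimizer from the prepared model's parameters (or pass both to `prepare` together). A sketch with a stand-in model:

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()  # assumes FSDP is enabled in the accelerate config

model = torch.nn.Linear(8, 8)        # stand-in for a real model
model = accelerator.prepare(model)   # FSDP wraps and flattens parameters here
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # built from wrapped params
optimizer = accelerator.prepare(optimizer)
```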
Multi-node inference · #3208 · opened 3 weeks ago by DLCM-wrz · 0 comments
Typo fix in big_modeling.py · #3207 · closed 3 weeks ago by a-r-r-o-w · 1 comment
Multi-node, multi-GPU example fails · #3206 · opened 3 weeks ago by ffrancesco94 · 9 comments
Type of Accelerator.distributed_type might be wrong · #3205 · closed 3 weeks ago by ffrancesco94 · 4 comments
[Utils] `align_module_device` · #3204 · closed 3 weeks ago by kylesayrs · 2 comments
Command-line arguments related to DeepSpeed for `accelerate launch` do not override those in `default_config.yaml` · #3203 · opened 3 weeks ago by JdbermeoUZH · 0 comments
Problem with metrics calculation and the dataloader · #3202 · opened 3 weeks ago by gssriram · 0 comments
What should I pass to the fsdp_config.fsdp_transformer_layer_cls_to_wrap argument in the YAML file? · #3201 · closed 2 weeks ago by ShengYun-Peng · 3 comments
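On #3201: the key takes the class name of the model's repeating transformer block, which FSDP uses as its auto-wrap boundary. A hedged config sketch for a Llama-style model; the class name depends on your architecture:

```yaml
fsdp_config:
  fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
```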