huggingface accelerate issues

huggingface / accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support

https://huggingface.co/docs/accelerate

Apache License 2.0

7.34k stars 875 forks

issues

load_checkpoint_and_dispatch does not work on nn.Modules with hook that manipulates tensors

#2919 captainst opened 3 hours ago
0
Question about `float16` when loading the model

#2918 AIR-hl opened 6 hours ago
1
Support MUSA (Moore Threads GPU) backend in accelerate

#2917 fmo-mt opened 8 hours ago
1
Allow multiple process per device

#2916 cifkao opened 20 hours ago
1
AI assist to auto update scripts to use accelerate

#2915 samyakkkk opened 1 day ago
1
Fix slowdown on init with `device_map="auto"`

#2914 muellerzr closed 1 day ago
1
Multinode Inference using deepspeed zero3

#2913 NotTheStallion opened 1 day ago
0
NCCL error during saving checkpoint with ds zero3

#2912 rubickkcibur opened 1 day ago
1
Model loading extremely slow after upgrading from 0.31.0 to 0.32.0

#2911 chenyuteng01 closed 1 day ago
3
Model Memory Calculator For Different Input Token

#2910 ZorkJ opened 1 day ago
3
[tests] fix bug in torch_device

#2909 faaany closed 1 day ago
1
'from transformers import Trainer' can hinder multi-gpu training process on Jupyter.

#2908 SHEN2BAIYI opened 2 days ago
8
Parameters out-of-sync in multi-GPU training, only 1st GPU actually contributes to training

#2907 parlance-zz closed 2 days ago
0
ValueError: weight is on the meta device, we need a `value` to put in on 0.

#2906 nilsjohanbjorck opened 2 days ago
1
Incorrect device placement when used with quantization_config

#2905 xinghaow99 opened 5 days ago
2
How to merge Qlora FSDP weights with an LLM and save model.

#2904 Minami-su closed 4 days ago
2
With the same epoch, the result of multiple GPUs is much lower than that of a single GPU, why?

#2903 xiuguangLi opened 1 week ago
1
Added a MultiCPU SLURM example using Accelerate Launch and MPIRun

#2902 okhleif-IL closed 2 days ago
1
Problem on custom device_map

#2901 wonkyoc opened 1 week ago
2
Error when running the glm4 demo

#2900 leizhu1989 opened 1 week ago
3
training loop freezes after first step on TPU

#2899 drimeF0 opened 1 week ago
2
Move to cpu takes extra memory usage after .gather()

#2898 xinghaow99 closed 6 days ago
7
Accelerate load_checkpoint_and_dispatch - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1

#2897 adarsh-ks closed 3 days ago
12
Feature Request: Pipeline multiple batches together for Llama3 70B distributed inference

#2896 ishan-gaur closed 1 week ago
4
Add early support for `torchdata.stateful_dataloader.StatefulDataLoader` within the `Accelerator`

#2895 byi8220 opened 1 week ago
1
Importing `torchdata.stateful_dataloader` causes the test `check_seedable_sampler` to fail

#2894 byi8220 closed 2 days ago
9
notebook_launcher on kaggle tpu

#2893 lhiqwj173 opened 1 week ago
3
Add XLA Dynamo backends for training and inference

#2892 johnsutor closed 2 days ago
1
How to set a custom Config in python code using Accelerate?

#2891 konstantinator opened 1 week ago
1
More than 10 times slowdown between version 0.26.1 and version 0.31.0, EDIT: It was a data loading issue with Hugging Face Datasets

#2890 marhlder closed 1 week ago
6
Hotfix PyTorch Version Installation in CI Workflow for Minimum Version Matrix

#2889 yhna940 opened 1 week ago
9
Make `log_line_prefix_template` Optional in Elastic Launcher for Backward Compatibility

#2888 yhna940 closed 2 days ago
4
fix mlu device longTensor bugs

#2887 huismiling closed 2 days ago
2
Can't apply LoRA's PiSSA weight init when using DeepSpeed ZeRO3 + LoRA to finetune!

#2886 ANYMS-A closed 1 week ago
4
The saved model with deepspeed zero3 can not be correctly loaded

#2885 rubickkcibur closed 1 week ago
2
Why is there a double fetch in the first batch when using accelerate?

#2884 qsunyuan opened 1 week ago
1
Add Profiler Support for Performance Analysis

#2883 yhna940 closed 3 days ago
3
Can accelerator.prepare be run only once?

#2882 DavideHe opened 2 weeks ago
2
typo in examples/slurm/submit_multinode.sh script

#2881 hubutui closed 2 weeks ago
2
Add ignore_unexpected_keys arg to load_checkpoint_in_model()

#2880 Qubitium closed 2 weeks ago
1
fix `load_state_dict` for xpu and refine xpu safetensor version check

#2879 faaany closed 2 days ago
5
add `require_triton` and enable `test_dynamo` work on xpu

#2878 faaany closed 2 days ago
4
Some adjustment for supporting Deepspeed-Ulysses

#2877 zeyugao opened 2 weeks ago
2
make more cuda-only tests device-agnostic

#2876 faaany closed 2 days ago
6
Correct loading of models with shared tensors when using accelerator.load_state()

#2875 jkuntzer opened 2 weeks ago
2
fix bug when getting the real accelerator's device number

#2874 faaany closed 2 days ago
6
Plan to support FSDP2?

#2873 ByronHsu opened 2 weeks ago
6
Accelerate test fails: Exception: Could not find the transformer layer class to wrap in the model

#2872 MikaSie opened 2 weeks ago
8
Cannot free VRAM after loading a quantized model

#2871 lstein opened 2 weeks ago
1
Support for Torch XLA Dynamo Backend

#2870 johnsutor closed 2 days ago
1