Open · webbigdata-jp opened 1 month ago
I have been able to train without any problems using the following versions. Thank you.
```
$ pip list
Package                  Version
------------------------ ------------
accelerate               0.34.2
aiohttp                  3.9.5
aiosignal                1.3.1
async-timeout            4.0.3
attrs                    23.2.0
bitsandbytes             0.43.1
certifi                  2024.7.4
charset-normalizer       3.3.2
click                    8.1.7
datasets                 2.20.0
dill                     0.3.8
docker-pycreds           0.4.0
docstring_parser         0.16
einops                   0.8.0
filelock                 3.15.4
flash-attn               2.6.3
frozenlist               1.4.1
fsspec                   2024.5.0
gitdb                    4.0.11
GitPython                3.1.43
hf_transfer              0.1.8
huggingface-hub          0.23.4
idna                     3.7
Jinja2                   3.1.4
markdown-it-py           3.0.0
MarkupSafe               2.1.5
mdurl                    0.1.2
mpmath                   1.3.0
multidict                6.0.5
multiprocess             0.70.16
networkx                 3.3
ninja                    1.11.1.1
numpy                    1.26.4
nvidia-cublas-cu12       12.1.3.1
nvidia-cuda-cupti-cu12   12.1.105
nvidia-cuda-nvrtc-cu12   12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12        8.9.2.26
nvidia-cufft-cu12        11.0.2.54
nvidia-curand-cu12       10.3.2.106
nvidia-cusolver-cu12     11.4.5.107
nvidia-cusparse-cu12     12.1.0.106
nvidia-nccl-cu12         2.20.5
nvidia-nvjitlink-cu12    12.5.82
nvidia-nvtx-cu12         12.1.105
packaging                24.1
pandas                   2.2.2
peft                     0.11.1
pillow                   10.4.0
pip                      22.0.2
platformdirs             4.2.2
protobuf                 3.20.3
psutil                   6.0.0
pyarrow                  16.1.0
pyarrow-hotfix           0.6
Pygments                 2.18.0
python-dateutil          2.9.0.post0
pytz                     2024.1
PyYAML                   6.0.1
regex                    2024.5.15
requests                 2.32.3
rich                     13.7.1
safetensors              0.4.3
sentencepiece            0.2.0
sentry-sdk               2.7.1
setproctitle             1.3.3
setuptools               59.6.0
shtab                    1.7.1
six                      1.16.0
smmap                    5.0.1
sympy                    1.13.0
tokenizers               0.20.0
torch                    2.3.0
tqdm                     4.66.4
transformers             4.45.1
triton                   2.3.0
trl                      0.9.6
typing_extensions        4.12.2
tyro                     0.8.5
tzdata                   2024.1
unsloth                  2024.9.post3
urllib3                  2.2.2
wandb                    0.17.4
wheel                    0.43.0
xformers                 0.0.26.post1
xxhash                   3.4.1
yarl                     1.9.4
```
Upgrading accelerate to 0.34.2 solved the issue.
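If you want to confirm your environment before launching a run, here is a minimal pre-flight sketch (my own addition, not part of the fix; it assumes the 0.34.2 threshold reported above, and `packaging` is already in the list):

```python
# Minimal pre-flight check: fail fast if the installed accelerate is older
# than 0.34.2, the version reported in this thread to fix the error.
from importlib.metadata import version
from packaging.version import parse

installed = version("accelerate")
if parse(installed) < parse("0.34.2"):
    raise RuntimeError(
        f"accelerate {installed} is too old; run "
        "`pip install -U 'accelerate>=0.34.2'` before training."
    )
print(f"accelerate {installed} is OK")
```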
Sorry for the delay - I added `accelerate>=0.34.2` in `pyproject.toml` so future installs won't have this issue - thanks for the fixes, everyone!
Updating `unsloth_env_file.yml` with `accelerate==0.34.2`, as described in https://github.com/unslothai/unsloth/wiki#nvidia-pascal-support, also works and allows training on NVIDIA Pascal GPUs (P40, P100).
Maybe it is irrelevant, since I don't use Unsloth, but I had the same issue. In my trainer.py, this code caused the error:

```python
model.train()
if hasattr(self.optimizer, "train") and callable(self.optimizer.train):
    self.optimizer.train()
```

I changed it to:

```python
model.train()
if hasattr(self.optimizer, 'train'):
    self.model.train()
```

and the training started without an error. Maybe that helps someone.
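Note that this replacement effectively skips the `self.optimizer.train()` call altogether. A self-contained sketch of a more defensive guard (illustrative, standalone names; upgrading accelerate remains the proper fix):

```python
import torch

model = torch.nn.Linear(2, 2)
optimizer = torch.optim.AdamW(model.parameters())

model.train()
# getattr with a default avoids raising when the optimizer (here a plain
# torch.optim.AdamW) does not implement train(); the call is only made for
# optimizers that actually provide it, such as schedule-free optimizers.
optimizer_train = getattr(optimizer, "train", None)
if callable(optimizer_train):
    optimizer_train()
```

With a plain `torch.optim.AdamW`, `getattr` returns `None` and the call is skipped, which is the same behavior the workaround above produces.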
Updating accelerate is probably better, but that didn't work in my case. (I used Oobabooga with the Training PRO extension.)
This is not a bug report, just a status report. Unsloth was not the cause: the error in the title occurred at the start of training, and accelerate seemed to be the factor behind it. For the upstream discussion, see:
https://github.com/huggingface/transformers/issues/33620