Closed ArnaudParan closed 1 year ago
Could you print the device_map you get, for instance by debugging and printing it from this line in the traceback?
❱ 2828 │ │ │ dispatch_model(model, device_map=device_map, offload_dir=offload_folder, off
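(For reference, the device_map accelerate builds is just a dict mapping module names to devices. A tiny stdlib helper to summarize one, using hypothetical example data in the shape accelerate produces:)

```python
from collections import defaultdict

def summarize_device_map(device_map):
    """Group module names by the device they were assigned to."""
    by_device = defaultdict(list)
    for module, device in device_map.items():
        by_device[str(device)].append(module)
    return dict(by_device)

# Hypothetical device_map of the shape accelerate produces
example = {
    "transformer.word_embeddings": 0,
    "transformer.h.0": 0,
    "transformer.h.1": "cpu",
    "lm_head": "disk",
}
print(summarize_device_map(example))
```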
On my side I can load the model on a mix of GPU/CPU/disk by adding offload_folder
in the call to from_pretrained
, so I can't reproduce your bug.
Of course, the device_map I get is {'': 'cpu'}
I don't want to offload to disk as that would make things way too slow, but I'm testing whether it works. Also, I don't have access to much disk space and might kill my cluster if I use disk offloading :/ I will try on S3 though
Oh, but this means you don't have any GPU available (sorry, I didn't read your post well enough), in which case you can load fast on CPU with low_cpu_mem_usage=True
(instead of device_map="auto"
)
No, that's fine, I do have access to a GPU, and if I provide a custom device_map it successfully puts everything on the GPU. My GPU is an NVIDIA A100 with 80 GB of RAM
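(For reference, the simplest custom device_map that forces the whole model onto the first GPU is the one below; accelerate treats the empty string as the root module. A sketch, not executed against the real checkpoint:)

```python
# "" denotes the root module, so the entire model is placed on GPU 0.
device_map = {"": 0}

# The call would then look like (sketch, assuming PATH as in the report):
# model = AutoModelForCausalLM.from_pretrained(
#     PATH, trust_remote_code=True, device_map=device_map,
# )
print(device_map)
```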
Also, I tried offloading to a folder on S3 but get the same error
First disconnect the runtime. Then install the following libraries:
!pip install -q -U transformers datasets
!pip install -q -U accelerate
Based on similar issues in accelerate, you might need to upgrade your version of the accelerate library: https://github.com/huggingface/peft/issues/186
Same error after stopping ipython, running
pip install -q -U transformers datasets
pip install -q -U accelerate
And restarting another ipython
The code is still
from transformers import AutoModelForCausalLM

if __name__ == "__main__":
    PATH = "/data/volume/huggingface/hub/models--tiiuae--falcon-40b/snapshots/b0462812b2f53caab9ccc64051635a74662fc73b/"
    model = AutoModelForCausalLM.from_pretrained(
        PATH,
        trust_remote_code=True,
        device_map="auto",
        offload_folder="/data/s3-models/offload",
    )
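(One alternative to disk offload, a sketch assuming the combined GPU and CPU RAM is enough for the model, is to cap per-device budgets with max_memory instead of offload_folder; the budgets below are assumptions for an 80 GB A100 host, not measured values:)

```python
# Hypothetical per-device memory budgets; keys are the GPU index and "cpu".
max_memory = {0: "70GiB", "cpu": "200GiB"}

# The from_pretrained call would then be (sketch, not executed here):
# model = AutoModelForCausalLM.from_pretrained(
#     PATH, trust_remote_code=True, device_map="auto", max_memory=max_memory,
# )
print(max_memory)
```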
accelerate==0.20.3
aiohttp==3.8.4
aiosignal==1.3.1
alabaster==0.7.13
astroid==2.15.5
asttokens==2.2.1
async-timeout==4.0.2
attrs==23.1.0
Babel==2.12.1
backcall==0.2.0
bandit==1.7.5
bitsandbytes==0.39.0
black==22.12.0
CacheControl==0.12.14
cachy==0.3.0
certifi==2023.5.7
cffi==1.15.1
cfgv==3.3.1
charset-normalizer==3.1.0
cleo==0.8.1
click==8.1.3
clikit==0.6.2
cmake==3.26.3
comm==0.1.3
coverage==7.2.5
crashtest==0.3.1
cryptography==41.0.1
datalabca-logging==1.0.3
datasets==2.13.0
debugpy==1.6.7
decorator==5.1.1
dill==0.3.6
distlib==0.3.6
docutils==0.17.1
einops==0.6.1
evaluation==0.0.2
executing==1.2.0
filelock==3.12.0
frozenlist==1.3.3
fsspec==2023.5.0
gitdb==4.0.10
GitPython==3.1.31
glog==0.3.1
html5lib==1.1
huggingface-hub==0.14.1
-e git+https://scm.saas.cagip.group.gca/datalabca/semantic_ia/trello/ia-gen-text-opensource.git@e114de9062f9f74636a4e4d355daed6e4300c11a#egg=ia_gen_text_opensource
identify==2.5.24
idna==3.4
imagesize==1.4.1
importlib-metadata==6.6.0
importlib-resources==5.12.0
ipdb==0.13.13
ipykernel==6.22.0
ipython==8.12.2
ipywidgets==8.0.6
isort==5.12.0
jaraco.classes==3.2.3
jedi==0.18.2
jeepney==0.8.0
Jinja2==3.1.2
jupyter_client==8.2.0
jupyter_core==5.3.0
jupyterlab-widgets==3.0.7
keyring==23.13.1
lazy-object-proxy==1.9.0
lit==16.0.5
lockfile==0.12.2
m2r==0.2.1
markdown-it-py==2.2.0
MarkupSafe==2.1.2
matplotlib-inline==0.1.6
mccabe==0.7.0
mdurl==0.1.2
mistune==0.8.4
more-itertools==9.1.0
mpmath==1.3.0
msgpack==1.0.5
multidict==6.0.4
multiprocess==0.70.14
mypy-extensions==1.0.0
nest-asyncio==1.5.6
networkx==3.1
nodeenv==1.8.0
numpy==1.24.3
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.2.10.91
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusparse-cu11==11.7.4.91
nvidia-nccl-cu11==2.14.3
nvidia-nvtx-cu11==11.7.91
packaging==20.9
pandas==2.0.2
parso==0.8.3
pastel==0.2.1
pathspec==0.11.1
pbr==5.11.1
pexpect==4.8.0
pickleshare==0.7.5
pip-licenses==2.3.0
pkginfo==1.9.6
platformdirs==3.5.1
pluggy==0.13.1
poetry==1.1.15
poetry-core==1.0.8
pre-commit==2.21.0
prompt-toolkit==3.0.38
protobuf==3.20.0
psutil==5.9.5
PTable==0.9.2
ptyprocess==0.7.0
pure-eval==0.2.2
py==1.11.0
pyarrow==12.0.1
pycparser==2.21
pydantic==1.10.7
pydocstyle==6.3.0
Pygments==2.15.1
pylev==1.4.0
pylint==2.17.4
pyparsing==3.0.9
pyre-extensions==0.0.29
pytest==5.4.3
pytest-cov==3.0.0
pytest-html==2.1.1
pytest-metadata==2.0.4
pytest-mock==3.10.0
python-dateutil==2.8.2
python-dotenv==1.0.0
python-gflags==3.1.2
pytz==2023.3
PyYAML==6.0
pyzmq==25.0.2
regex==2023.5.5
requests==2.31.0
requests-toolbelt==0.9.1
rich==13.3.5
safetensors==0.3.1
scipy==1.10.1
SecretStorage==3.3.3
shellingham==1.5.0.post1
six==1.16.0
smmap==5.0.0
snowballstemmer==2.2.0
Sphinx==4.5.0
sphinx-rtd-theme==1.2.1
sphinxcontrib-applehelp==1.0.4
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-htmlhelp==2.0.1
sphinxcontrib-jquery==4.1
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.5
stack-data==0.6.2
stevedore==5.1.0
sympy==1.12
tokenizers==0.13.3
tomli==2.0.1
tomlkit==0.11.8
torch==2.0.1
tornado==6.3.1
tqdm==4.65.0
traitlets==5.9.0
transformers==4.30.2
triton==2.0.0
typing-inspect==0.9.0
typing_extensions==4.6.0
tzdata==2023.3
urllib3==1.26.16
virtualenv==20.23.0
wcwidth==0.2.6
webencodings==0.5.1
widgetsnbextension==4.0.7
wrapt==1.15.0
xformers==0.0.20
xxhash==3.2.0
yarl==1.9.2
zipp==3.15.0
installed from accelerate commit f1e84decc9d1e4f63aa443f8124b4876c79fff81 and transformers commit ba695c1efd55091e394eb59c90fb33ac3f9f0d41 and I still get the same error
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
Information
Tasks
One of the no_trainer scripts in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
Reproduction
The code which produces the error (the model I am trying to load is tiiuae/falcon-40b)
I get the following error
Traceback
```python
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /data/volume/falcon.py:23 in
```
(traceback truncated)

If I try to load on CPU without using device_map="auto", it works and inference works too, but takes a very long time.
If I try to debug and print where every named weight is at that point I get the following
https://gist.github.com/ArnaudParan/075a3a81a32e9cc2485884aa19f52232#file-weights_with_device_map_auto-json
If I try to set the device_map myself with the previous positions, just replacing everything with cpu or cuda, I get an error at inference regardless
Traceback
```python
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
```
(traceback truncated)

Weights: https://gist.github.com/ArnaudParan/075a3a81a32e9cc2485884aa19f52232#file-weights_with_custom_cpu_device_map-json
And even though I tried to force the device_map, some weights are on the meta device
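(A quick way to spot such parameters, given a name-to-device mapping in the shape of the gists above; the entries here are hypothetical:)

```python
def find_meta_params(param_devices):
    """Return parameter names still on the 'meta' device, i.e. never materialized."""
    return sorted(name for name, dev in param_devices.items() if dev == "meta")

# Hypothetical excerpt in the shape of the linked gists
param_devices = {
    "transformer.word_embeddings.weight": "cuda:0",
    "transformer.h.0.self_attention.query_key_value.weight": "meta",
    "lm_head.weight": "cpu",
}
print(find_meta_params(param_devices))
```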
I honestly don't know what more to try or do. I want to load the model using quantization so that it fits on one NVIDIA A100 GPU, but these intractable and strange errors make it difficult. I also tried using the transformers and accelerate versions directly from GitHub's master branch, but still got the same issues.
Thank you very much for your kind help.
Expected behavior
Expected behavior would be being able to load the model without all that hassle