huggingface / accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
https://huggingface.co/docs/accelerate
Apache License 2.0

Issue loading tiiuae/falcon-40b with device_map="auto" #1592

Closed · ArnaudParan closed this issue 1 year ago

ArnaudParan commented 1 year ago

System Info

Python 3.8.10
accelerate==0.20.1
transformers==4.29.2
numpy==1.19.5
torch==2.0.1

nvidia-smi output

```
Wed Jun 14 12:19:49 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  On   | 00000000:88:00.0 Off |                    0 |
| N/A   31C    P0    73W / 400W |  60360MiB / 81251MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
```

lsb-release

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.6 LTS"

full pip freeze

accelerate==0.20.1
alabaster==0.7.13
astroid==2.15.5
asttokens==2.2.1
attrs==23.1.0
Babel==2.12.1
backcall==0.2.0
bandit==1.7.5
bitsandbytes==0.39.0
black==22.12.0
CacheControl==0.12.14
cachy==0.3.0
certifi==2023.5.7
cffi==1.15.1
cfgv==3.3.1
charset-normalizer==3.1.0
cleo==0.8.1
click==8.1.3
clikit==0.6.2
cmake==3.26.3
comm==0.1.3
coverage==7.2.5
crashtest==0.3.1
cryptography==41.0.1
datalabca-logging==1.0.3
debugpy==1.6.7
decorator==5.1.1
dill==0.3.6
distlib==0.3.6
docutils==0.17.1
einops==0.6.1
evaluation==0.0.2
executing==1.2.0
filelock==3.12.0
fsspec==2023.5.0
gitdb==4.0.10
GitPython==3.1.31
glog==0.3.1
html5lib==1.1
huggingface-hub==0.14.1
-e git+https://scm.saas.cagip.group.gca/datalabca/semantic_ia/trello/ia-gen-text-opensource.git@e114de9062f9f74636a4e4d355daed6e4300c11a#egg=ia_gen_text_opensource
identify==2.5.24
idna==3.4
imagesize==1.4.1
importlib-metadata==6.6.0
importlib-resources==5.12.0
ipdb==0.13.13
ipykernel==6.22.0
ipython==8.12.2
ipywidgets==8.0.6
isort==5.12.0
jaraco.classes==3.2.3
jedi==0.18.2
jeepney==0.8.0
Jinja2==3.1.2
jupyter_client==8.2.0
jupyter_core==5.3.0
jupyterlab-widgets==3.0.7
keyring==23.13.1
lazy-object-proxy==1.9.0
lit==16.0.5
lockfile==0.12.2
m2r==0.2.1
markdown-it-py==2.2.0
MarkupSafe==2.1.2
matplotlib-inline==0.1.6
mccabe==0.7.0
mdurl==0.1.2
mistune==0.8.4
more-itertools==9.1.0
mpmath==1.3.0
msgpack==1.0.5
mypy-extensions==1.0.0
nest-asyncio==1.5.6
networkx==3.1
nodeenv==1.8.0
numpy==1.19.5
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.2.10.91
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusparse-cu11==11.7.4.91
nvidia-nccl-cu11==2.14.3
nvidia-nvtx-cu11==11.7.91
packaging==20.9
parso==0.8.3
pastel==0.2.1
pathspec==0.11.1
pbr==5.11.1
pexpect==4.8.0
pickleshare==0.7.5
pip-licenses==2.3.0
pkginfo==1.9.6
platformdirs==3.5.1
pluggy==0.13.1
poetry==1.1.15
poetry-core==1.0.8
pre-commit==2.21.0
prompt-toolkit==3.0.38
protobuf==3.20.0
psutil==5.9.5
PTable==0.9.2
ptyprocess==0.7.0
pure-eval==0.2.2
py==1.11.0
pycparser==2.21
pydantic==1.10.7
pydocstyle==6.3.0
Pygments==2.15.1
pylev==1.4.0
pylint==2.17.4
pyparsing==3.0.9
pyre-extensions==0.0.29
pytest==5.4.3
pytest-cov==3.0.0
pytest-html==2.1.1
pytest-metadata==2.0.4
pytest-mock==3.10.0
python-dateutil==2.8.2
python-dotenv==1.0.0
python-gflags==3.1.2
pytz==2023.3
PyYAML==6.0
pyzmq==25.0.2
regex==2023.5.5
requests==2.31.0
requests-toolbelt==0.9.1
rich==13.3.5
safetensors==0.3.1
scipy==1.10.1
SecretStorage==3.3.3
shellingham==1.5.0.post1
six==1.16.0
smmap==5.0.0
snowballstemmer==2.2.0
Sphinx==4.5.0
sphinx-rtd-theme==1.2.1
sphinxcontrib-applehelp==1.0.4
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-htmlhelp==2.0.1
sphinxcontrib-jquery==4.1
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.5
stack-data==0.6.2
stevedore==5.1.0
sympy==1.12
tokenizers==0.13.3
tomli==2.0.1
tomlkit==0.11.8
torch==2.0.1
tornado==6.3.1
tqdm==4.65.0
traitlets==5.9.0
transformers==4.29.2
triton==2.0.0
typing-inspect==0.9.0
typing_extensions==4.6.0
urllib3==1.26.16
virtualenv==20.23.0
wcwidth==0.2.6
webencodings==0.5.1
widgetsnbextension==4.0.7
wrapt==1.15.0
xformers==0.0.20
zipp==3.15.0

Reproduction

The code that produces the error (the model I am trying to load is tiiuae/falcon-40b):


```python
from transformers import AutoModelForCausalLM

if __name__ == "__main__":
    PATH = "/data/volume/huggingface/hub/models--tiiuae--falcon-40b/snapshots/b0462812b2f53caab9ccc64051635a74662fc73b/"
    model = AutoModelForCausalLM.from_pretrained(
        PATH,
        trust_remote_code=True,
        device_map="auto",
    )
```

I get the following error

```python
Traceback (most recent call last):
  File "/data/volume/falcon.py", line 23, in <module>
    model = AutoModelForCausalLM.from_pretrained(
  File "/data/volume/venv/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 462, in from_pretrained
    return model_class.from_pretrained(
  File "/data/volume/venv/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2828, in from_pretrained
    dispatch_model(model, device_map=device_map, offload_dir=offload_folder, off...
  File "/data/volume/venv/lib/python3.8/site-packages/accelerate/big_modeling.py", line 373, in dispatch_model
    attach_align_device_hook_on_blocks(
  File "/data/volume/venv/lib/python3.8/site-packages/accelerate/hooks.py", line 527, in attach_align_device_hook_on_blocks
    attach_align_device_hook_on_blocks(
  File "/data/volume/venv/lib/python3.8/site-packages/accelerate/hooks.py", line 527, in attach_align_device_hook_on_blocks
    attach_align_device_hook_on_blocks(
  File "/data/volume/venv/lib/python3.8/site-packages/accelerate/hooks.py", line 497, in attach_align_device_hook_on_blocks
    add_hook_to_module(module, hook)
  File "/data/volume/venv/lib/python3.8/site-packages/accelerate/hooks.py", line 155, in add_hook_to_module
    module = hook.init_hook(module)
  File "/data/volume/venv/lib/python3.8/site-packages/accelerate/hooks.py", line 253, in init_hook
    set_module_tensor_to_device(module, name, self.execution_device)
  File "/data/volume/venv/lib/python3.8/site-packages/accelerate/utils/modeling.py", line 154, in set_module_tensor_to_device
    raise ValueError(f"{tensor_name} is on the meta device, we need a `value` to put...
ValueError: weight is on the meta device, we need a `value` to put in on 0.
```

If I load the model on CPU without using device_map="auto", it works and inference works too, but it takes a very long time.

If I debug and print where every named weight is at that point, I get the following:

https://gist.github.com/ArnaudParan/075a3a81a32e9cc2485884aa19f52232#file-weights_with_device_map_auto-json
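For reference, a minimal sketch of the kind of snippet I use to dump those placements (`dump_devices` is a hypothetical helper, not from either library; it assumes `model` is the object returned by `from_pretrained` above):

```python
import torch

# Hypothetical debugging helper: print the device of every parameter
# and buffer after from_pretrained has dispatched the model.
def dump_devices(model: torch.nn.Module) -> None:
    for name, param in model.named_parameters():
        print(name, param.device)
    for name, buf in model.named_buffers():
        print(name, buf.device)
```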

If I instead set the device_map myself, taking the previous positions and replacing everything with cpu or cuda, I still get an error at inference:

```python
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data/volume/venv/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/volume/venv/lib/python3.8/site-packages/transformers/generation/utils.py", line 1515, in generate
    return self.greedy_search(
  File "/data/volume/venv/lib/python3.8/site-packages/transformers/generation/utils.py", line 2332, in greedy_search
    outputs = self(
  File "/data/volume/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/volume/venv/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/runai-home/.cache/huggingface/modules/transformers_modules/modelling_RW.py", line 759, in forward
    transformer_outputs = self.transformer(
  File "/data/volume/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/runai-home/.cache/huggingface/modules/transformers_modules/modelling_RW.py", line 654, in forward
    outputs = block(
  File "/data/volume/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/runai-home/.cache/huggingface/modules/transformers_modules/modelling_RW.py", line 396, in forward
    attn_outputs = self.self_attention(
  File "/data/volume/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/runai-home/.cache/huggingface/modules/transformers_modules/modelling_RW.py", line 252, in forward
    fused_qkv = self.query_key_value(hidden_states)  # [batch_size, seq_length, 3 x...
  File "/data/volume/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/volume/venv/lib/python3.8/site-packages/bitsandbytes/nn/modules.py", line 388, in forward
    out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
  File "/data/volume/venv/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py", line 559, in matmul
    return MatMul8bitLt.apply(A, B, out, bias, state)
  File "/data/volume/venv/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/data/volume/venv/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py", line 293, in forward
    using_igemmlt = supports_igemmlt(A.device) and not state.force_no_igemmlt
  File "/data/volume/venv/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py", line 226, in supports_igemmlt
    if torch.cuda.get_device_capability(device=device) < (7, 5):
  File "/data/volume/venv/lib/python3.8/site-packages/torch/cuda/__init__.py", line 381, in get_device_capability
    prop = get_device_properties(device)
  File "/data/volume/venv/lib/python3.8/site-packages/torch/cuda/__init__.py", line 396, in get_device_properties
    device = _get_device_index(device, optional=True)
  File "/data/volume/venv/lib/python3.8/site-packages/torch/cuda/_utils.py", line 32, in _get_device_index
    raise ValueError('Expected a cuda device, but got: {}'.format(device))
ValueError: Expected a cuda device, but got: meta
```

Weights: https://gist.github.com/ArnaudParan/075a3a81a32e9cc2485884aa19f52232#file-weights_with_custom_cpu_device_map-json

And even though I tried to force the device_map, some weights still end up on the meta device.

I honestly don't know what more to try. I want to load the model with quantization so that it fits on one A100 GPU, but these intractable and strange errors make it difficult. I also tried installing the transformers and accelerate libraries directly from GitHub's master branches, but still got the same issues.
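For context, the quantized load I am aiming for looks roughly like this (a sketch only; `load_in_8bit` requires bitsandbytes, and is the call I had commented out in the traceback above):

```python
from transformers import AutoModelForCausalLM

# Sketch of the intended 8-bit load (requires bitsandbytes);
# PATH is the same local snapshot path as in the reproduction above.
PATH = "/data/volume/huggingface/hub/models--tiiuae--falcon-40b/snapshots/b0462812b2f53caab9ccc64051635a74662fc73b/"
model = AutoModelForCausalLM.from_pretrained(
    PATH,
    trust_remote_code=True,
    device_map="auto",
    load_in_8bit=True,
)
```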

Thank you very much for your kind help.

Expected behavior

I would expect to be able to load the model without all this hassle.

sgugger commented 1 year ago

Could you print the device_map you get, for instance by debugging and printing it from this line in the traceback?

❱ 2828 │   │   │   dispatch_model(model, device_map=device_map, offload_dir=offload_folder, off  

On my side I can load the model on a mix of GPU/CPU/disk by adding offload_folder to the call to from_pretrained, so I can't reproduce your bug.
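If it helps, a rough sketch of how to reproduce the map outside of from_pretrained, without a debugger (assuming the same local snapshot path `PATH` as above):

```python
from accelerate import infer_auto_device_map, init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

# Rough sketch: rebuild the model on the meta device (no weights loaded)
# and ask accelerate which device_map it would infer for this machine.
config = AutoConfig.from_pretrained(PATH, trust_remote_code=True)
with init_empty_weights():
    empty_model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)
device_map = infer_auto_device_map(empty_model)
print(device_map)
```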

ArnaudParan commented 1 year ago

Of course, the device_map I get is {'': 'cpu'}

I don't want to offload to disk, as that would make things far too slow, but I am testing whether it works. Also, I don't have access to much disk space and might kill my cluster if I use disk offloading :/ I will try on s3 though.

sgugger commented 1 year ago

Oh, but this means you don't have any GPU available (sorry, I didn't read your post well enough), in which case you can load fast on CPU with low_cpu_mem_usage=True (instead of device_map="auto").
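Something like this (a sketch, with `PATH` being your local snapshot path):

```python
from transformers import AutoModelForCausalLM

# Sketch of the CPU-only fast path: low_cpu_mem_usage=True streams the
# checkpoint into the model instead of first materializing a
# randomly-initialized copy; no device_map is involved.
model = AutoModelForCausalLM.from_pretrained(
    PATH,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
)
```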

ArnaudParan commented 1 year ago

No, that's fine, I do have access to a GPU, and if I provide a custom device_map it successfully puts everything on the GPU. My GPU is an NVIDIA A100 with 80GB of memory.
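The kind of explicit map that works for me is roughly this (a sketch; the empty-string key addresses the root module, so the whole model lands on cuda:0):

```python
from transformers import AutoModelForCausalLM

# Sketch of the manual placement: device_map={"": 0} sends the root
# module, and hence every weight, to GPU 0.
model = AutoModelForCausalLM.from_pretrained(
    PATH,
    trust_remote_code=True,
    device_map={"": 0},
)
```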

ArnaudParan commented 1 year ago

Also, I tried offloading to a folder on s3 but got the same error.

vrunm commented 1 year ago

First disconnect the runtime, and then install the following libraries:

```
!pip install -q -U transformers datasets
!pip install -q -U accelerate
```

Based on similar issues in accelerate, you might need to upgrade your version of the accelerate library: https://github.com/huggingface/peft/issues/186

ArnaudParan commented 1 year ago

I get the same error after stopping IPython, running

```
pip install -q -U transformers datasets
pip install -q -U accelerate
```

and restarting another IPython session.

The code is now:


```python
from transformers import AutoModelForCausalLM

if __name__ == "__main__":
    PATH = "/data/volume/huggingface/hub/models--tiiuae--falcon-40b/snapshots/b0462812b2f53caab9ccc64051635a74662fc73b/"
    model = AutoModelForCausalLM.from_pretrained(
        PATH,
        trust_remote_code=True,
        device_map="auto",
        offload_folder="/data/s3-models/offload",
    )
```

full pip freeze

accelerate==0.20.3
aiohttp==3.8.4
aiosignal==1.3.1
alabaster==0.7.13
astroid==2.15.5
asttokens==2.2.1
async-timeout==4.0.2
attrs==23.1.0
Babel==2.12.1
backcall==0.2.0
bandit==1.7.5
bitsandbytes==0.39.0
black==22.12.0
CacheControl==0.12.14
cachy==0.3.0
certifi==2023.5.7
cffi==1.15.1
cfgv==3.3.1
charset-normalizer==3.1.0
cleo==0.8.1
click==8.1.3
clikit==0.6.2
cmake==3.26.3
comm==0.1.3
coverage==7.2.5
crashtest==0.3.1
cryptography==41.0.1
datalabca-logging==1.0.3
datasets==2.13.0
debugpy==1.6.7
decorator==5.1.1
dill==0.3.6
distlib==0.3.6
docutils==0.17.1
einops==0.6.1
evaluation==0.0.2
executing==1.2.0
filelock==3.12.0
frozenlist==1.3.3
fsspec==2023.5.0
gitdb==4.0.10
GitPython==3.1.31
glog==0.3.1
html5lib==1.1
huggingface-hub==0.14.1
-e git+https://scm.saas.cagip.group.gca/datalabca/semantic_ia/trello/ia-gen-text-opensource.git@e114de9062f9f74636a4e4d355daed6e4300c11a#egg=ia_gen_text_opensource
identify==2.5.24
idna==3.4
imagesize==1.4.1
importlib-metadata==6.6.0
importlib-resources==5.12.0
ipdb==0.13.13
ipykernel==6.22.0
ipython==8.12.2
ipywidgets==8.0.6
isort==5.12.0
jaraco.classes==3.2.3
jedi==0.18.2
jeepney==0.8.0
Jinja2==3.1.2
jupyter_client==8.2.0
jupyter_core==5.3.0
jupyterlab-widgets==3.0.7
keyring==23.13.1
lazy-object-proxy==1.9.0
lit==16.0.5
lockfile==0.12.2
m2r==0.2.1
markdown-it-py==2.2.0
MarkupSafe==2.1.2
matplotlib-inline==0.1.6
mccabe==0.7.0
mdurl==0.1.2
mistune==0.8.4
more-itertools==9.1.0
mpmath==1.3.0
msgpack==1.0.5
multidict==6.0.4
multiprocess==0.70.14
mypy-extensions==1.0.0
nest-asyncio==1.5.6
networkx==3.1
nodeenv==1.8.0
numpy==1.24.3
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.2.10.91
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusparse-cu11==11.7.4.91
nvidia-nccl-cu11==2.14.3
nvidia-nvtx-cu11==11.7.91
packaging==20.9
pandas==2.0.2
parso==0.8.3
pastel==0.2.1
pathspec==0.11.1
pbr==5.11.1
pexpect==4.8.0
pickleshare==0.7.5
pip-licenses==2.3.0
pkginfo==1.9.6
platformdirs==3.5.1
pluggy==0.13.1
poetry==1.1.15
poetry-core==1.0.8
pre-commit==2.21.0
prompt-toolkit==3.0.38
protobuf==3.20.0
psutil==5.9.5
PTable==0.9.2
ptyprocess==0.7.0
pure-eval==0.2.2
py==1.11.0
pyarrow==12.0.1
pycparser==2.21
pydantic==1.10.7
pydocstyle==6.3.0
Pygments==2.15.1
pylev==1.4.0
pylint==2.17.4
pyparsing==3.0.9
pyre-extensions==0.0.29
pytest==5.4.3
pytest-cov==3.0.0
pytest-html==2.1.1
pytest-metadata==2.0.4
pytest-mock==3.10.0
python-dateutil==2.8.2
python-dotenv==1.0.0
python-gflags==3.1.2
pytz==2023.3
PyYAML==6.0
pyzmq==25.0.2
regex==2023.5.5
requests==2.31.0
requests-toolbelt==0.9.1
rich==13.3.5
safetensors==0.3.1
scipy==1.10.1
SecretStorage==3.3.3
shellingham==1.5.0.post1
six==1.16.0
smmap==5.0.0
snowballstemmer==2.2.0
Sphinx==4.5.0
sphinx-rtd-theme==1.2.1
sphinxcontrib-applehelp==1.0.4
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-htmlhelp==2.0.1
sphinxcontrib-jquery==4.1
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.5
stack-data==0.6.2
stevedore==5.1.0
sympy==1.12
tokenizers==0.13.3
tomli==2.0.1
tomlkit==0.11.8
torch==2.0.1
tornado==6.3.1
tqdm==4.65.0
traitlets==5.9.0
transformers==4.30.2
triton==2.0.0
typing-inspect==0.9.0
typing_extensions==4.6.0
tzdata==2023.3
urllib3==1.26.16
virtualenv==20.23.0
wcwidth==0.2.6
webencodings==0.5.1
widgetsnbextension==4.0.7
wrapt==1.15.0
xformers==0.0.20
xxhash==3.2.0
yarl==1.9.2
zipp==3.15.0

I also installed accelerate from commit f1e84decc9d1e4f63aa443f8124b4876c79fff81 and transformers from commit ba695c1efd55091e394eb59c90fb33ac3f9f0d41, and I still get the same error.

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.