huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Getting ValueError: model.shared.weight doesn't have any device set when running the M2M100-12B model on Colab with accelerate #23294

Closed abhishektcs1 closed 1 year ago

abhishektcs1 commented 1 year ago

System Info

I am getting the following error while using accelerate for M2M100 on Google Colab Pro. Here is the code snippet:

import torch
from transformers import AutoConfig, M2M100ForConditionalGeneration, M2M100Tokenizer, AutoModel, M2M100Config
from accelerate import infer_auto_device_map, init_empty_weights

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

config = M2M100Config.from_pretrained("facebook/m2m100-12B-last-ckpt")

with init_empty_weights():
    model = AutoModel.from_config(config)

device_map = infer_auto_device_map(model, no_split_module_classes=["M2M100Attention"])

checkpoint = "facebook/m2m100-12B-last-ckpt"

device_map["shared"] = "cpu"
device_map["encoder"] = "cpu"
device_map["decoder.embed_tokens"] = "cpu"
device_map["decoder.embed_positions"] = "cpu"
device_map["decoder.layers.0"] = "cpu"
device_map["decoder.layers.1"] = "cpu"
device_map["decoder.layers.2"] = "cpu"
device_map["decoder.layers.3"] = "cpu"

model = M2M100ForConditionalGeneration.from_pretrained(checkpoint, device_map=device_map, offload_folder="offload", offload_state_dict=True)

Following are the env specs:
Model Link: https://huggingface.co/facebook/m2m100-12B-last-ckpt
Python Version: 3.10
GPU: A100 (40GB)
RAM: 83.5 GB
CUDA version: 12.0

Who can help?

No response

Information

Tasks

Reproduction

The code snippet from the System Info section above reproduces the error.

Expected behavior

The model is expected to load properly, after which the following code would be used for translation:

hi_text='''La vie est comme une boîte de chocolat.'''

tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100-12B-last-ckpt")

encoded_hi = tokenizer(hi_text, return_tensors="pt").to('cuda')

generated_tokens = model.generate(**encoded_hi, forced_bos_token_id=tokenizer.get_lang_id("en"))

print(tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0])

amyeroberts commented 1 year ago

Hi @abhishektcs1, thanks for reporting this issue!

Could you provide information about the running environment: run transformers-cli env in the terminal and copy-paste the output?

sanyoggupta commented 1 year ago

Hi @abhishektcs1, thanks for reporting this issue!

Could you provide information about the running environment: run transformers-cli env in the terminal and copy-paste the output?

Hi @amyeroberts, I am also facing the same error. Please find below the output of `transformers-cli env`:

2023-05-13 05:36:18.558293: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
WARNING:tensorflow:From /usr/local/lib/python3.10/dist-packages/transformers/commands/env.py:63: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.config.list_physical_devices('GPU') instead.
2023-05-13 05:36:22.918424: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:47] Overriding orig_value setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.

Copy-and-paste the text below in your GitHub issue and FILL OUT the two last points.

Please find attached the nvidia-smi output of the Google Colab Pro instance I am using.

amyeroberts commented 1 year ago

@abhishektcs1 @sanyoggupta Could either of you also share a full traceback of the error encountered (the entire error message, from the first lines), preferably as a copy-paste of the text rather than a screenshot please?

karths8 commented 1 year ago

@abhishektcs1 @sanyoggupta Could either of you also share a full traceback of the error encountered (the entire error message, from the first lines), preferably as a copy-paste of the text rather than a screenshot please?

Hey, I am getting a similar error when I try out my code. This is the traceback:

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Traceback (most recent call last):
  File "/home/ksuresh6/DataChat_Project/model.py", line 20, in <module>
    model = load_checkpoint_and_dispatch(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/hulab/ksuresh6/anaconda3/envs/datachat_env/lib/python3.11/site-packages/accelerate/big_modeling.py", line 479, in load_checkpoint_and_dispatch
    load_checkpoint_in_model(
  File "/data/hulab/ksuresh6/anaconda3/envs/datachat_env/lib/python3.11/site-packages/accelerate/utils/modeling.py", line 982, in load_checkpoint_in_model
    raise ValueError(f"{param_name} doesn't have any device set.")
ValueError: decoder.transformer.h.7.attn.causal_mask doesn't have any device set.

This is the code I am trying out:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import torch
from transformers import AutoConfig
from accelerate import init_empty_weights
from accelerate import load_checkpoint_and_dispatch
checkpoint = "Salesforce/instructcodet5p-16b"
device = "cuda" # for GPU usage or "cpu" for CPU usage

model_path ='/home/ksuresh6/.cache/huggingface/hub/models--Salesforce--instructcodet5p-16b/snapshots/b5aaae8f54e8f13897e395fbc4c22567df0399ef'
tokenizer = AutoTokenizer.from_pretrained(model_path)
config = AutoConfig.from_pretrained(checkpoint, torch_dtype=torch.float16, low_cpu_mem_usage=True, trust_remote_code=True)
with init_empty_weights():
    model = AutoModelForSeq2SeqLM.from_config(config, trust_remote_code=True, torch_dtype=torch.float16)
model.tie_weights()

model = load_checkpoint_and_dispatch(
    model, model_path, device_map="auto"
)

inputs = tokenizer.encode("def print_hello():", return_tensors="pt").to(device)
outputs = model.generate(inputs, max_length=12)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

This is the output of transformers-cli env:

Copy-and-paste the text below in your GitHub issue and FILL OUT the two last points.

- `transformers` version: 4.26.1
- Platform: Linux-5.19.0-41-generic-x86_64-with-glibc2.35
- Python version: 3.10.6
- Huggingface_hub version: 0.12.1
- PyTorch version (GPU?): 1.13.1+cu117 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA) 
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: Yes

Any help is appreciated! Thanks in advance!

anujsahani01 commented 1 year ago

@abhishektcs1 @sanyoggupta Could either of you also share a full traceback of the error encountered (the entire error message, from the first lines), preferably as a copy-paste of the text rather than a screenshot please?

Hi, I am also facing the same issue. This is what I get after executing `transformers-cli env` (see the attached screenshot).

Please help me out with this problem. Thank you!

sgugger commented 1 year ago

@younesbelkada could this be the same bug you fixed on NLLB here? I see the no_split_module_class is also the attention layer.

younesbelkada commented 1 year ago

Hmm, this sounds more like you are using infer_auto_device_map in an inappropriate way. You should put "M2M100EncoderLayer" and "M2M100DecoderLayer" inside _no_split_modules. Could you try again with these new values? Also, can you share a handy reproducible snippet with us? 🙏
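For reference, a minimal, untested sketch of that suggestion applied to the snippet from the original report (the checkpoint name and offload settings are taken from above; the empty model is built with the same class that is loaded later, so the module names in the device map line up):

import torch
from transformers import M2M100Config, M2M100ForConditionalGeneration
from accelerate import infer_auto_device_map, init_empty_weights

checkpoint = "facebook/m2m100-12B-last-ckpt"
config = M2M100Config.from_pretrained(checkpoint)

# Build the model skeleton without allocating real weights
with init_empty_weights():
    empty_model = M2M100ForConditionalGeneration(config)

# Keep whole encoder/decoder layers together instead of splitting attention modules
device_map = infer_auto_device_map(
    empty_model,
    no_split_module_classes=["M2M100EncoderLayer", "M2M100DecoderLayer"],
)

model = M2M100ForConditionalGeneration.from_pretrained(
    checkpoint,
    device_map=device_map,
    offload_folder="offload",
    offload_state_dict=True,
)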

anujsahani01 commented 1 year ago

Thank you, I got it. @sgugger, you have posted great documentation on Hugging Face on how to run these large models on our devices:

https://huggingface.co/blog/accelerate-large-models

anujsahani01 commented 1 year ago

Hmm this sounds more like you are using the infer auto device map in an inappropriate way indeed. You should put "M2M100EncoderLayer" and "M2M100DecoderLayer" inside _no_split_modules. Could you try again with these new values? Also can you share us a handy reproducible snippet? 🙏

One more doubt, please help me out: what values should I pass in no_split_modules? Thank you!

anujsahani01 commented 1 year ago

These are the model layers (see the attached screenshot).

younesbelkada commented 1 year ago

Hi @anujsahani01 Can you try to put GPTBigCodeBlock in no split modules?
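For anyone hitting the same thing, here is a minimal, untested sketch of what that looks like with load_checkpoint_and_dispatch; the local checkpoint path is a placeholder, and AutoModelForCausalLM is assumed since the layer printout above shows a GPTBigCode-based causal LM:

from transformers import AutoConfig, AutoModelForCausalLM
from accelerate import init_empty_weights, load_checkpoint_and_dispatch

# Placeholder: local folder that actually contains the downloaded weights
model_path = "path/to/local/checkpoint"

config = AutoConfig.from_pretrained(model_path)
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)
model.tie_weights()

# Keep each GPTBigCodeBlock intact on a single device when sharding
model = load_checkpoint_and_dispatch(
    model,
    model_path,
    device_map="auto",
    no_split_module_classes=["GPTBigCodeBlock"],
)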

anujsahani01 commented 1 year ago

Hi @anujsahani01 Can you try to put GPTBigCodeBlock in no split modules?

Yes it worked. Thank You!

anujsahani01 commented 1 year ago

Hi @anujsahani01 Can you try to put GPTBigCodeBlock in no split modules?

Hey, I have one more doubt, please help me with this. I am fine-tuning the Hugging Face "HuggingFaceH4/starchat-alpha" model to build a data-science text-to-code generation bot. This is the format of my dataset:

DatasetDict({
    train: Dataset({
        features: ['input_ids', 'labels'],
        num_rows: 5012
    })
    test: Dataset({
        features: ['input_ids', 'labels'],
        num_rows: 1325
    })
})

The structure of each example looks somewhat like this, as explained in the StarCoder documentation:

<|system|> Below is a dialogue between a human and an ANUJ_AI <|end|>
<|user|> Minimum count of ind… so on <|end|>
<|assistant|> def possible ( x , S , N ) : …so on <|end|>

I am loading the model on my Colab in 8-bit format using the 🤗 Transformers BitsAndBytesConfig to save memory. I then loaded the model using a device map built with 🤗 Transformers AutoConfig and accelerate, which divided my model among the GPU, CPU RAM and my disk.

Once the model and its checkpoints were downloaded successfully, I used transformers.Trainer to train the model on my custom dataset, using the code below:

(Training code attached as a screenshot.)

But I am always getting this error: `Cannot copy out of meta tensor; no data!`

(See the attached screenshot of the full error.) Your inputs will be highly appreciated. Thank you!

younesbelkada commented 1 year ago

Hi @anujsahani01, thanks! Could you explain in a bit more detail how you train the 8-bit model? Are you sure you are using adapters leveraging the PEFT library? Maybe if you can share the full snippet I can help you more on that! 💪
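For context, a rough sketch of the adapter-based (LoRA) setup being referred to; everything here is illustrative rather than taken from the notebook, and the prepare_model_for_kbit_training helper may be named differently (e.g. prepare_model_for_int8_training) depending on the PEFT version:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the frozen base model in 8-bit to save memory
model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceH4/starchat-alpha",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Attach small trainable LoRA adapters; only these are updated by the Trainer
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()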

anujsahani01 commented 1 year ago

Hi @anujsahani01, thanks! Could you explain in a bit more detail how you train the 8-bit model? Are you sure you are using adapters leveraging the PEFT library? Maybe if you can share the full snippet I can help you more on that! 💪

I have updated the Colab notebook: https://drive.google.com/file/d/1-ccrx1Q5tkLUYtZBGi5lNZGjPMyr_X9U/view?usp=sharing

I am not using the 8-bit model now. I am using the 🤗 tool accelerate to initialize the model, and then loading the model weights with load_checkpoint_and_dispatch. But it is giving me this error: ValueError: offload is not a folder containing a .index.json file.

I am not able to understand what exactly the error is. Please have a look at the attached screenshot, which shows the offload folder and the error.

Please help me out with this error, it would be a great help. Your inputs will be highly appreciated. Thank you!
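For reference, the usual disk-offload pattern is to point load_checkpoint_and_dispatch at the folder that contains the model weights and pass a separate offload_folder, letting accelerate write the offloaded tensors and their index there itself. A minimal sketch with placeholder paths (not taken from the notebook):

from transformers import AutoConfig, AutoModelForCausalLM
from accelerate import init_empty_weights, load_checkpoint_and_dispatch

# Placeholder: folder containing the downloaded model weights
checkpoint_path = "path/to/local/checkpoint"

config = AutoConfig.from_pretrained(checkpoint_path)
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)
model.tie_weights()

model = load_checkpoint_and_dispatch(
    model,
    checkpoint_path,           # the model weights, not the offload folder
    device_map="auto",
    offload_folder="offload",  # accelerate populates this folder itself
)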

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.