huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Cannot copy out of meta tensor; no data! for SwinV2ForImageClassification #30661

Closed ethvedbitdesjan closed 6 months ago

ethvedbitdesjan commented 6 months ago

System Info

Who can help?

@amyeroberts

Error when using SwinV2 with device_map

Information

Tasks

Reproduction

Basically, I am just trying to load Swinv2ForImageClassification on multiple GPUs:

```python
from transformers import AutoModelForImageClassification

model_path = 'microsoft/swinv2-large-patch4-window12-192-22k'
labels = [0, 1]
model = AutoModelForImageClassification.from_pretrained(
    model_path, num_labels=len(labels), ignore_mismatched_sizes=True, device_map='auto'
)
```

I get the following error:

```
NotImplementedError                       Traceback (most recent call last)
Cell In[8], line 3
      1 model_path= 'microsoft/swinv2-large-patch4-window12-192-22k'
      2 labels = [0, 1]
----> 3 model = AutoModelForImageClassification.from_pretrained(model_path, num_labels=len(labels), ignore_mismatched_sizes=True, device_map='auto')

File ~/.conda/envs/spin_diffusion_project/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py:563, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    561 elif type(config) in cls._model_mapping.keys():
    562     model_class = _get_model_class(config, cls._model_mapping)
--> 563     return model_class.from_pretrained(
    564         pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
    565     )
    566 raise ValueError(
    567     f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
    568     f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapping.keys())}."
    569 )

File ~/.conda/envs/spin_diffusion_project/lib/python3.9/site-packages/transformers/modeling_utils.py:3754, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
   3752     device_map_kwargs["force_hooks"] = True
   3753 if not is_fsdp_enabled() and not is_deepspeed_zero3_enabled():
-> 3754     dispatch_model(model, **device_map_kwargs)
   3756 if hf_quantizer is not None:
   3757     hf_quantizer.postprocess_model(model)

File ~/.conda/envs/spin_diffusion_project/lib/python3.9/site-packages/accelerate/big_modeling.py:445, in dispatch_model(model, device_map, main_device, state_dict, offload_dir, offload_index, offload_buffers, skip_keys, preload_module_classes, force_hooks)
    443     device = f"npu:{device}"
    444 if device != "disk":
--> 445     model.to(device)
    446 else:
    447     raise ValueError(
    448         "You are trying to offload the whole model to the disk. Please use the `disk_offload` function instead."
    449     )

File ~/.conda/envs/spin_diffusion_project/lib/python3.9/site-packages/transformers/modeling_utils.py:2701, in PreTrainedModel.to(self, *args, **kwargs)
   2696     if dtype_present_in_args:
   2697         raise ValueError(
   2698             "You cannot cast a GPTQ model in a new `dtype`. Make sure to load the model using `from_pretrained` using the desired"
   2699             " `dtype` by passing the correct `torch_dtype` argument."
   2700         )
-> 2701 return super().to(*args, **kwargs)

File ~/.conda/envs/spin_diffusion_project/lib/python3.9/site-packages/torch/nn/modules/module.py:1145, in Module.to(self, *args, **kwargs)
   1141             return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
   1142                         non_blocking, memory_format=convert_to_format)
   1143         return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
-> 1145 return self._apply(convert)

File ~/.conda/envs/spin_diffusion_project/lib/python3.9/site-packages/torch/nn/modules/module.py:797, in Module._apply(self, fn)
    795 def _apply(self, fn):
    796     for module in self.children():
--> 797         module._apply(fn)
    799 def compute_should_use_set_data(tensor, tensor_applied):
    800     if torch._has_compatible_shallow_copy_type(tensor, tensor_applied):
    801         # If the new tensor has compatible tensor type as the existing tensor,
    802         # the current behavior is to change the tensor in-place using `.data =`,
   (...)
    807         # global flag to let the user control whether they want the future
    808         # behavior of overwriting the existing tensor or not.

File ~/.conda/envs/spin_diffusion_project/lib/python3.9/site-packages/torch/nn/modules/module.py:820, in Module._apply(self, fn)
    816 # Tensors stored in modules are graph leaves, and we don't want to
    817 # track autograd history of `param_applied`, so we have to use
    818 # `with torch.no_grad():`
    819 with torch.no_grad():
--> 820     param_applied = fn(param)
    821 should_use_set_data = compute_should_use_set_data(param, param_applied)
    822 if should_use_set_data:

File ~/.conda/envs/spin_diffusion_project/lib/python3.9/site-packages/torch/nn/modules/module.py:1143, in Module.to.<locals>.convert(t)
   1140 if convert_to_format is not None and t.dim() in (4, 5):
   1141     return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
   1142                 non_blocking, memory_format=convert_to_format)
-> 1143 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)

NotImplementedError: Cannot copy out of meta tensor; no data!
```
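From what I can tell, the error itself comes from PyTorch's meta device: a meta tensor carries shape and dtype but no underlying storage, so any attempt to copy it onto a real device fails this way. A tiny standalone reproduction (my own sketch, independent of transformers):

```python
import torch

# A meta tensor records shape/dtype but allocates no data.
t = torch.empty(2, 1536, device="meta")

# There is nothing to copy, hence the same error as above:
t.to("cpu")  # NotImplementedError: Cannot copy out of meta tensor; no data!
```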

I get a similar error when using max_memory too:

```python
GPU_MAP = {0: "12GiB", 1: "12GiB", "cpu": "60GiB"}
model_path = 'microsoft/swinv2-large-patch4-window12-192-22k'
labels = [0, 1]
model = AutoModelForImageClassification.from_pretrained(
    model_path, num_labels=len(labels), ignore_mismatched_sizes=True,
    device_map='auto', max_memory=GPU_MAP
)
```

Expected behavior

I wanted the model to load, but I am not sure how to debug the error. I am trying to do this on two Tesla V100 GPUs (the model fits on one GPU, but loading across both does not work). Any help would be great.

LysandreJik commented 6 months ago

This seems linked to device map, cc @SunMarc

SunMarc commented 6 months ago

Hi @ethvedbitdesjan, thanks for reporting. This happens because you changed the number of labels, meaning that some weights are newly initialized, as specified by the warning message:

```
Some weights of Swinv2ForImageClassification were not initialized from the model checkpoint at microsoft/swinv2-large-patch4-window12-192-22k and are newly initialized because the shapes did not match:
- classifier.weight: found shape torch.Size([21841, 1536]) in the checkpoint and torch.Size([2, 1536]) in the model instantiated
- classifier.bias: found shape torch.Size([21841]) in the checkpoint and torch.Size([2]) in the model instantiated
```

For now, we don't support loading mismatched weights with device_map="auto". Since the model is not very big (~1GB), I recommend loading it on a single GPU for now. What do you mean by it not working on one GPU? Also, since you are modifying the model, you need to retrain it; device_map="auto" is mainly meant for inference.
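For example, something along these lines should work (a minimal sketch, assuming the ~1GB model fits on one of your V100s):

```python
import torch
from transformers import AutoModelForImageClassification

model_path = 'microsoft/swinv2-large-patch4-window12-192-22k'
labels = [0, 1]

# Without device_map, every weight (including the newly initialized
# 2-class classifier head) is materialized on CPU first...
model = AutoModelForImageClassification.from_pretrained(
    model_path,
    num_labels=len(labels),
    ignore_mismatched_sizes=True,  # reinitializes classifier.weight/bias
)

# ...and the whole model can then be moved to a single GPU as usual.
model.to("cuda:0")
```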

ethvedbitdesjan commented 6 months ago

@SunMarc Thanks. It works on a single GPU. I was hoping it could work across two or more using device_map, but I understand that device_map is mainly for inference. I have switched to using DeepSpeed to retrain the model.
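In case it helps anyone else, here is a rough sketch of multi-GPU retraining that avoids device_map entirely, using plain DDP via Trainer and torchrun (DummyImageDataset below is only a placeholder for a real labeled dataset, and the output path is hypothetical):

```python
# train.py -- launch across both V100s with: torchrun --nproc_per_node=2 train.py
import torch
from torch.utils.data import Dataset
from transformers import AutoModelForImageClassification, Trainer, TrainingArguments

class DummyImageDataset(Dataset):
    """Placeholder: random 192x192 images with binary labels."""
    def __len__(self):
        return 16

    def __getitem__(self, idx):
        return {"pixel_values": torch.randn(3, 192, 192),
                "labels": torch.tensor(idx % 2)}

model = AutoModelForImageClassification.from_pretrained(
    'microsoft/swinv2-large-patch4-window12-192-22k',
    num_labels=2,
    ignore_mismatched_sizes=True,  # fresh 2-class head, so retraining is required
)

args = TrainingArguments(
    output_dir="swinv2-binary",   # hypothetical output path
    per_device_train_batch_size=8,
    num_train_epochs=3,
)

# Under torchrun, Trainer wraps the model in DistributedDataParallel:
# each GPU holds a full replica and gradients are synchronized across ranks.
trainer = Trainer(model=model, args=args, train_dataset=DummyImageDataset())
trainer.train()
```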