huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

NotImplementedError: Cannot copy out of meta tensor; no data when embedding to meta #31560

Open DonggeunYu opened 1 week ago

DonggeunYu commented 1 week ago

System Info

Who can help?

@amyeroberts

Information

Tasks

Reproduction

  1. To force query_position_embeddings onto the meta device, modify the offload test as follows (see the sketch after this list): max_memory = {0: max_size // 2, "cpu": max_size * 2} https://github.com/huggingface/transformers/blob/main/tests/test_modeling_common.py#L30852
  2. Run python3 -m pytest tests/models/deformable_detr/test_modeling_deformable_detr.py::DeformableDetrModelTest::test_disk_offload_safetensors
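For illustration, here is a minimal standalone sketch of the same idea outside the test suite: shrink the max_memory budget passed to from_pretrained so that accelerate has to offload part of the model (including query_position_embeddings) away from the GPU. The checkpoint name and the exact budget values are illustrative assumptions and may need tuning to reproduce the failure.

```python
# Hedged sketch, not the actual test: force offloading by shrinking the memory budget.
import tempfile

from accelerate.utils import compute_module_sizes
from transformers import DeformableDetrModel

model = DeformableDetrModel.from_pretrained("SenseTime/deformable-detr")
model_size = compute_module_sizes(model)[""]  # total model size in bytes

with tempfile.TemporaryDirectory() as tmp_dir:
    model.save_pretrained(tmp_dir, safe_serialization=True)

    # Halve the GPU budget, as in step 1 above; offloaded modules end up with
    # their weights on the meta device plus an accelerate hook to reload them.
    max_memory = {0: model_size // 2, "cpu": model_size * 2}
    offloaded_model = DeformableDetrModel.from_pretrained(
        tmp_dir,
        device_map="auto",
        max_memory=max_memory,
        offload_folder=tmp_dir,
    )
```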

Expected behavior

FAILED tests/models/deformable_detr/test_modeling_deformable_detr.py::DeformableDetrModelTest::test_disk_offload_safetensors - NotImplementedError: Cannot copy out of meta tensor; no data!
amyeroberts commented 1 week ago

Hi @DonggeunYu, thanks for reporting!

We'll look into it. Out of interest, how did you discover this? Was it by modifying the tests, or are the tests just an easy way to demonstrate this behaviour?

DonggeunYu commented 1 week ago

The tests are just an easy way to demonstrate this behavior. While using a private model, I discovered that there was a problem with nn.Embedding.

DonggeunYu commented 1 week ago

I may be wrong, as I still need to fully understand the transformers and accelerate code. When offload is used, weights are moved to the meta device during the init process, so the weight of the nn.Embedding created in __init__ ends up on the meta device. If the nn.Embedding is called as a module, accelerate's pre_forward hook will align the devices of the args, kwargs, and the embedding weight.

However, because the embedding weight itself is passed into the forward of another module, it arrives at that module's pre_forward hook as a meta tensor. To show this, I logged the module device and the device of the args inside pre_forward in accelerate's hook.py. Up to the point where the problematic nn.Embedding weight is used, the module device is meta and the args device is cuda. Once that weight enters another module's args, the module device is meta and the args device is also meta (the embedding weight). The error then occurs when send_to_device(args, self.execution_device) tries to move the meta tensor to cuda.
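For reference, a hedged sketch of the kind of instrumentation that can produce the log below, by wrapping accelerate's AlignDevicesHook.pre_forward; this is illustrative and not the exact patch I used.

```python
import torch
from accelerate.hooks import AlignDevicesHook

_orig_pre_forward = AlignDevicesHook.pre_forward

def logging_pre_forward(self, module, *args, **kwargs):
    # Print the module class, the devices of its own parameters, and the devices of its tensor args.
    param_devices = sorted({str(p.device) for p in module.parameters(recurse=False)})
    arg_devices = sorted({str(a.device) for a in args if isinstance(a, torch.Tensor)})
    print(module.__class__.__name__, param_devices, arg_devices)
    return _orig_pre_forward(self, module, *args, **kwargs)

AlignDevicesHook.pre_forward = logging_pre_forward
```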

module.__class__.__name__, device of module, device of args
Linear [device(type='meta')] [device(type='cuda', index=0)]
LayerNorm [device(type='meta')] [device(type='cuda', index=0)]
Linear [device(type='meta')] [device(type='cuda', index=0)]
Linear [device(type='meta')] [device(type='cuda', index=0)]
LayerNorm [device(type='meta')] [device(type='cuda', index=0)]
Linear [device(type='meta')] [device(type='meta')]

(References: def pre_forward of accelerate; nn.Embedding of transformers.)
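Taking that last step in isolation, a minimal sketch (shapes are arbitrary): copying a meta tensor to a real device is exactly what raises this error, because a meta tensor carries no data.

```python
import torch

# Stand-in for an offloaded embedding weight: allocated on the meta device, so it has no data.
meta_weight = torch.empty(300, 512, device="meta")

try:
    meta_weight.to("cpu")  # same failure for "cuda:0"; this is what send_to_device ends up doing
except NotImplementedError as err:
    print(err)  # -> Cannot copy out of meta tensor; no data!
```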

amyeroberts commented 1 week ago

@DonggeunYu Thanks for the update. Indeed, using the embedding weights directly rather than the layer in the forward pass is quite odd. cc @muellerzr, who knows more about accelerate's pre_forward hook.
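To make the pattern concrete, here is a hedged, simplified sketch (not the actual DeformableDetr code) of what "using the embedding weights rather than the layer" looks like: the nn.Embedding is never called, only its .weight is read and handed to another module, so a pre_forward hook attached to the embedding never fires and its weight can still be sitting on the meta device.

```python
import torch
from torch import nn

class Decoder(nn.Module):
    def forward(self, query_embeds: torch.Tensor) -> torch.Tensor:
        return query_embeds  # placeholder for the real decoder logic

class ToyModel(nn.Module):
    def __init__(self, num_queries: int = 300, d_model: int = 256):
        super().__init__()
        self.query_position_embeddings = nn.Embedding(num_queries, d_model)
        self.decoder = Decoder()

    def forward(self) -> torch.Tensor:
        # The weight tensor is passed directly; self.query_position_embeddings(...)
        # is never invoked, so any hook attached to the embedding module never runs.
        query_embeds = self.query_position_embeddings.weight
        return self.decoder(query_embeds)
```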

DonggeunYu commented 1 day ago

@amyeroberts @muellerzr Any progress on this?

muellerzr commented 1 day ago

cc @SunMarc