DonggeunYu opened 1 week ago
Hi @DonggeunYu, thanks for reporting!
We'll look into it. Out of interest, how did you discover this? Was it while modifying the tests, or are the tests just an easy way to demonstrate this behaviour?
The tests are just an easy way to demonstrate this behavior. While using a private model, I discovered a problem with nn.Embedding. I may be wrong, as I still need to fully understand the transformers and accelerate code.

When offload is used, the model is initialized on the meta device, so the weight of the nn.Embedding created in __init__ ends up on the meta device. If the nn.Embedding is called as a module, accelerate's pre_forward hook matches the device of the args, kwargs, and the embedding weight. However, when the embedding weight is instead passed into the forward of another module, it enters that module's pre_forward hook still on the meta device.

To show this, I added logging to the pre_forward hook in accelerate's hook.py that prints the module device and the device of the args. Until the problematic nn.Embedding weight appears, the module device is meta and the args device is cuda. Once the problematic nn.Embedding weight enters another module's args, the module device is meta and the args device is also meta (the embedding weight). An error then occurs when send_to_device(args, self.execution_device) tries to move the tensor from meta to cuda.
module.__class__.__name__, device of module, device of args
Linear [device(type='meta')] [device(type='cuda', index=0)]
LayerNorm [device(type='meta')] [device(type='cuda', index=0)]
Linear [device(type='meta')] [device(type='cuda', index=0)]
Linear [device(type='meta')] [device(type='cuda', index=0)]
LayerNorm [device(type='meta')] [device(type='cuda', index=0)]
Linear [device(type='meta')] [device(type='meta')]
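The device mismatch in the last log line can be illustrated with a small, torch-free simulation. This is only a sketch: the `Tensor`, `Embedding`, and `Linear` classes and the `pre_forward` function below are hypothetical stand-ins, assuming (per the description above) that accelerate's hook materializes a module's own offloaded parameters but can only try to move incoming args to the execution device.

```python
# Hypothetical, torch-free sketch of the pre_forward device logic described
# above. All classes here are stand-ins, not the real accelerate/torch APIs.

class Tensor:
    def __init__(self, device):
        self.device = device

def pre_forward(module, args, execution_device="cuda"):
    # The hook can materialize the module's *own* meta parameters
    # (in the real code they are loaded back from the offload folder)...
    for name, param in list(vars(module).items()):
        if isinstance(param, Tensor) and param.device == "meta":
            setattr(module, name, Tensor(execution_device))
    # ...but args are merely *moved* to the execution device, and a meta
    # tensor has no data to move (mimics the send_to_device failure).
    moved = []
    for arg in args:
        if arg.device == "meta":
            raise RuntimeError("Cannot copy out of meta tensor")
        moved.append(Tensor(execution_device))
    return moved

class Embedding:
    def __init__(self):
        self.weight = Tensor("meta")  # offloaded init leaves weights on meta

class Linear:
    def __init__(self):
        self.weight = Tensor("meta")

# Pattern A: the embedding is called as a module; its own hook runs and
# materializes its weight, so everything ends up on cuda.
emb = Embedding()
pre_forward(emb, [Tensor("cuda")])
assert emb.weight.device == "cuda"

# Pattern B: another module receives embedding.weight as an *argument*;
# its hook only sees a meta arg and fails.
emb2, lin = Embedding(), Linear()
raised = False
try:
    pre_forward(lin, [emb2.weight])
except RuntimeError:
    raised = True
assert raised
```

In pattern B, the receiving module's own weight is materialized fine; it is the embedding weight arriving as an argument that is still on meta, which corresponds to the final log line where both the module and the args report meta.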
@DonggeunYu Thanks for the update. Indeed, the structure of using the embedding weights rather than the layer itself in the forward pass is quite odd. cc @muellerzr, who knows more about the pre_forward hook of accelerate.
@amyeroberts @muellerzr How is this progressing?
cc @SunMarc
System Info

transformers version: 4.39.0

Who can help?

@amyeroberts

Information

Tasks

examples folder (such as GLUE/SQuAD, ...)

Reproduction

Set max_memory = {0: max_size // 2, "cpu": max_size * 2} at https://github.com/huggingface/transformers/blob/main/tests/test_modeling_common.py#L3085, then run:
python3 -m pytest tests/models/deformable_detr/test_modeling_deformable_detr.py::DeformableDetrModelTest::test_disk_offload_safetensors

Expected behavior