
Loading T5 encoder separately with StableDiffusion3Pipeline causes meta tensor error on sequential/model cpu offload #8644

Closed: Teriks closed this issue 4 weeks ago

Teriks commented 1 month ago

Describe the bug

Attempting to load an SD3 checkpoint that includes the CLIP text encoders from a single file, while supplying a separately loaded T5 encoder, causes a meta tensor data-copy error when calling enable_sequential_cpu_offload() or enable_model_cpu_offload().

Reproduction


import diffusers
import transformers

import os

os.environ['HF_TOKEN'] = 'your token'

# load the T5 encoder separately; it starts on the CPU with real weights
encoder = transformers.T5EncoderModel.from_pretrained(
    'stabilityai/stable-diffusion-3-medium-diffusers',
    subfolder='text_encoder_3', variant='fp16')

print(encoder.device) # cpu

# load the rest of the pipeline from a single-file checkpoint that also
# includes the CLIP encoders, passing in the preloaded T5 encoder
sd3 = diffusers.StableDiffusion3Pipeline.from_single_file(
    'https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/sd3_medium_incl_clips.safetensors',
    text_encoder_3=encoder)

print(encoder.device) # cpu

sd3.enable_sequential_cpu_offload()

output = sd3(prompt='test')

or the same script with model offload in place of sequential offload, which fails the same way:

sd3.enable_model_cpu_offload()

output = sd3(prompt='test')

Logs

REDACT\venv\Scripts\python.exe REDACT\test.py 
Downloading shards: 100%|██████████| 2/2 [00:00<00:00, 1980.78it/s]
Loading checkpoint shards: 100%|██████████| 2/2 [00:08<00:00,  4.06s/it]
cpu
REDACT\venv\Lib\site-packages\diffusers\models\transformers\transformer_2d.py:34: FutureWarning: `Transformer2DModelOutput` is deprecated and will be removed in version 1.0.0. Importing `Transformer2DModelOutput` from `diffusers.models.transformer_2d` is deprecated and this will be removed in a future version. Please use `from diffusers.models.modeling_outputs import Transformer2DModelOutput`, instead.
  deprecate("Transformer2DModelOutput", "1.0.0", deprecation_message)
Fetching 21 files: 100%|██████████| 21/21 [00:00<00:00, 19753.39it/s]
Loading pipeline components...:  67%|██████▋   | 6/9 [00:00<00:00, 25.25it/s]You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 100%|██████████| 9/9 [00:19<00:00,  2.17s/it]
cpu
Traceback (most recent call last):
  File "REDACT\test.py", line 19, in <module>
    sd3.enable_sequential_cpu_offload()
  File "REDACT\venv\Lib\site-packages\diffusers\pipelines\pipeline_utils.py", line 1166, in enable_sequential_cpu_offload
    cpu_offload(model, device, offload_buffers=offload_buffers)
  File "REDACT\venv\Lib\site-packages\accelerate\big_modeling.py", line 200, in cpu_offload
    state_dict = {n: p.to("cpu") for n, p in model.state_dict().items()}
                     ^^^^^^^^^^^
NotImplementedError: Cannot copy out of meta tensor; no data!

Process finished with exit code 1

System Info

- OS: Windows
- Python: 3.12.3
- transformers: 4.41.2
- diffusers: 0.29.0
- accelerate: 0.31.0

Who can help?

No response

Teriks commented 1 month ago

The pipeline is not callable in this configuration even without offloading.

Teriks commented 1 month ago

Disabling the automatic empty-weights initialization of pipeline components on the line below, by removing the with init_empty_weights() context, fixes the pipeline's ability to function with this setup.

https://github.com/huggingface/diffusers/blob/963ee05d164192956e1bf3f157ee4da077460f9e/src/diffusers/loaders/single_file_utils.py#L1365

The copy error is caused by the other components, such as the CLIP models, starting out as empty meta tensors while the additional user-supplied model starts on the CPU with real weights.

This also causes moving the entire pipeline with .to() to fail.
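
The failure is easy to reproduce in isolation: a tensor on the meta device carries shape and dtype metadata but no storage, so any attempt to copy it to a real device raises the same error seen in the traceback above. A minimal sketch:

import torch

# parameters created under init_empty_weights live on the "meta" device:
# they have shape and dtype, but no backing data to copy
t = torch.empty(3, device='meta')

t.to('cpu')  # NotImplementedError: Cannot copy out of meta tensor; no data!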

The empty-weights initialization in single_file_utils should probably happen conditionally, in a way that accounts for user-supplied modules being present(?)
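
A hypothetical sketch of such a guard (the helper and its user_supplied argument are illustrative, not actual diffusers internals):

import contextlib

from accelerate import init_empty_weights


def loading_context(name, user_supplied):
    # Components passed in by the user already hold real weights, so skip
    # the meta-device initialization for them; only components loaded from
    # the checkpoint should be created empty and filled in later.
    if name in user_supplied:
        return contextlib.nullcontext()
    return init_empty_weights()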

Teriks commented 1 month ago

workaround:

import contextlib
import os
import diffusers
import diffusers.loaders
import diffusers.loaders.single_file_utils
import transformers

os.environ['HF_TOKEN'] = 'your token'

# monkey patch out the context manager
old_ctx_mgr = diffusers.loaders.single_file_utils.init_empty_weights
diffusers.loaders.single_file_utils.init_empty_weights = contextlib.nullcontext

encoder = transformers.T5EncoderModel.from_pretrained('stabilityai/stable-diffusion-3-medium-diffusers',
                                                      subfolder='text_encoder_3')
sd3 = diffusers.StableDiffusion3Pipeline.from_single_file(
    'https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/sd3_medium_incl_clips.safetensors',
    text_encoder_3=encoder)

# put it back
diffusers.loaders.single_file_utils.init_empty_weights = old_ctx_mgr

sd3.enable_sequential_cpu_offload()

sd3(prompt='test')
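
A slightly tidier way to scope the same patch (a sketch against the same module path, assuming diffusers 0.29.0) is unittest.mock.patch.object, which restores the original attribute automatically even if loading raises:

import contextlib
import os

from unittest import mock

import diffusers
import diffusers.loaders.single_file_utils
import transformers

os.environ['HF_TOKEN'] = 'your token'

encoder = transformers.T5EncoderModel.from_pretrained(
    'stabilityai/stable-diffusion-3-medium-diffusers',
    subfolder='text_encoder_3')

# the patch is reverted when the with block exits
with mock.patch.object(diffusers.loaders.single_file_utils,
                       'init_empty_weights', contextlib.nullcontext):
    sd3 = diffusers.StableDiffusion3Pipeline.from_single_file(
        'https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/sd3_medium_incl_clips.safetensors',
        text_encoder_3=encoder)

sd3.enable_sequential_cpu_offload()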

sayakpaul commented 1 month ago

Cc: @DN6

Teriks commented 1 month ago

This does not seem to be an issue in 0.29.1.