huggingface / diffusers

πŸ€— Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

accelerator.prepare overwrites .config method in unet (unet.config) #5567

Closed. Xynonners closed this issue 10 months ago.

Xynonners commented 12 months ago

Describe the bug

To preface, this is possibly something only encountered through a very fringe use case.

Accelerate uses the .config attribute to store its own internal config dict (batch size, optimizer params, etc.), overriding the one created by diffusers. This causes the pipeline to fail when run after putting the unet through accelerator.prepare.

Reproduction

import torch
from accelerate.utils import ProjectConfiguration
from accelerate import Accelerator
from diffusers import StableDiffusionXLPipeline
accelerator_config = ProjectConfiguration(
    project_dir="test",
    automatic_checkpoint_naming=True,
    total_limit=10,
)
accelerator = Accelerator(
    log_with="aim",
    mixed_precision="fp16",
    project_config=accelerator_config,
    gradient_accumulation_steps=16,
)
pipeline = StableDiffusionXLPipeline.from_single_file("models/sd_xl_base_1.0_0.9vae.safetensors")
# Wrapping the UNet with accelerator.prepare is where .config gets shadowed.
pipeline.unet = accelerator.prepare(pipeline.unet)
# Fails here: the pipeline internally reads self.unet.config.in_channels.
pipeline(prompt="")

Logs

  756     timesteps = self.scheduler.timesteps
  757
  758     # 5. Prepare latent variables
> 759     num_channels_latents = self.unet.config.in_channels
  760     latents = self.prepare_latents(
  761         batch_size * num_images_per_prompt,
  762         num_channels_latents,

AttributeError: 'dict' object has no attribute 'in_channels'

System Info


Who can help?

@yiyxuxu @sayakpaul

sayakpaul commented 12 months ago

What happens when you independently initialise the UNet, prepare it, and then initialise the pipeline with the prepared UNet?

But why would you do this?

Cc: @SunMarc

Xynonners commented 12 months ago

> What happens when you independently initialise the UNet, prepare it, and then initialise the pipeline with the prepared UNet?
>
> But why would you do this?
>
> Cc: @SunMarc

I haven't actually tested that, though I believe one or the other would get overwritten since they are fighting over the same attribute (depending on whether unet.config is created by the pipeline or by the unet).

Like I said, it is a pretty fringe use case, since I'm currently reimplementing AlignProp (and doing some weird stuff).

What I'm currently doing to skirt the issue is to capture the unet config before the prepare and merge the configs after the prepare, though I'd consider that pretty hacky. None of the keys conflict, though (at least with my accelerate config).
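
For reference, a minimal sketch of what that capture-and-merge hack can look like, assuming the wrapper simply shadows .config with its own dict (re-attaching the FrozenDict to the wrapped module may still interfere with the wrapper's own bookkeeping, which is part of why it's hacky):

from diffusers.configuration_utils import FrozenDict

# Capture the diffusers config before prepare() shadows it.
original_config = pipeline.unet.config  # FrozenDict

pipeline.unet = accelerator.prepare(pipeline.unet)

# Merge whatever the wrapper put into .config with the original diffusers config
# (assumption: no conflicting keys, as noted above) and re-attach the result.
wrapper_config = getattr(pipeline.unet, "config", {}) or {}
merged = {**dict(wrapper_config), **dict(original_config)}
pipeline.unet.config = FrozenDict(merged)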

EDIT: tested, and it seems that initializing the unet earlier errors out in the same way as before.

SunMarc commented 12 months ago

Hi @Xynonners , thanks for reporting. I don't think that accelerate overwrites the config, especially since it is one of the most important attributes in transformers. I've run the following code and everything works. LMK if it works on your side.

import torch
from accelerate.utils import ProjectConfiguration
from accelerate import Accelerator
from diffusers import StableDiffusionXLPipeline
accelerator_config = ProjectConfiguration(
    project_dir="test",
    automatic_checkpoint_naming=True,
    total_limit=10,
)
accelerator = Accelerator(
    log_with="aim",
    mixed_precision="fp16",
    project_config=accelerator_config,
    gradient_accumulation_steps=16,
)
pipeline = StableDiffusionXLPipeline.from_single_file("https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/sd_xl_base_1.0_0.9vae.safetensors").to(0)
print(pipeline.unet.config)
pipeline.unet = accelerator.prepare(pipeline.unet)
print(pipeline.unet.config)
pipeline(prompt="")

Both prints give me the following output:

FrozenDict([('sample_size', 128), ('in_channels', 4), ('out_channels', 4), ('center_input_sample', False), ('flip_sin_to_cos', True), ('freq_shift', 0), ('down_block_types', ('DownBlock2D', 'CrossAttnDownBlock2D', 'CrossAttnDownBlock2D')), ('mid_block_type', 'UNetMidBlock2DCrossAttn'), ('up_block_types', ('CrossAttnUpBlock2D', 'CrossAttnUpBlock2D', 'UpBlock2D')), ('only_cross_attention', False), ('block_out_channels', (320, 640, 1280)), ('layers_per_block', 2), ('downsample_padding', 1), ('mid_block_scale_factor', 1), ('dropout', 0.0), ('act_fn', 'silu'), ('norm_num_groups', 32), ('norm_eps', 1e-05), ('cross_attention_dim', 2048), ('transformer_layers_per_block', [1, 2, 10]), ('encoder_hid_dim', None), ('encoder_hid_dim_type', None), ('attention_head_dim', [5, 10, 20]), ('num_attention_heads', None), ('dual_cross_attention', False), ('use_linear_projection', True), ('class_embed_type', None), ('addition_embed_type', 'text_time'), ('addition_time_embed_dim', 256), ('num_class_embeds', None), ('upcast_attention', None), ('resnet_time_scale_shift', 'default'), ('resnet_skip_time_act', False), ('resnet_out_scale_factor', 1.0), ('time_embedding_type', 'positional'), ('time_embedding_dim', None), ('time_embedding_act_fn', None), ('timestep_post_act', None), ('time_cond_proj_dim', None), ('conv_in_kernel', 3), ('conv_out_kernel', 3), ('projection_class_embeddings_input_dim', 2816), ('attention_type', 'default'), ('class_embeddings_concat', False), ('mid_block_only_cross_attention', None), ('cross_attention_norm', None), ('addition_embed_type_num_heads', 64), ('_use_default_values', ['act_fn', 'timestep_post_act', 'time_embedding_act_fn', 'dual_cross_attention', 'encoder_hid_dim_type', 'num_attention_heads', 'time_embedding_type', 'mid_block_scale_factor', 'freq_shift', 'encoder_hid_dim', 'attention_type', 'time_cond_proj_dim', 'norm_num_groups', 'only_cross_attention', 'mid_block_type', 'dropout', 'resnet_skip_time_act', 'mid_block_only_cross_attention', 'norm_eps', 'resnet_time_scale_shift', 'num_class_embeds', 'flip_sin_to_cos', 'conv_out_kernel', 'resnet_out_scale_factor', 'class_embeddings_concat', 'conv_in_kernel', 'center_input_sample', 'addition_embed_type_num_heads', 'cross_attention_norm', 'time_embedding_dim', 'downsample_padding'])])

Xynonners commented 11 months ago

> Hi @Xynonners , thanks for reporting. I don't think that accelerate overwrites the config, especially since it is one of the most important attributes in transformers. I've run the following code and everything works. LMK if it works on your side.

Hi,

Did some more testing, and it seems there are three cases occurring:

  1. With a single GPU (no DistributedDataParallel), it works perfectly normally.
  2. With multiple GPUs, it doesn't work because DistributedDataParallel masks the underlying pipeline.unet.config FrozenDict.
  3. With multiple GPUs + DeepSpeed, it doesn't work because something between DeepSpeed <> Accelerate (most likely DeepSpeed) replaces pipeline.unet.config with the DeepSpeed config.

As I am currently facing case 3, this is what I get in the second printout.

{'train_batch_size': 128, 'train_micro_batch_size_per_gpu': 4, 'gradient_accumulation_steps': 16, 'zero_optimization': {'stage': 2, 'offload_optimizer': {'device': 'cpu', 'nvme_path': None}, 'offload_param': {'device': 'cpu', 'nvme_path': None}, 'stage3_gather_16bit_weights_on_model_save': False}, 'steps_per_print': inf, 'fp16': {'enabled': True, 'auto_cast': True}, 'bf16': {'enabled': False}, 'zero_allow_untested_optimizer': True}
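
For what it's worth, a quick way to tell which of the three cases you're in is to inspect what accelerator.prepare returned (a sketch, assuming pipeline.unet has already been prepared as in the script below):

from diffusers.configuration_utils import FrozenDict

prepared_unet = pipeline.unet  # already passed through accelerator.prepare
# Case 1: UNet2DConditionModel, case 2: DistributedDataParallel, case 3: DeepSpeedEngine
print(type(prepared_unet).__name__)
# In cases 2 and 3 the diffusers FrozenDict is no longer reachable as .config on the wrapper.
print(isinstance(getattr(prepared_unet, "config", None), FrozenDict))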

Here is the modified testing script (I launch it with accelerate launch with 3 different configs).

import torch
from accelerate.utils import ProjectConfiguration
from accelerate import Accelerator
from diffusers import StableDiffusionXLPipeline
accelerator_config = ProjectConfiguration(
    project_dir="test",
    automatic_checkpoint_naming=True,
    total_limit=10,
)
accelerator = Accelerator(
    log_with="aim",
    mixed_precision="fp16",
    project_config=accelerator_config,
    gradient_accumulation_steps=16,
)
if getattr(accelerator.state, "deepspeed_plugin", None):
    # Under DeepSpeed, set the micro batch size explicitly since no dataloader is prepared here.
    accelerator.state.deepspeed_plugin.deepspeed_config['train_micro_batch_size_per_gpu'] = 4
pipeline = StableDiffusionXLPipeline.from_single_file("models/sd_xl_base_1.0_0.9vae.safetensors").to(accelerator.device)
optimizer = torch.optim.AdamW(
    pipeline.unet.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    weight_decay=0.1,
    eps=1e-8,
)
print(pipeline.unet.config)
pipeline.unet, optimizer = accelerator.prepare(pipeline.unet, optimizer)
print(pipeline.unet.config)
pipeline(prompt="")

SunMarc commented 11 months ago

cc @muellerzr @pacman100 for deepspeed integration

muellerzr commented 11 months ago

In multi-GPU you should be able to access it via unet.module.config, and the same applies with DeepSpeed; these are limitations of the frameworks the model is wrapped in. @Xynonners, if you can give me the filename from your log error, what we can probably do in diffusers is come up with a more agnostic solution for finding attrs on the original model when Accelerate has wrapped it in DDP or DeepSpeed.
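
For illustration, that access pattern with the objects from the earlier scripts (accelerator.unwrap_model should also strip both the DDP and DeepSpeed wrappers):

prepared_unet = pipeline.unet  # already passed through accelerator.prepare

# DDP (and DeepSpeedEngine) keep the original diffusers model on .module.
if hasattr(prepared_unet, "module"):
    print(prepared_unet.module.config.in_channels)

# Alternatively, let Accelerate remove its wrappers.
base_unet = accelerator.unwrap_model(prepared_unet)
print(base_unet.config.in_channels)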

Xynonners commented 11 months ago

@muellerzr

The aforementioned config attribute access comes from the SDXL pipeline at: diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py
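
A hypothetical shape for the more agnostic lookup mentioned above (the helper name is made up, not existing diffusers API), which the pipeline could use instead of reading self.unet.config directly:

def _unwrap_for_config(model):
    # DDP and DeepSpeedEngine both expose the original model as `.module`.
    return model.module if hasattr(model, "module") else model

# Line 759 of pipeline_stable_diffusion_xl.py could then read:
# num_channels_latents = _unwrap_for_config(self.unet).config.in_channels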

github-actions[bot] commented 11 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.