Closed Xynonners closed 10 months ago
What happens when you independently initialise the UNet, prepare it, and then initialise the pipeline with the prepared UNet?
But why would you do this though?
Cc: @SunMarc
I haven't actually tested that, though I believe one or the other would get overwritten, since both are fighting over the same attr (it depends on whether unet.config is created by the pipeline or by the unet).
Like I said, it is a pretty fringe usecase, since I'm currently reimplementing AlignProp (and doing some weird stuff).
What I'm currently doing to skirt the issue is to capture the unet config before the prepare, then merge the two configs after the prepare, though I'd consider it pretty hacky. None of the keys conflict, however (in my accelerate config).
EDIT: I tested it, and even when the unet is initialized earlier, it errors out in the same way as before.
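The capture-then-merge workaround described above could be sketched roughly as follows. This is only an illustration with stand-in objects: `merge_configs` and `fake_prepare` are hypothetical names, and `fake_prepare` merely imitates the reported behaviour of `accelerator.prepare` returning an object whose `.config` is a different dict.

```python
from types import SimpleNamespace


def merge_configs(original: dict, wrapper: dict) -> dict:
    """Merge two config mappings, refusing to silently overwrite a key."""
    conflicts = set(original) & set(wrapper)
    if conflicts:
        raise KeyError(f"conflicting config keys: {sorted(conflicts)}")
    return {**original, **wrapper}


# Hypothetical stand-in for pipeline.unet (the real config has many more keys).
unet = SimpleNamespace(config={"sample_size": 128, "in_channels": 4})


def fake_prepare(model):
    # Imitates the reported behaviour: the returned wrapper exposes its own .config.
    return SimpleNamespace(module=model, config={"train_micro_batch_size_per_gpu": 4})


saved = dict(unet.config)                               # 1. capture before prepare
wrapped = fake_prepare(unet)                            # 2. prepare swaps .config
wrapped.config = merge_configs(saved, wrapped.config)   # 3. merge afterwards
```

The conflict check makes the "none of the keys conflict" assumption explicit instead of relying on it. Note that the merged result here is a plain dict; code that reads config values as attributes would need a mapping type that supports attribute access.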
Hi @Xynonners, thanks for reporting. I don't think that accelerate overwrites the config, especially since it is one of the most important attributes in transformers. I've run the following code and everything works. LMK if it works on your side.
import torch
from accelerate.utils import ProjectConfiguration
from accelerate import Accelerator
from diffusers import StableDiffusionXLPipeline

accelerator_config = ProjectConfiguration(
    project_dir="test",
    automatic_checkpoint_naming=True,
    total_limit=10,
)
accelerator = Accelerator(
    log_with="aim",
    mixed_precision="fp16",
    project_config=accelerator_config,
    gradient_accumulation_steps=16,
)

pipeline = StableDiffusionXLPipeline.from_single_file("https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/sd_xl_base_1.0_0.9vae.safetensors").to(0)

print(pipeline.unet.config)
pipeline.unet = accelerator.prepare(pipeline.unet)
print(pipeline.unet.config)

pipeline(prompt="")
Both prints give me the following output:
FrozenDict([('sample_size', 128), ('in_channels', 4), ('out_channels', 4), ('center_input_sample', False), ('flip_sin_to_cos', True), ('freq_shift', 0), ('down_block_types', ('DownBlock2D', 'CrossAttnDownBlock2D', 'CrossAttnDownBlock2D')), ('mid_block_type', 'UNetMidBlock2DCrossAttn'), ('up_block_types', ('CrossAttnUpBlock2D', 'CrossAttnUpBlock2D', 'UpBlock2D')), ('only_cross_attention', False), ('block_out_channels', (320, 640, 1280)), ('layers_per_block', 2), ('downsample_padding', 1), ('mid_block_scale_factor', 1), ('dropout', 0.0), ('act_fn', 'silu'), ('norm_num_groups', 32), ('norm_eps', 1e-05), ('cross_attention_dim', 2048), ('transformer_layers_per_block', [1, 2, 10]), ('encoder_hid_dim', None), ('encoder_hid_dim_type', None), ('attention_head_dim', [5, 10, 20]), ('num_attention_heads', None), ('dual_cross_attention', False), ('use_linear_projection', True), ('class_embed_type', None), ('addition_embed_type', 'text_time'), ('addition_time_embed_dim', 256), ('num_class_embeds', None), ('upcast_attention', None), ('resnet_time_scale_shift', 'default'), ('resnet_skip_time_act', False), ('resnet_out_scale_factor', 1.0), ('time_embedding_type', 'positional'), ('time_embedding_dim', None), ('time_embedding_act_fn', None), ('timestep_post_act', None), ('time_cond_proj_dim', None), ('conv_in_kernel', 3), ('conv_out_kernel', 3), ('projection_class_embeddings_input_dim', 2816), ('attention_type', 'default'), ('class_embeddings_concat', False), ('mid_block_only_cross_attention', None), ('cross_attention_norm', None), ('addition_embed_type_num_heads', 64), ('_use_default_values', ['act_fn', 'timestep_post_act', 'time_embedding_act_fn', 'dual_cross_attention', 'encoder_hid_dim_type', 'num_attention_heads', 'time_embedding_type', 'mid_block_scale_factor', 'freq_shift', 'encoder_hid_dim', 'attention_type', 'time_cond_proj_dim', 'norm_num_groups', 'only_cross_attention', 'mid_block_type', 'dropout', 'resnet_skip_time_act', 'mid_block_only_cross_attention', 'norm_eps', 
'resnet_time_scale_shift', 'num_class_embeds', 'flip_sin_to_cos', 'conv_out_kernel', 'resnet_out_scale_factor', 'class_embeddings_concat', 'conv_in_kernel', 'center_input_sample', 'addition_embed_type_num_heads', 'cross_attention_norm', 'time_embedding_dim', 'downsample_padding'])])
Hi,
I did some more testing and it seems like there are three cases occurring.
As I am currently facing case 3, this is what I get in the second printout.
{'train_batch_size': 128, 'train_micro_batch_size_per_gpu': 4, 'gradient_accumulation_steps': 16, 'zero_optimization': {'stage': 2, 'offload_optimizer': {'device': 'cpu', 'nvme_path': None}, 'offload_param': {'device': 'cpu', 'nvme_path': None}, 'stage3_gather_16bit_weights_on_model_save': False}, 'steps_per_print': inf, 'fp16': {'enabled': True, 'auto_cast': True}, 'bf16': {'enabled': False}, 'zero_allow_untested_optimizer': True}
Here is the modified testing script (I launch it with accelerate launch with 3 different configs).
import torch
from accelerate.utils import ProjectConfiguration
from accelerate import Accelerator
from diffusers import StableDiffusionXLPipeline

accelerator_config = ProjectConfiguration(
    project_dir="test",
    automatic_checkpoint_naming=True,
    total_limit=10,
)
accelerator = Accelerator(
    log_with="aim",
    mixed_precision="fp16",
    project_config=accelerator_config,
    gradient_accumulation_steps=16,
)

if getattr(accelerator.state, "deepspeed_plugin", None):
    accelerator.state.deepspeed_plugin.deepspeed_config['train_micro_batch_size_per_gpu'] = 4

pipeline = StableDiffusionXLPipeline.from_single_file("models/sd_xl_base_1.0_0.9vae.safetensors").to(accelerator.device)

optimizer = torch.optim.AdamW(
    pipeline.unet.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    weight_decay=0.1,
    eps=1e-8,
)

print(pipeline.unet.config)
pipeline.unet, optimizer = accelerator.prepare(pipeline.unet, optimizer)
print(pipeline.unet.config)

pipeline(prompt="")
cc @muellerzr @pacman100 for deepspeed integration
In multi-GPU, you should be able to access it via unet.module.config; this will be the same with DeepSpeed, as these are limitations of the framework it's done in. @Xynonners if you can give me the filename for your log error, what we can probably do in diffusers is come up with a more agnostic solution to find attrs in the original model if Accelerate has wrapped it in DDP or DeepSpeed.
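A wrapper-agnostic lookup along those lines might look like the sketch below. It assumes only that wrappers keep the original model under a `.module` attribute, as mentioned above for multi-GPU and DeepSpeed; `unwrap`, `ToyUNet` and `ToyWrapper` are hypothetical names used for illustration, not diffusers or Accelerate APIs.

```python
def unwrap(model, max_depth: int = 8):
    """Follow .module links until reaching an object that has none."""
    for _ in range(max_depth):
        inner = getattr(model, "module", None)
        if inner is None:
            break
        model = inner
    return model


class ToyUNet:
    # Stand-in for the real UNet; only carries a config.
    def __init__(self):
        self.config = {"sample_size": 128}


class ToyWrapper:
    # Stand-in for a DDP/DeepSpeed-style wrapper with its own .config.
    def __init__(self, module):
        self.module = module
        self.config = {"train_batch_size": 128}


wrapped = ToyWrapper(ToyWrapper(ToyUNet()))  # nested wrapping also works
inner_config = unwrap(wrapped).config
```

In real code, Accelerate's `accelerator.unwrap_model(model)` may be the more appropriate way to get back to the original model; the loop here only shows the idea.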
@muellerzr
the aforementioned config attr access comes from the SDXL pipeline at:
diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Describe the bug
To preface, this is possibly something only encountered through a very fringe use case.
Accelerate uses the .config attribute to store its own internal logic dict (batch size, optimizer params, etc.), overriding the one created by diffusers. This causes running the pipeline to fail after putting the unet through accelerator.prepare.
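As a toy illustration of the failure mode described here (not the actual Accelerate or DeepSpeed code), the sketch below assumes a wrapper that forwards unknown attributes to the wrapped model but also stores its own dict under `config`. `ToyUNet` and `ToyEngine` are hypothetical classes.

```python
class ToyUNet:
    # Stand-in for the real UNet.
    def __init__(self):
        self.config = {"sample_size": 128}
        self.in_channels = 4


class ToyEngine:
    """Hypothetical wrapper that forwards missing attributes to the wrapped model."""

    def __init__(self, module, engine_config):
        self.module = module
        self.config = engine_config  # own attribute shadows module.config

    def __getattr__(self, name):
        # Only called when normal lookup fails, so `config` never reaches here.
        return getattr(object.__getattribute__(self, "module"), name)


engine = ToyEngine(ToyUNet(), {"train_batch_size": 128})

forwarded = engine.in_channels          # forwarded to the wrapped model
shadowed = engine.config                # the wrapper's own dict, not the UNet's
original = engine.module.config         # still reachable through .module
```

Any caller that expects UNet keys such as `sample_size` on `engine.config` would fail, even though other attributes are forwarded transparently.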
Reproduction
Logs
System Info
diffusers version: 0.22.0.dev0
Who can help?
@yiyxuxu @sayakpaul