huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

train_text_to_image.py | fp16 uses more memory than fp32 #3451

Closed · morgankohler closed 1 year ago

morgankohler commented 1 year ago

Describe the bug

Setting `--mixed_precision="fp16"` uses more VRAM (26235 MiB) than `--mixed_precision="no"` (25361 MiB).
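For context on why this is surprising: with the standard PyTorch AMP recipe, the weights and optimizer states stay in fp32 and autocast adds fp16 activations and weight copies on top, so mixed precision mainly saves activation memory rather than halving everything. A minimal sketch of that generic pattern (assumption: plain `torch.cuda.amp`, not necessarily the exact code path in train_text_to_image.py):

```python
# Generic PyTorch AMP sketch -- illustrative only, not the training script's code.
import torch

model = torch.nn.Linear(1024, 1024).cuda()          # parameters stay fp32
optimizer = torch.optim.AdamW(model.parameters())   # optimizer states stay fp32
scaler = torch.cuda.amp.GradScaler()                # loss scaling for fp16

x = torch.randn(8, 1024, device="cuda")
with torch.cuda.amp.autocast(dtype=torch.float16):
    loss = model(x).square().mean()                 # fp16 activations/matmuls
scaler.scale(loss).backward()                       # gradients land in fp32
scaler.step(optimizer)                              # unscales, skips step on inf/nan
scaler.update()
```

Even so, a net increase over pure fp32 is unexpected, which is the point of this report.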

Reproduction

export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export dataset_name="lambdalabs/pokemon-blip-captions"

accelerate launch --mixed_precision="fp16" train_text_to_image.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$dataset_name \
  --use_ema \
  --resolution=512 --center_crop --random_flip \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --gradient_checkpointing \
  --max_train_steps=15000 \
  --learning_rate=1e-05 \
  --max_grad_norm=1 \
  --lr_scheduler="constant" --lr_warmup_steps=0 \
  --output_dir="sd-pokemon-model"

export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export dataset_name="lambdalabs/pokemon-blip-captions"

accelerate launch --mixed_precision="no" train_text_to_image.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$dataset_name \
  --use_ema \
  --resolution=512 --center_crop --random_flip \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --gradient_checkpointing \
  --max_train_steps=15000 \
  --learning_rate=1e-05 \
  --max_grad_norm=1 \
  --lr_scheduler="constant" --lr_warmup_steps=0 \
  --output_dir="sd-pokemon-model"
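To compare the two runs independently of nvidia-smi (which also counts the CUDA context and the caching allocator's reserve), the peak usage could be logged from inside the training loop. A hypothetical snippet, not in the original script:

```python
# Hypothetical logging snippet; run after a training step to read PyTorch's
# own peak-memory counters for the current device.
import torch

peak_alloc = torch.cuda.max_memory_allocated() / 2**20    # MiB actually allocated to tensors
peak_reserved = torch.cuda.max_memory_reserved() / 2**20  # MiB reserved by the caching allocator
print(f"peak allocated: {peak_alloc:.0f} MiB | peak reserved: {peak_reserved:.0f} MiB")
```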

Logs

----------------------------------------------------------------------------------------------

FP16

----------------------------------------------------------------------------------------------

The following values were not passed to `accelerate launch` and had defaults used instead:
        `--num_processes` was set to a value of `2`
                More than one GPU was found, enabling multi-GPU training.
                If this was unintended please pass in `--num_processes=1`.
        `--num_machines` was set to a value of `1`
        `--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
/home/user/anaconda3/envs/pyenv/lib/python3.9/site-packages/accelerate/accelerator.py:260: FutureWarning: `logging_dir` is deprecated and will be removed in version 0.18.0 of 🤗 Accelerate. Use `project_dir` instead.
  warnings.warn(
/home/user/anaconda3/envs/pyenv/lib/python3.9/site-packages/accelerate/accelerator.py:260: FutureWarning: `logging_dir` is deprecated and will be removed in version 0.18.0 of 🤗 Accelerate. Use `project_dir` instead.
  warnings.warn(
05/16/2023 12:10:53 - INFO - __main__ - Distributed environment: MULTI_GPU  Backend: nccl
Num processes: 2
Process index: 1
Local process index: 1
Device: cuda:1

Mixed precision type: fp16

05/16/2023 12:10:53 - INFO - __main__ - Distributed environment: MULTI_GPU  Backend: nccl
Num processes: 2
Process index: 0
Local process index: 0
Device: cuda:0

Mixed precision type: fp16

{'dynamic_thresholding_ratio', 'clip_sample_range', 'prediction_type', 'sample_max_value', 'variance_type', 'thresholding'} was not found in config. Values will be initialized to default values.
{'norm_num_groups'} was not found in config. Values will be initialized to default values.
{'dual_cross_attention', 'addition_embed_type', 'num_class_embeds', 'time_embedding_dim', 'mid_block_only_cross_attention', 'timestep_post_act', 'resnet_time_scale_shift', 'use_linear_projection', 'conv_in_kernel', 'mid_block_type', 'addition_embed_type_num_heads', 'time_embedding_type', 'time_cond_proj_dim', 'encoder_hid_dim', 'resnet_out_scale_factor', 'upcast_attention', 'only_cross_attention', 'conv_out_kernel', 'cross_attention_norm', 'projection_class_embeddings_input_dim', 'resnet_skip_time_act', 'class_embed_type', 'class_embeddings_concat', 'time_embedding_act_fn'} was not found in config. Values will be initialized to default values.
{'dual_cross_attention', 'addition_embed_type', 'num_class_embeds', 'time_embedding_dim', 'mid_block_only_cross_attention', 'timestep_post_act', 'resnet_time_scale_shift', 'use_linear_projection', 'conv_in_kernel', 'mid_block_type', 'addition_embed_type_num_heads', 'time_embedding_type', 'time_cond_proj_dim', 'encoder_hid_dim', 'resnet_out_scale_factor', 'upcast_attention', 'only_cross_attention', 'conv_out_kernel', 'cross_attention_norm', 'projection_class_embeddings_input_dim', 'resnet_skip_time_act', 'class_embed_type', 'class_embeddings_concat', 'time_embedding_act_fn'} was not found in config. Values will be initialized to default values.
05/16/2023 12:11:05 - WARNING - datasets.builder - Found cached dataset parquet (/home/user/.cache/huggingface/datasets/lambdalabs___parquet/lambdalabs--pokemon-blip-captions-10e3527a764857bd/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
100%|██████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 507.66it/s]
100%|██████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 599.10it/s]
05/16/2023 12:11:12 - INFO - __main__ - ***** Running training *****
05/16/2023 12:11:12 - INFO - __main__ -   Num examples = 833
05/16/2023 12:11:12 - INFO - __main__ -   Num Epochs = 143
05/16/2023 12:11:12 - INFO - __main__ -   Instantaneous batch size per device = 1
05/16/2023 12:11:12 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 8
05/16/2023 12:11:12 - INFO - __main__ -   Gradient Accumulation steps = 4
05/16/2023 12:11:12 - INFO - __main__ -   Total optimization steps = 15000
Steps:   0%|                       | 1/15000 [00:05<20:58:36,  5.03s/it, lr=1e-5, step_loss=0.087]05/16/2023 12:11:17 - INFO - torch.nn.parallel.distributed - Reducer buckets have been rebuilt in this iteration.
05/16/2023 12:11:17 - INFO - torch.nn.parallel.distributed - Reducer buckets have been rebuilt in this iteration.
Steps:   0%|                      | 23/15000 [00:39<6:26:13,  1.55s/it, lr=1e-5, step_loss=0.0398]

----------------------------------------------------------------------------------------------

FP32

----------------------------------------------------------------------------------------------

The following values were not passed to `accelerate launch` and had defaults used instead:
        `--num_processes` was set to a value of `2`
                More than one GPU was found, enabling multi-GPU training.
                If this was unintended please pass in `--num_processes=1`.
        `--num_machines` was set to a value of `1`
        `--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
/home/user/anaconda3/envs/pyenv/lib/python3.9/site-packages/accelerate/accelerator.py:260: FutureWarning: `logging_dir` is deprecated and will be removed in version 0.18.0 of 🤗 Accelerate. Use `project_dir` instead.
  warnings.warn(
/home/user/anaconda3/envs/pyenv/lib/python3.9/site-packages/accelerate/accelerator.py:260: FutureWarning: `logging_dir` is deprecated and will be removed in version 0.18.0 of 🤗 Accelerate. Use `project_dir` instead.
  warnings.warn(
05/16/2023 12:12:17 - INFO - __main__ - Distributed environment: MULTI_GPU  Backend: nccl
Num processes: 2
Process index: 0
Local process index: 0
Device: cuda:0

Mixed precision type: no

05/16/2023 12:12:17 - INFO - __main__ - Distributed environment: MULTI_GPU  Backend: nccl
Num processes: 2
Process index: 1
Local process index: 1
Device: cuda:1

Mixed precision type: no

{'clip_sample_range', 'dynamic_thresholding_ratio', 'sample_max_value', 'thresholding', 'variance_type', 'prediction_type'} was not found in config. Values will be initialized to default values.
{'norm_num_groups'} was not found in config. Values will be initialized to default values.
{'time_cond_proj_dim', 'resnet_out_scale_factor', 'upcast_attention', 'time_embedding_type', 'only_cross_attention', 'addition_embed_type', 'class_embed_type', 'resnet_skip_time_act', 'encoder_hid_dim', 'conv_in_kernel', 'resnet_time_scale_shift', 'projection_class_embeddings_input_dim', 'cross_attention_norm', 'time_embedding_act_fn', 'dual_cross_attention', 'num_class_embeds', 'addition_embed_type_num_heads', 'timestep_post_act', 'mid_block_only_cross_attention', 'use_linear_projection', 'mid_block_type', 'time_embedding_dim', 'conv_out_kernel', 'class_embeddings_concat'} was not found in config. Values will be initialized to default values.
{'time_cond_proj_dim', 'resnet_out_scale_factor', 'upcast_attention', 'time_embedding_type', 'only_cross_attention', 'addition_embed_type', 'class_embed_type', 'resnet_skip_time_act', 'encoder_hid_dim', 'conv_in_kernel', 'resnet_time_scale_shift', 'projection_class_embeddings_input_dim', 'cross_attention_norm', 'time_embedding_act_fn', 'dual_cross_attention', 'num_class_embeds', 'addition_embed_type_num_heads', 'timestep_post_act', 'mid_block_only_cross_attention', 'use_linear_projection', 'mid_block_type', 'time_embedding_dim', 'conv_out_kernel', 'class_embeddings_concat'} was not found in config. Values will be initialized to default values.
100%|██████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 531.73it/s]
05/16/2023 12:12:29 - WARNING - datasets.builder - Found cached dataset parquet (/home/user/.cache/huggingface/datasets/lambdalabs___parquet/lambdalabs--pokemon-blip-captions-10e3527a764857bd/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
100%|██████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 378.65it/s]
05/16/2023 12:12:34 - INFO - __main__ - ***** Running training *****
05/16/2023 12:12:34 - INFO - __main__ -   Num examples = 833
05/16/2023 12:12:34 - INFO - __main__ -   Num Epochs = 143
05/16/2023 12:12:34 - INFO - __main__ -   Instantaneous batch size per device = 1
05/16/2023 12:12:34 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 8
05/16/2023 12:12:34 - INFO - __main__ -   Gradient Accumulation steps = 4
05/16/2023 12:12:34 - INFO - __main__ -   Total optimization steps = 15000
Steps:   0%|                      | 1/15000 [00:04<18:48:56,  4.52s/it, lr=1e-5, step_loss=0.0389]05/16/2023 12:12:39 - INFO - torch.nn.parallel.distributed - Reducer buckets have been rebuilt in this iteration.
05/16/2023 12:12:39 - INFO - torch.nn.parallel.distributed - Reducer buckets have been rebuilt in this iteration.
Steps:   0%|                       | 9/15000 [00:18<7:44:15,  1.86s/it, lr=1e-5, step_loss=0.0853]

System Info

patrickvonplaten commented 1 year ago

Hey @morgankohler,

Instead of setting the precision with the `--mixed_precision` launch flag, could you try setting it via `accelerate config`?
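That is, running `accelerate config` once writes a default config file that `accelerate launch` then reads. As a sanity check, the precision the Accelerator actually ends up with can also be printed (a sketch, assuming a recent accelerate version and no environment overrides):

```python
# Sketch: confirm which precision the Accelerator picked up. Constructing
# it without arguments makes it read the launch/config settings; passing
# mixed_precision="fp16" explicitly would override them instead.
from accelerate import Accelerator

accelerator = Accelerator()          # picks up `accelerate config` / launch settings
print(accelerator.mixed_precision)   # e.g. "fp16" or "no"
```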

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.