huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

Keep initializing controlnet weights from unet for a long time #5573

Closed OrangeSodahub closed 10 months ago

OrangeSodahub commented 10 months ago

Describe the bug

I used the provided script to train ControlNet, but it has been stuck at "Initializing controlnet weights from unet" for a very long time.

Watching the VRAM, I noticed that it seems to load the weights again and again: memory usage rises from 6.5GB to 7.6GB, falls back to 6.5GB, rises to 7.6GB again, and keeps repeating, while the total memory is > 100GB.

Reproduction

!accelerate launch train_controlnet.py \
 --pretrained_model_name_or_path="stabilityai/stable-diffusion-2-1-base" \
 --output_dir="model_out" \
 --dataset_name=multimodalart/facesyntheticsspigacaptioned \
 --conditioning_image_column=spiga_seg \
 --image_column=image \
 --caption_column=image_caption \
 --resolution=512 \
 --learning_rate=1e-5 \
 --validation_image "./face_landmarks1.jpeg" "./face_landmarks2.jpeg" "./face_landmarks3.jpeg" \
 --validation_prompt "High-quality close-up dslr photo of man wearing a hat with trees in the background" "Girl smiling, professional dslr photograph, dark background, studio lights, high quality" "Portrait of a clown face, oil on canvas, bittersweet expression" \
 --train_batch_size=4 \
 --num_train_epochs=3 \
 --tracker_project_name="controlnet" \
 --enable_xformers_memory_efficient_attention \
 --checkpointing_steps=5000 \
 --validation_steps=5000 \
 --report_to wandb \
 --push_to_hub

Logs

The following values were not passed to `accelerate launch` and had defaults used instead:
        `--num_processes` was set to a value of `1`
        `--num_machines` was set to a value of `1`
        `--mixed_precision` was set to a value of `'no'`
        `--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
10/29/2023 16:07:21 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: no

You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'timestep_spacing', 'dynamic_thresholding_ratio', 'sample_max_value', 'variance_type', 'clip_sample_range', 'thresholding', 'prediction_type'} was not found in config. Values will be initialized to default values.
{'scaling_factor', 'force_upcast'} was not found in config. Values will be initialized to default values.
{'resnet_skip_time_act', 'upcast_attention', 'time_cond_proj_dim', 'only_cross_attention', 'time_embedding_dim', 'cross_attention_norm', 'reverse_transformer_layers_per_block', 'timestep_post_act', 'resnet_time_scale_shift', 'resnet_out_scale_factor', 'addition_embed_type', 'conv_in_kernel', 'use_linear_projection', 'encoder_hid_dim_type', 'num_class_embeds', 'conv_out_kernel', 'class_embeddings_concat', 'attention_type', 'projection_class_embeddings_input_dim', 'class_embed_type', 'time_embedding_act_fn', 'encoder_hid_dim', 'dual_cross_attention', 'mid_block_only_cross_attention', 'dropout', 'num_attention_heads', 'addition_time_embed_dim', 'mid_block_type', 'transformer_layers_per_block', 'addition_embed_type_num_heads', 'time_embedding_type'} was not found in config. Values will be initialized to default values.
10/29/2023 16:07:23 - INFO - __main__ - Initializing controlnet weights from unet

System Info

VRAM > 100GB, GPU memory 24GB.

Who can help?

Hoping for some useful solutions! I'm new to this awesome thing. @sayakpaul @patrickvonplaten @yiyixuxu @DN6

sayakpaul commented 10 months ago

I am unable to reproduce this.

Could you ensure:

OrangeSodahub commented 10 months ago

Yes, I'm sure. It's so weird, and I'm looking for other possible causes... I found that the real stuck point is make_train_dataset, as I'm using my own custom dataset.

I'm confused about how to correctly load my own dataset. Your instructions at https://huggingface.co/docs/datasets/v2.0.0/en/dataset_script don't seem consistent with the training script train_controlnet.py. Do I need to replace the load_dataset function with my own? Following those instructions, I created a new dataset class but no new load_dataset function; do I need to create one? THANKS, I'm so confused.

OrangeSodahub commented 10 months ago

Here, if I specify the local dataset dir, I can't find any instructions on how to organize my dataset directory. What's more, if I want to use a new dataset class, there are no instructions on how to connect load_dataset to my new dataset class. https://github.com/huggingface/diffusers/blob/9135e54e768a59ddcf8ad18818d2ffe69ea3a32a/examples/controlnet/train_controlnet.py#L596-L599

OrangeSodahub commented 10 months ago

cc @sayakpaul I'm just wondering how to organize my local dataset so that I can use load_dataset in train_controlnet.py

sayakpaul commented 10 months ago

Cc: @lhoestq for suggesting ways to organize the dataset locally to mimic ControlNet training.

Quentin, the online dataset reference is here: https://huggingface.co/datasets/fusing/fill50k

lhoestq commented 10 months ago

You need a dataset with three columns:

  • image of type Image
  • conditioning_image of type Image
  • text of type string

To have a dataset with these columns you can for example have a local directory with the same structure as fill50k.

We are also seeing if we can have an easy structure, like

train/metadata.csv
train/image0.png
train/image1.png
...
train/conditioning_image0.png
train/conditioning_image1.png
...

but so far the metadata.csv can only link to one image per example.

See this issue for more information: https://github.com/huggingface/datasets/issues/5760
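
As a rough illustration of the layout above, the sketch below writes a train/metadata.csv using only the standard library. The file_name column is what the datasets ImageFolder loader uses to link a row to an image. The conditioning_path column name and the imageN.png / conditioning_imageN.png naming scheme are assumptions made for this example, and that extra column would be read as a plain string rather than as an image.

```python
import csv
from pathlib import Path


def write_metadata(train_dir, captions):
    """Write train_dir/metadata.csv with one row per example.

    `captions` is a list of strings; example i is assumed to use
    image{i}.png and conditioning_image{i}.png (hypothetical naming).
    """
    train_dir = Path(train_dir)
    train_dir.mkdir(parents=True, exist_ok=True)
    metadata_path = train_dir / "metadata.csv"
    with open(metadata_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # `file_name` is the column ImageFolder uses to find the image;
        # `conditioning_path` is a hypothetical extra column kept as text.
        writer.writerow(["file_name", "conditioning_path", "text"])
        for i, caption in enumerate(captions):
            writer.writerow([f"image{i}.png", f"conditioning_image{i}.png", caption])
    return metadata_path
```

Calling write_metadata("train", ["a red circle", "a blue square"]) would produce a header row plus two data rows; the image files themselves still have to be placed in the same folder.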

OrangeSodahub commented 10 months ago

@lhoestq Thanks for your explanation. I agree that it's definitely doable if I organize the local dataset exactly like fill50k, but I want to understand the deeper logic of how load_dataset processes a local dataset folder, so that I can adapt my data to my own needs; for example, there may be multiple conditions per image.

All I want is to work with it the way I would with a torch dataset. load_dataset provides the local folder API, but it's troublesome to find the instructions.

lhoestq commented 10 months ago

The docs are here: https://huggingface.co/docs/datasets/image_dataset.

As for the ControlNet training in particular, I found the docs about the dataset here, but IMO we could add a mention like this:

You need a dataset with three columns:

  • image of type Image
  • conditioning_image of type Image
  • text of type string

and maybe redirect to the datasets docs on how to create an image dataset
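
A minimal sketch of one way to end up with those three columns from local files, assuming a hypothetical metadata.jsonl whose lines hold relative paths and a caption. The file name, the line format, and both helper names are assumptions for illustration. Dataset.from_dict and cast_column with Image() are existing datasets calls, imported lazily here so the path-collecting part runs without the library installed.

```python
import json
from pathlib import Path


def collect_records(data_dir, metadata_name="metadata.jsonl"):
    """Read one JSON object per line and resolve image paths against data_dir.

    Assumed (hypothetical) line format:
    {"image": "images/0.png", "conditioning_image": "cond/0.png", "text": "a caption"}
    """
    data_dir = Path(data_dir)
    columns = {"image": [], "conditioning_image": [], "text": []}
    with open(data_dir / metadata_name, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            row = json.loads(line)
            columns["image"].append(str(data_dir / row["image"]))
            columns["conditioning_image"].append(str(data_dir / row["conditioning_image"]))
            columns["text"].append(row["text"])
    return columns


def to_hf_dataset(columns):
    """Build a Dataset whose two path columns are decoded as images on access."""
    from datasets import Dataset, Image  # lazy import; needs `pip install datasets`

    ds = Dataset.from_dict(columns)
    ds = ds.cast_column("image", Image())
    ds = ds.cast_column("conditioning_image", Image())
    return ds
```

The training script would still need to be pointed at such a dataset, which is not shown here.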

OrangeSodahub commented 10 months ago

Thanks, the first docs are useful to me

yiyixuxu commented 10 months ago

Hi @OrangeSodahub

is your issue resolved?

YiYi

OrangeSodahub commented 10 months ago

@yiyixuxu Not yet, I haven't had a chance to check. I will add new comments if I run into other problems.

lhoestq commented 10 months ago

You should pass the path of the file in the repo instead of the URL :)

OrangeSodahub commented 10 months ago

@lhoestq After debugging, I now know that I need to directly assign the data_dir in split_dataset; however, the unclear docs make it troublesome.

OrangeSodahub commented 10 months ago

@lhoestq Here it can only fetch the dataset through a URL, right? It doesn't provide a local data dir parameter.

def _split_generators(self, dl_manager):
    archive_path = dl_manager.download(_BASE_URL)
    split_metadata_paths = dl_manager.download(_METADATA_URLS)
    return [
        datasets.SplitGenerator(
            name=datasets.Split.TRAIN,
            gen_kwargs={
                "images": dl_manager.iter_archive(archive_path),
                "metadata_path": split_metadata_paths["train"],
            },
        ),
        datasets.SplitGenerator(
            name=datasets.Split.VALIDATION,
            gen_kwargs={
                "images": dl_manager.iter_archive(archive_path),
                "metadata_path": split_metadata_paths["test"],
            },
        ),
    ]

lhoestq commented 10 months ago

You can pass the paths to the dl_manager.download function

OrangeSodahub commented 10 months ago

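@lhoestq Thanks. But I load the jsonl file through load_dataset, and the two funcs split_generators and generate_examples don't seem to be called at all. And with the script below:

import multiprocessing

import torch
from datasets import load_dataset
from torchvision import transforms


def preprocess_train(examples):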
    print(examples)

    image_transforms = transforms.Compose(
        [
            transforms.Resize(512, interpolation=transforms.InterpolationMode.BILINEAR),
            transforms.CenterCrop(512),
            transforms.ToTensor(),
            transforms.Normalize([0.5], [0.5]),
        ]
    )
    images = [image.convert("RGB") for image in examples["target"]]
    images = [image_transforms(image) for image in images]

    examples["pixel_values"] = images

    return examples

def main():
    dataset = load_dataset("./data/") # load jsonl files
    train_dataset = dataset["train"]
    print(len(train_dataset))
    print(train_dataset.column_names)

    # Set the training transforms
    train_dataset = dataset["train"].with_transform(preprocess_train)
    train_dataloader = torch.utils.data.DataLoader(train_dataset, shuffle=True, batch_size=1, num_workers=4)
    for data in train_dataloader:
        print(data)

if __name__ == "__main__":
    multiprocessing.freeze_support()
    main()

the example doesn't seem to load the image correctly; it is still a string. Each example is a line in the jsonl file, is that right? Do I need to load the image manually?
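
A note on the behaviour described above: when JSON Lines files are loaded, path fields arrive as ordinary strings, so nothing decodes them into images automatically. One possible workaround, sketched here under assumptions, is to open the files inside the transform. The image_root argument and the injectable open_image parameter are hypothetical additions for this example; by default the sketch falls back to PIL.Image.open.

```python
from pathlib import Path


def make_path_transform(image_root, columns=("target", "source"), open_image=None):
    """Return a batch transform that replaces path strings with opened images."""
    if open_image is None:
        from PIL import Image  # lazy import; requires Pillow

        open_image = Image.open

    image_root = Path(image_root)

    def transform(examples):
        # `examples` is a dict of lists, as passed to with_transform callbacks.
        out = dict(examples)
        for column in columns:
            if column in examples:
                out[column] = [open_image(image_root / p) for p in examples[column]]
        return out

    return transform
```

The returned function could be applied to a batch before preprocess_train runs, so that examples["target"] already holds image objects by the time .convert("RGB") is called.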

lhoestq commented 10 months ago

Make sure the dataset script file has a correct filename. It must be the same as the dataset repository/folder name to be used
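
To make the naming rule concrete, here is a small standard-library sketch that checks whether a folder contains a loading script named after the folder itself. The helper name find_loading_script is hypothetical and is not part of datasets.

```python
from pathlib import Path


def find_loading_script(dataset_dir):
    """Return the path of <folder>/<folder>.py if it exists, else None."""
    dataset_dir = Path(dataset_dir).resolve()
    candidate = dataset_dir / f"{dataset_dir.name}.py"
    return candidate if candidate.is_file() else None
```

For a folder named mydataset, this looks for mydataset/mydataset.py and returns None when the script is missing or named differently.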

OrangeSodahub commented 10 months ago

@lhoestq You mean the dataset folder must be at the same level as the new load dataset script? e.g.:

-- ...
|- mydataset.py
|- data
     |- train.jsonl
     |- val.jsonl
     |- test.jsonl
     |- train
           |- ...

But if I want to load data from another place, what should I do?

And is the key for the image filename fixed? I use target and source as the keys for my image paths in the jsonl file. If that doesn't work, how can I use my own keys?

sayakpaul commented 10 months ago

This issue is becoming more of a data loading issue and seems better suited elsewhere. Do you mind continuing the discussion on the datasets repo instead?

engrmusawarali commented 7 months ago

@sayakpaul @patrickvonplaten, I want to train ControlNet on Stable Diffusion inpainting. Can you please tell me how to copy the weights?