huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

[Advanced Dreambooth] Absolute paths for output_dir are not supported #6685

Open levi opened 9 months ago

levi commented 9 months ago

Describe the bug

When saving the model files, the script uses args.output_dir as the directory and also prepends it to the file name, producing paths such as:

{args.output_dir}/{args.output_dir}_emb.safetensors
{args.output_dir}/{args.output_dir}.safetensors

This breaks when using an absolute directory path for output_dir.
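
A minimal sketch of why the path breaks and one possible fix, assuming the file name is built with an f-string as in the pattern above. The function names `buggy_embedding_path` and `safe_embedding_path` are hypothetical and do not come from the training script; `PurePosixPath` is used only to keep the example platform independent.

```python
from pathlib import PurePosixPath


def buggy_embedding_path(output_dir: str) -> str:
    # Mirrors the reported pattern: the full output_dir is reused as a file-name prefix.
    return f"{output_dir}/{output_dir}_emb.safetensors"


def safe_embedding_path(output_dir: str) -> str:
    # Hypothetical fix: use only the last path component as the file-name prefix.
    out = PurePosixPath(output_dir)
    return (out / f"{out.name}_emb.safetensors").as_posix()


if __name__ == "__main__":
    print(buggy_embedding_path("/data/runs/my_lora"))
    print(safe_embedding_path("/data/runs/my_lora"))
```

With a relative name such as `my_lora`, both functions return the same path, which is likely why the problem only shows up with absolute paths.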

Reproduction

Run the advanced DreamBooth script with an absolute path for output_dir.

Logs

No response

System Info

Latest repo version of diffusers

Who can help?

@linoytsaban

levi commented 9 months ago

Also, @linoytsaban, do you have a repo with a working example of a LoRA with trained TI embeddings?

So far, I’ve been unable to load a LoRA trained with this script using TI embeddings. I noticed the final validation generation also doesn’t load the trained embeddings, yet generates the subject just fine. So something is fishy…

Just want to validate whether it’s my inference loading code or a bug in the training script.

linoytsaban commented 9 months ago

Hey @Levi! Here are some repos with working examples:

https://huggingface.co/linoyts/2000_ads
https://huggingface.co/linoyts/web_y2k
https://huggingface.co/multimodalart/medieval-animals-lora
https://huggingface.co/multimodalart/apolinario-face-final

Regarding the final validation generation: I'll look into it, as it should also load the embeddings.

levi commented 9 months ago

Cool, thanks!

I found the issues with the script:

  1. For the training loop epoch validation, the pipeline doesn't load the modified tokenizers.

  2. For the final validation, the embeddings need to be saved right after the LoRA is saved, and they need to be loaded into the validation pipeline in the same way as described in the model card README.

I have these changes running locally and can confirm everything is now working great.

levi commented 9 months ago

Here are my local changes to help. Sorry, I don't have time to put up a PR myself!

                # Fix 1 (epoch validation): pass the modified tokenizers to the pipeline
                pipeline = StableDiffusionXLPipeline.from_pretrained(
                    args.pretrained_model_name_or_path,
                    vae=vae,
                    text_encoder=accelerator.unwrap_model(text_encoder_one),
                    text_encoder_2=accelerator.unwrap_model(text_encoder_two),
                    tokenizer=tokenizer_one,
                    tokenizer_2=tokenizer_two,
                    unet=accelerator.unwrap_model(unet),
                    revision=args.revision,
                    variant=args.variant,
                    torch_dtype=weight_dtype,
                    add_watermarker=False,
                )

        # Fix 2 (final validation): save the embeddings right after the LoRA weights
        StableDiffusionXLPipeline.save_lora_weights(
            save_directory=args.output_dir,
            unet_lora_layers=unet_lora_layers,
            text_encoder_lora_layers=text_encoder_lora_layers,
            text_encoder_2_lora_layers=text_encoder_2_lora_layers,
        )

        if args.train_text_encoder_ti:
            embeddings_path = f"{args.output_dir}/learned_embeds.safetensors"
            embedding_handler.save_embeddings(embeddings_path)

        images = []
        if args.validation_prompt and args.num_validation_images > 0:
            # Final inference
            # Load previous pipeline
            vae = AutoencoderKL.from_pretrained(
                vae_path,
                subfolder="vae"
                if args.pretrained_vae_model_name_or_path is None
                else None,
                revision=args.revision,
                variant=args.variant,
                torch_dtype=weight_dtype,
            )
            pipeline = StableDiffusionXLPipeline.from_pretrained(
                args.pretrained_model_name_or_path,
                vae=vae,
                revision=args.revision,
                variant=args.variant,
                torch_dtype=weight_dtype,
                add_watermarker=False,
            )

            # We train on the simplified learning objective. If we were previously predicting a variance, we need the scheduler to ignore it
            scheduler_args = {}

            if "variance_type" in pipeline.scheduler.config:
                variance_type = pipeline.scheduler.config.variance_type

                if variance_type in ["learned", "learned_range"]:
                    variance_type = "fixed_small"

                scheduler_args["variance_type"] = variance_type

            pipeline.scheduler = DPMSolverMultistepScheduler.from_config(
                pipeline.scheduler.config, **scheduler_args
            )

            # load attention processors
            pipeline.load_lora_weights(args.output_dir)

            if args.train_text_encoder_ti:
                # TODO: Populate ti keys from the actual tokens in the embedding handler
                ti_keys = ["<s0>", "<s1>"]
                state_dict = load_file(embeddings_path)

                pipeline.load_textual_inversion(
                    state_dict["clip_l"],
                    token=ti_keys,
                    text_encoder=pipeline.text_encoder,
                    tokenizer=pipeline.tokenizer,
                )
                pipeline.load_textual_inversion(
                    state_dict["clip_g"],
                    token=ti_keys,
                    text_encoder=pipeline.text_encoder_2,
                    tokenizer=pipeline.tokenizer_2,
                )

            # run inference
            pipeline = pipeline.to(accelerator.device)
            generator = (
                torch.Generator(device=accelerator.device).manual_seed(args.seed)
                if args.seed is not None  # a seed of 0 is still a valid seed
                else None
            )
            images = [
                pipeline(
                    args.validation_prompt, num_inference_steps=50, generator=generator
                ).images[0]
                for _ in range(args.num_validation_images)
            ]
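
The TODO above leaves `ti_keys` hard-coded. A minimal sketch of building the list instead, assuming the placeholder tokens follow the `<s0>`, `<s1>`, ... pattern seen in the hard-coded list and that the number of inserted tokens is known. The helper `build_ti_keys` is hypothetical and not part of the training script or of diffusers.

```python
def build_ti_keys(num_new_tokens: int, start_index: int = 0) -> list[str]:
    # Assumption: placeholder tokens are named "<s{i}>" with consecutive indices.
    return [f"<s{start_index + i}>" for i in range(num_new_tokens)]


if __name__ == "__main__":
    # Two inserted tokens reproduce the hard-coded list from the snippet above.
    print(build_ti_keys(2))
```

The resulting list could then be passed as `token=ti_keys` in both `load_textual_inversion` calls.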

levi commented 9 months ago

@linoytsaban OK, I figured out why I couldn't get the LoRA/embeddings to work correctly with my inference code. Turns out adding a negative prompt completely breaks the output? The subject is sometimes returned, but more often a random scene is returned.

I never realized that negative conditions don't work with TI. Is there a known workaround for this?

github-actions[bot] commented 7 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.