I get some bad result by following your step

shoutOutYangJie commented 9 months ago

I download you provioded checkpoint, which contain Unet and subject-encoder. Then, I excute this comand line:

python inference_ss.py --model_name_or_path ./models/dreamtuner --subject_encoder_beta 0.2 --num_samples 10 --num_inference_steps 50 \ --reference_image ./datasets/sample/00006_rgb.png \ --mask_image ./datasets/sample/00006_mask.png \ --negative_prompt "worst quality, normal quality, low quality, low res, blurry, text, watermark, logo, banner, extra digits, cropped, jpeg artifacts, signature, username, error, sketch ,duplicate, ugly, monochrome, horror, geometry, mutation, disgusting" \ --prompt "best quality,1girl,outdoor" --enable_reference_guidance --reference_guidance_scale 2 --dtype float16

But I get these result:

I use "datasets/sample/00006_rgb.png" as reference image. The output image quality is worse than one generated by using "inference.py"

by the way, all of result is not like the reference image

At last, can you provide the script which make DreamBooth dataset.

kousw commented 9 months ago

Which model are you using? I used anthing v3 as my base model

kousw commented 9 months ago

Perhaps I omitted a setting, but model_index.json in models/dreamtuner should look like this

{
  "_class_name": "StableDiffusionPipeline",
  "_diffusers_version": "0.8.0.dev0",
  "feature_extractor": [
    "transformers",
    "CLIPImageProcessor"
  ],
  "safety_checker": [
    "stable_diffusion",
    "StableDiffusionSafetyChecker"
  ],
  "scheduler": [
    "diffusers",
    "PNDMScheduler"
  ],
  "text_encoder": [
    "transformers",
    "CLIPTextModel"
  ],
  "tokenizer": [
    "transformers",
    "CLIPTokenizer"
  ],
  "subject_encoder": [
    "dreamtuner.models.subject_encoder",
    "SubjectEncoder"
  ],
  "unet": [
    "dreamtuner.models.unet",
    "SDUNet2DConditionModel"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKL"
  ]
}

In this state, running inference.py with subject_encoder_beta 0.5 gave the following

image_0_a

But running inference_ss.py does indeed yield a bad image! I think there must have been some mistake, I will investigate.

kousw commented 9 months ago

In inference_ss.py, when reference guidance is disabled, it works as expected. However, when it is enabled, it seems to not work correctly.

Normally, when reference guidance is enabled, the style is strongly reflected. When I cleaned up the code, I think I broke some implementation related to this, but I haven't found it so far.

kousw commented 9 months ago

I found the problem and was able to fix it! This technique works only with DDIM Scheduler. If your SD1.5 based model has a different scheduler, it will not work well, please replace model_index.json with the following

{
    "_class_name": "DreamTunerPipelineSelfSubject",
    "_diffusers_version": "0.8.0.dev0",
    "feature_extractor": [
      "transformers",
      "CLIPImageProcessor"
    ],
    "safety_checker": [
      "stable_diffusion",
      "StableDiffusionSafetyChecker"
    ],
    "scheduler": [
      "diffusers",
      "DDIMScheduler"
    ],
    "text_encoder": [
      "transformers",
      "CLIPTextModel"
    ],
    "tokenizer": [
      "transformers",
      "CLIPTokenizer"
    ],
    "subject_encoder": [
      "dreamtuner.models.subject_encoder",
      "SubjectEncoder"
    ],
    "unet": [
      "dreamtuner.models.unet",
      "SDUNet2DConditionModel"
    ],
    "vae": [
      "diffusers",
      "AutoencoderKL"
    ],
    "unet_reference": [
      "dreamtuner.models.unet",
      "SDUNet2DConditionModel"
    ]
  }

It's not perfect, but this is what happens when you sit her in the reference image.

kousw / experimental-dreamtuner

I get some bad result by following your step #2