rohitgandikota / sliders

Concept Sliders for Precise Control of Diffusion Models
https://sliders.baulab.info
MIT License

SDXL Visual Sliders training new concept has no effect at all #95

Open Moonlight63 opened 3 months ago

Moonlight63 commented 3 months ago

I've been reading through some of the other issues here, trying to learn what I can about how this works. The most helpful comments I have seen so far are a suggestion to simply adjust the LR for bad generations, and someone pointing to a copier LoRA method that I hadn't seen before.

I had a thought experiment about training a LoRA for a concept/facial feature that a base model wouldn't have a pre-existing reference for. I decided to try generating a different-looking nose for generic faces, for making non-human characters that are consistent and controllable. Visual sliders seemed perfect: I could inpaint an original image to get the pairs, then train on the difference. I tried this, but the resulting LoRA seems to have zero effect at all, and I am wondering why. It looks like I am not the only one having this issue (https://github.com/rohitgandikota/sliders/issues/60), though that slider appears to rely on text prompting.

I tried creating models both with and without adding a prompt, e.g.:

- target: "" # what word for erasing the positive concept from
  positive: "" # concept to erase
  unconditional: "" # word to take the difference from the positive concept
  neutral: "" # starting point for conditioning the target
  action: "enhance" # erase or enhance
  guidance_scale: 4
  resolution: 2048
  dynamic_resolution: false
  batch_size: 1

and:

- target: "nose" # what word for erasing the positive concept from
  positive: "nose, flat" # concept to erase
  unconditional: "nose" # word to take the difference from the positive concept
  neutral: "nose" # starting point for conditioning the target
  action: "enhance" # erase or enhance
  guidance_scale: 4
  resolution: 2048
  dynamic_resolution: false
  batch_size: 1

And here are the config params (basically all defaults; I tried more training steps after it didn't work, and I also saw a recommendation somewhere to use full attention for a difficult concept):

prompts_file: "trainscripts/imagesliders/data/prompts-xl.yaml"
pretrained_model:
  name_or_path: "stabilityai/stable-diffusion-xl-base-1.0" # you can also use .ckpt or .safetensors models
  v2: false # true if model is v2.x
  v_pred: false # true if model uses v-prediction
network:
  type: "c3lier" # or "c3lier" or "lierla"
  rank: 4
  alpha: 1.0
  training_method: "full" #xattn, noxattn, full
train:
  precision: "bfloat16"
  noise_scheduler: "ddim" # or "ddpm", "lms", "euler_a"
  iterations: 5000
  lr: 0.0002
  optimizer: "AdamW"
  lr_scheduler: "constant"
  max_denoising_steps: 50
save:
  name: "temp"
  path: "./models"
  per_steps: 500
  precision: "bfloat16"
logging:
  use_wandb: false
  verbose: false
other:
  use_xformers: true

I set the resolution to 2048 because that's what my training images are. I created base images, then inpainted the noses with a LoRA of Voldemort. My theory is that if I can get this to work, I can create any kind of facial features, or really any concept, in 3D, and transfer them to SDXL models.

Here are two of the pairs of training images: [image attachments 00290-2468369899, 00196-1959257638, 00040-3834321717, 00154-778726337]

I used dynamic prompts to create a few hundred random images with different age, eye size/color, hair color, skin tones, male/female, backgrounds, and distance from camera. Then I picked a few of the best ones and inpainted them.

The result is... nothing. "portrait photo of a blonde woman": [image 00040-999923733]

"portrait photo of a blonde woman <lora:flatnose3_alpha1.0_rank4_full_last:1>": [image 00039-999923733]

I have also tried large swings in the strength of the LoRA with no change.

This is pretty confusing, as I would expect the LoRA to have some effect after training on something, but nothing happened. I've tried different settings and nothing has worked. I am starting to wonder if the LoRA file itself is bugged.
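To rule that out, a quick check is to open the saved weights and see whether they are effectively zero. This is just my own sketch, assuming the trainer wrote a standard safetensors file under ./models (the exact file name below is a guess based on the LoRA tag I'm using in A1111):

# Sanity check: do the trained LoRA tensors contain non-zero values?
# Assumes a standard .safetensors file; adjust the path to whatever the
# trainer actually saved under ./models.
from safetensors.torch import load_file

state = load_file("./models/flatnose3_alpha1.0_rank4_full_last.safetensors")
for name, tensor in state.items():
    print(f"{name}: shape={tuple(tensor.shape)}, max_abs={tensor.abs().max().item():.6g}")
# If every max_abs is ~0, training never moved the weights, and the problem
# is in training rather than in how A1111 applies the LoRA.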

I should mention that for generation, I dropped the LoRA into A1111, nothing else. I've seen some people mention using an extension to keyframe strength over steps, but since I am getting no change at all, that probably won't help.

I haven't tried using lierla for the network or changing the noise scheduler. I usually use Euler in A1111 for generation, but I don't know if that makes a difference.

Is anyone else having problems training this way? Any pointers? I am interested in trying other visual concepts.

EDIT: Here is the command I am using for training, by the way. The README left a few things out, but I think I did this correctly; since things aren't working, maybe someone can sanity-check me:

python trainscripts/imagesliders/train_lora-scale-xl.py --name 'flatnose3' --rank 4 --alpha 1 --config_file 'trainscripts/imagesliders/data/config-xl.yaml' --folder_main './trainscripts/imagesliders/nose/' --folders="smallsize, bigsize" --scales="-1, 1"

The README says to create two folders, smallsize and bigsize, under the 'folder_main'. I then figured out that at some point that must have changed to allow in-between values, so the folder names themselves don't matter; I mapped smallsize and bigsize to scales -1 and 1. All of the base 'regular nose' images are in the 'smallsize' folder, and all of the 'flat nose' images are in the 'bigsize' folder.
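As a data-layout sanity check (my own sketch, assuming the trainer forms pairs by matching file names across the two scale folders; I haven't verified that in train_lora-scale-xl.py), I can at least confirm both folders contain the same file names:

# List file names that appear in one scale folder but not the other.
# Paths match my training command; the pairing-by-filename assumption
# is mine and worth verifying against the training script.
import os

folder_main = "./trainscripts/imagesliders/nose/"
low = set(os.listdir(os.path.join(folder_main, "smallsize")))  # scale -1
high = set(os.listdir(os.path.join(folder_main, "bigsize")))   # scale 1
print("only in smallsize:", sorted(low - high))
print("only in bigsize:", sorted(high - low))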

rohitgandikota commented 2 months ago

Hi @Moonlight63 - thanks for the details you provided.

The entire setup looks good to me. The one main thing I would try differently is to set the resolution parameter to 512 and see. I understand that your training images are 2048 resolution, but we noticed in our experiments that training sliders at a lower resolution (lower than the model's default) helps a lot.

Let me know if that helps
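Concretely, that would mean changing only the resolution line in each prompt entry, e.g. for the second config above:

- target: "nose"
  positive: "nose, flat"
  unconditional: "nose"
  neutral: "nose"
  action: "enhance"
  guidance_scale: 4
  resolution: 512 # train at or below the model's default, even though the image pairs are 2048
  dynamic_resolution: false
  batch_size: 1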