Moonlight63 opened this issue 3 months ago
Hi @Moonlight63 - thanks for the details you provided.
The entire setup process looks good to me. The one main thing I would try differently is setting the resolution parameter to 512 and seeing what happens. I understand that your training images are 2048 resolution, but we noticed in our experiments that training sliders at a lower resolution (lower than the model's default) helps a lot.
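In case it isn't obvious where that lives: in the image-slider setup, resolution is a field in the training config. A sketch (field placement from memory - please verify against your checkout):

```yaml
# illustrative; check trainscripts/imagesliders/ for the actual config file and field name
train:
  resolution: 512   # train below SDXL's native 1024; source images can stay 2048
```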
Let me know if that helps
I've been reading through some of the other issues here, trying to learn what I can about how this works. The most helpful comments I have seen so far are to simply adjust the LR for bad generations, and someone pointing to a copier LoRA method that I hadn't seen before.

I had a thought experiment about training a LoRA for a concept/facial feature that a base model has no pre-existing reference for. I decided to try generating a different-looking nose on generic faces, for making non-human characters that are consistent and controllable. Visual sliders seemed perfect: I could inpaint an original image to get the pairs, then train on the difference. I tried this, but the resulting LoRA seems to have zero effect at all, and I'm wondering why. It looks like I am not the only one having this issue: https://github.com/rohitgandikota/sliders/issues/60 - but that slider appears to rely on text prompting.
I tried creating models both with and without adding a prompt.
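For the runs with a prompt, the entry was along these lines (a sketch only - the schema follows the repo's example prompts files as I understand them, and the wording is reconstructed):

```yaml
# hypothetical prompts entry; field names per the repo's examples, values are illustrative
- target: "face"           # the concept being edited
  positive: "flat nose"    # attribute pushed toward at scale +1
  unconditional: ""        # left empty; the image pairs carry the signal
  neutral: "face"
  action: "enhance"
  resolution: 512
  batch_size: 1
```

For the runs without a prompt I left the text fields empty.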
And here are the config params (basically all defaults; I tried more training steps after it didn't work, and I also saw a recommendation somewhere to use full attention for difficult concepts):
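Roughly this (a sketch of the repo's default image-slider config as I understand it - exact field names and defaults may differ; comments mark what I changed):

```yaml
network:
  type: "c3lier"            # conv + linear LoRA; "lierla" is the linear-only option I haven't tried
  rank: 4
  alpha: 1.0
  training_method: "full"   # changed from the default ("noxattn") per the full-attention recommendation
train:
  precision: "bfloat16"
  noise_scheduler: "ddim"   # haven't tried changing this
  iterations: 3000          # raised from the default after the first run had no effect
  lr: 0.0002
  optimizer: "AdamW"
  resolution: 2048          # matched to my training images
save:
  name: "flatnose3"
  path: "./models"
  per_steps: 500
```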
I set the resolution to 2048 because that's what my training images are. I created the base images, then inpainted the noses using a Voldemort LoRA. My theory is that if I can get this to work, I can create any kind of facial feature, or really any concept, in 3D and transfer it to SDXL models.
Here are 2 of the pairs of training images:
I used dynamic prompts to create a few hundred random images, varying age, eye size/color, hair color, skin tone, male/female, background, and distance from camera. Then I picked a few of the best ones and inpainted them.
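The template was something like this (dynamic-prompts variant syntax; reconstructed, not my literal template):

```text
portrait photo of a {young|middle aged|elderly} {man|woman},
{large|small} {blue|green|brown} eyes, {blonde|black|red|brown} hair,
{pale|tan|dark} skin, {close-up|medium shot|full body shot},
{studio|outdoor|indoor} background
```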
The result is… nothing. `portrait photo of a blonde woman`:
`portrait photo of a blonde woman <lora:flatnose3_alpha1.0_rank4_full_last:1>`:
I have also tried large swings in the LoRA strength with no change.
This is pretty confusing: I would expect the LoRA to have *some* effect after training on something, but nothing happens. I've tried different settings and nothing has worked. I am starting to wonder if the LoRA itself is bugged.
I should mention that for generation I just dropped the LoRA into A1111 - nothing else. I've seen some people mention using an extension to keyframe strength over steps, but since I am getting no change at all, that probably won't help.
I haven't tried using lierla for the network or changing the noise scheduler. I usually use Euler in A1111 for generation, but I don't know if that makes a difference.
Is anyone else having problems training this way? Any pointers? I am interested in trying other visual concepts.
EDIT: Here is the command I am using for training, by the way. The README left a few things out, but I think I did this correctly - since things aren't working, maybe someone can sanity-check me.
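It was along these lines (flag names as I read them from the README and the script's argument parser; treat this as a sketch, not a verbatim copy):

```bash
python trainscripts/imagesliders/train_lora-scale-xl.py \
  --name 'flatnose3' \
  --rank 4 \
  --alpha 1 \
  --config_file 'trainscripts/imagesliders/data/config-xl.yaml' \
  --folder_main '/path/to/my/pairs' \
  --folders 'smallsize, bigsize' \
  --scales '-1, 1'
```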
The README says to create two folders, smallsize and bigsize, under folder_main. I then figured out that at some point this must have changed to allow in-between values, so the folder names themselves don't matter - each folder just gets mapped to a scale. I set the scales for smallsize and bigsize to -1 and 1. All of the base 'regular nose' images are in the smallsize folder, and all of the 'flat nose' images are in the bigsize folder.
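So the layout on disk is (filenames illustrative; as I understand it, images are paired across the two folders):

```text
folder_main/
├── smallsize/      # base "regular nose" images  → scale -1
│   ├── 000.png
│   └── ...
└── bigsize/        # inpainted "flat nose" images → scale +1
    ├── 000.png     # the inpainted counterpart of smallsize/000.png
    └── ...
```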