Open rockerBOO opened 4 months ago
Maybe post your command line? I got the alpha mask parameter working, so you could compare yours to mine:
accelerate launch --num_cpu_threads_per_process=2 "./sdxl_train.py" --pretrained_model_name_or_path="/home/ara/Documents/Dev/sdxl/training/earthscape/kohya/dreambooth/earthscape-step00002600.safetensors" --sdpa --enable_bucket --min_bucket_reso=64 --max_bucket_reso=1024 --train_data_dir="/home/ara/Documents/Dev/sdxl/training/earthscape/kohya/img" --resolution="1024,1024" --output_dir="/home/ara/Documents/Dev/sdxl/training/earthscape/kohya/dreambooth" --logging_dir="/home/ara/Documents/Dev/sdxl/training/earthscape/kohya/log" --save_model_as=safetensors --vae="/home/ara/Documents/Dev/sdxl/sdxl_vae.safetensors" --output_name="earthscape" --lr_scheduler_num_cycles="20000" --max_token_length=150 --max_data_loader_n_workers="0" --lr_scheduler="constant_with_warmup" --lr_warmup_steps="100" --max_train_steps="16000" --caption_extension=".txt" --optimizer_type="Adafactor" --optimizer_args scale_parameter=False relative_step=False warmup_init=False --max_data_loader_n_workers="0" --max_token_length=150 --bucket_reso_steps=32 --v_pred_like_loss="0.5" --save_every_n_steps="200" --save_last_n_steps="600" --min_snr_gamma=5 --gradient_checkpointing --xformers --bucket_no_upscale --noise_offset=0.0357 --adaptive_noise_scale=0.00357 --sample_sampler=k_dpm_2 --sample_prompts="/home/ara/Documents/Dev/sdxl/training/earthscape/kohya/dreambooth/sample/prompt.txt" --sample_every_n_steps="50" --fused_backward_pass --cache_latents --loss_type=huber --train_batch_size="4" --train_text_encoder --learning_rate_te1 1e-9 --learning_rate_te2 0 --learning_rate="4e-7" --flip_aug --enable_wildcard --shuffle_caption --alpha_mask
On a different topic, I wouldn't have high hopes for using background removal (i.e. a tight boundary around the person you're training on) with alpha_mask. The problem I find is that, since the background could be anything, the trained network starts generating extra legs etc., because that raises the chance of getting a leg in the unmasked region. The extra legs are not trained away by the gradient, as they end up in the masked background regions.
Best to leave some background unmasked in the area around the person, especially in the lower part of the image where the arms and legs are.
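One way to leave that margin of background is to grow the unmasked area of the alpha channel before training. The following is only a rough sketch, assuming the mask is an H x W NumPy array with values in 0 to 1; `expand_mask` and `margin` are hypothetical names for illustration and are not part of sd-scripts or rembg.

```python
import numpy as np


def expand_mask(alpha: np.ndarray, margin: int) -> np.ndarray:
    """Grow the unmasked (non-zero) area of an H x W alpha mask by `margin` pixels.

    This is a plain max filter over a (2 * margin + 1)-pixel square window,
    applied separably: first along rows, then along columns.
    """
    h, w = alpha.shape
    # Vertical pass: each pixel takes the max of its neighbours above and below.
    padded = np.pad(alpha, ((margin, margin), (0, 0)))
    out = np.zeros_like(alpha)
    for d in range(2 * margin + 1):
        out = np.maximum(out, padded[d:d + h, :])
    # Horizontal pass on the result of the vertical pass.
    padded = np.pad(out, ((0, 0), (margin, margin)))
    result = np.zeros_like(alpha)
    for d in range(2 * margin + 1):
        result = np.maximum(result, padded[:, d:d + w])
    return result


# Toy example: a single "subject" pixel grows into a 3 x 3 unmasked block.
alpha = np.zeros((5, 5), dtype=np.float32)
alpha[2, 2] = 1.0
expanded = expand_mask(alpha, margin=1)
```

For real images the margin would be much larger than one pixel, and a bigger margin could be used for the lower part of the image where arms and legs tend to be.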
My current config. File paths and such are set separately. Using LyCORIS, but I wouldn't think that would affect it.
pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5"
max_train_epochs=4
save_every_n_epochs=1
train_batch_size=1
gradient_accumulation_steps=2
sample_every_n_epochs=1
sample_sampler="dpmsolver++"
caption_extension=".txt"
ip_noise_gamma=0.1
noise_offset=0.1
adaptive_noise_scale=0.01
noise_offset_random_strength=true
ip_noise_gamma_random_strength=true
gradient_checkpointing=true
alpha_mask=true
network_dim=10
network_alpha=5
network_module = "lycoris.kohya"
network_args = [
"algo=boft",
"rescale=True",
"constrain=1e-4",
"dropout=0.3",
"rank_dropout=0.15",
"module_dropout=0.15",
]
debiased_estimation_loss=true
sdpa=true
seed=13337
save_model_as="safetensors"
training_comment="Trained by: rockerBOO"
mixed_precision="fp16"
optimizer_type="PagedAdamW32Bit"
unet_lr=1e-4
text_encoder_lr=5e-5
optimizer_args=["weight_decay=0.01", "betas=(0.9,0.999)"]
loss_type="huber"
huber_schedule="snr" # exponential, constant, or snr.
huber_c=0.1
log_with = "wandb"
Dataset config:
[general]
shuffle_caption = true
caption_extension = '.txt'
enable_bucket = true
bucket_reso_steps = 64
[[datasets]]
resolution = 768
...
In terms of the masking, I do the automated rembg mask and then manually filter out bad results. I have gotten very good results (in some cases the best results) with my current test, but the backgrounds are not great. I think mixing in non-alpha results could prove to be even better. I certainly would iterate on my masking technique, but for this current test it balances how fast it is with decent to great results.
Though at this point just curious if I'm messing it up by not adding that additional channel. But also why it is not working correctly. Maybe something SD1.5 related?
I checked it, and it also doesn't work because of the dimensions. alpha_mask already has 3 dimensions (1 x H x W) and doesn't need to be unsqueezed.
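To make the shape problem concrete, here is a small sketch, assuming the loss is B x C x h x w and the mask is resized with `torch.nn.functional.interpolate`. The function name `apply_alpha_mask` and the toy shapes are assumptions for illustration, not the actual sd-scripts code. It adds a channel dimension only when the batch of masks does not already have one.

```python
import torch
import torch.nn.functional as F


def apply_alpha_mask(loss: torch.Tensor, alpha_masks: torch.Tensor) -> torch.Tensor:
    """Weight a per-pixel loss (B x C x h x w) by an alpha mask in [0, 1]."""
    mask = alpha_masks.to(dtype=loss.dtype)
    if mask.dim() == 3:
        # B x H x W: add the channel dimension that interpolate expects.
        mask = mask.unsqueeze(1)
    # mask is now B x 1 x H x W. Another unsqueeze here would make it 5-D,
    # which no longer matches the 2-D target size used below.
    mask = F.interpolate(mask, size=loss.shape[2:], mode="area")
    return loss * mask


loss = torch.ones(2, 4, 8, 8)          # stand-in for a latent-space loss
masks = torch.zeros(2, 1, 64, 64)      # masks that already carry a channel dim
masks[:, :, :32, :] = 1.0              # top half of each image is unmasked
weighted = apply_alpha_mask(loss, masks)
print(weighted.shape)  # torch.Size([2, 4, 8, 8])
```

Checking `mask.dim()` instead of always calling `unsqueeze(1)` lets the same function accept masks with or without the channel dimension.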
Also, there is a problem when using flip_aug with cache_latents. The conversion using transforms.ToTensor() is missing in cache_batch_latents, so the training failed as follows:
File "E:\LoRA-Scripts\library\train_util.py", line 1207, in __getitem__
alpha_mask = None if image_info.alpha_mask is None else torch.flip(image_info.alpha_mask, [1])
TypeError: flip(): argument 'input' (position 1) must be Tensor, not numpy.ndarray
Add alpha_mask = transforms.ToTensor()(alpha_mask) to fix.
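The traceback above comes from passing a NumPy array to `torch.flip`, which only accepts tensors. As a hedged sketch of the idea (not the actual sd-scripts fix), the helper below converts the mask first and flips along the last dimension, so the flip stays horizontal whether the mask is H x W or 1 x H x W. `to_mask_tensor` and `flip_mask_horizontally` are hypothetical names, and `torch.from_numpy` is used here as a stand-in for `transforms.ToTensor()`.

```python
import numpy as np
import torch


def to_mask_tensor(alpha_mask):
    """Return the mask as a float32 tensor, accepting an ndarray or a Tensor."""
    if isinstance(alpha_mask, np.ndarray):
        alpha_mask = torch.from_numpy(np.ascontiguousarray(alpha_mask))
    return alpha_mask.to(torch.float32)


def flip_mask_horizontally(alpha_mask):
    """Mirror the mask left to right; None (no mask) is passed through."""
    if alpha_mask is None:
        return None
    # dims=[-1] is the width axis for both H x W and 1 x H x W masks.
    return torch.flip(to_mask_tensor(alpha_mask), dims=[-1])


mask = np.array([[0.0, 1.0], [0.5, 0.25]], dtype=np.float32)
flipped = flip_mask_horizontally(mask)
```

Note that flipping along a fixed dimension index such as `[1]` means different things for a 2-D and a 3-D mask, which is why the sketch indexes from the end.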
So, here are all the fixes:
diff --git a/library/custom_train_functions.py b/library/custom_train_functions.py
index 2a513dc..37680b1 100644
--- a/library/custom_train_functions.py
+++ b/library/custom_train_functions.py
@@ -487,7 +487,7 @@ def apply_masked_loss(loss, batch):
         # print(f"conditioning_image: {mask_image.shape}")
     elif "alpha_masks" in batch and batch["alpha_masks"] is not None:
         # alpha mask is 0 to 1
-        mask_image = batch["alpha_masks"].to(dtype=loss.dtype).unsqueeze(1) # add channel dimension
+        mask_image = batch["alpha_masks"].to(dtype=loss.dtype) # add channel dimension
         # print(f"mask_image: {mask_image.shape}, {mask_image.mean()}")
     else:
         return loss
diff --git a/library/train_util.py b/library/train_util.py
index 1f9f3c5..5795f86 100644
--- a/library/train_util.py
+++ b/library/train_util.py
@@ -2498,6 +2498,7 @@ def cache_batch_latents(
                 alpha_mask = alpha_mask.astype(np.float32) / 255.0
             else:
                 alpha_mask = np.ones_like(image[:, :, 0], dtype=np.float32)
+            alpha_mask = transforms.ToTensor()(alpha_mask)
         else:
             alpha_mask = None
         alpha_masks.append(alpha_mask)
Thank you for reporting the issue. It was due to the confusion between ndarray and Tensor. Sorry for the lack of testing. It should work in all combinations with and without cache, with and without disk cache, with and without flip.
When using --alpha_mask with images with the background removed using rembg. On commit 0d96e10b3e66d5c6c7096fbeb7626c5be2e98809, uncommented print line for context:
If I swap the following lines it works.
It does seem to work, to a degree, with the lines swapped.
I have seen others get it to work without having to modify this line, so maybe there is some interaction between the dataset and the alpha_mask. I would be happy to try to isolate this.