Is there a performance hit with the newest version?

Until just now I've been using an earlier version of the Lora-Trainer (approx. 232294d4598fe3325cd968ff3762c9e888607677 ). Using T4 in Colab the following would take me around 25-30 minutes:

[additional_network_arguments]
unet_lr = 0.5
text_encoder_lr = 0.5
network_dim = 32
network_alpha = 32
network_module = "networks.lora"

[optimizer_arguments]
learning_rate = 0.5
lr_scheduler = "constant_with_warmup"
lr_warmup_steps = 100
optimizer_type = "Prodigy"
optimizer_args = [ "decouple=True", "weight_decay=0.04", "betas=[0.9,0.999]", "d_coef=2", "use_bias_correction=True", "safeguard_warmup=True",]

[training_arguments]
max_train_steps = 2000
save_every_n_epochs = 1
save_last_n_epochs = 50
train_batch_size = 2
clip_skip = 1
min_snr_gamma = 5.0
weighted_captions = false
seed = 42
max_token_length = 225
xformers = true
lowram = true
max_data_loader_n_workers = 8
persistent_data_loader_workers = true
save_precision = "fp16"
mixed_precision = "fp16"
output_dir = "/content/drive/MyDrive/lora_training/output/test"
logging_dir = "/content/drive/MyDrive/lora_training/log"
output_name = "test"
log_prefix = "test"
save_state = false

[model_arguments]
v2 = false

[saving_arguments]
save_model_as = "safetensors"

[dreambooth_arguments]
prior_loss_weight = 1.0

[dataset_arguments]
cache_latents = true

I've now switched to the most recent version and executed pretty much the identical configuration with only the following changes (due to new defaults):

# ...
optimizer_args = [ "decouple=True", "weight_decay=0.01", "betas=[0.9,0.999]", "d_coef=2", "use_bias_correction=True", "safeguard_warmup=True",]
# ...

The same training process using T4 will now take approx. 50 minutes.

Is this to be expected?

hollowstrawberry / kohya-colab

Is there a performance hit with the newest version? #67