kohya-ss / sd-scripts


CFG with full finetuning of Flux #1527

[Open] mamad-sd opened this issue 2 weeks ago

mamad-sd commented 2 weeks ago

Hello,

I've run a full finetune of Flux using guidance_scale=1.0.

Now when I run inference with the model, I need to set the CFG scale > 1; if I leave it at 1 to disable it, as I usually do with Flux, the results are washed out.

I'm used to training LoRAs with SimpleTuner, where using guidance_scale=1 during training doesn't require setting CFG > 1 during inference. Is this an issue inherent to full finetuning, or is it a bug?

Here is my full training command:

accelerate launch --mixed_precision bf16 --num_cpu_threads_per_process 1 flux_train.py ^
--pretrained_model_name_or_path "flux1-dev.safetensors" ^
--clip_l "clip_l.safetensors" --t5xxl "t5xxl_fp16.safetensors" --ae "ae.sft" ^
--save_model_as safetensors --sdpa --persistent_data_loader_workers --max_data_loader_n_workers 2 ^
--seed 42 --gradient_checkpointing --mixed_precision bf16 --save_precision bf16 ^
--dataset_config "dataset.toml" --output_dir "output" --output_name output_5e5 ^
--learning_rate 4e-5 --max_train_epochs 150 --sdpa --highvram --cache_text_encoder_outputs_to_disk --cache_latents_to_disk --save_every_n_epochs 10 ^
--optimizer_type adafactor --optimizer_args "relative_step=False" "scale_parameter=False" "warmup_init=False" ^
--timestep_sampling sigmoid --model_prediction_type raw --guidance_scale 1.0 ^
--enable_wildcard ^
--fused_backward_pass --double_blocks_to_swap 6 --cpu_offload_checkpointing --full_bf16
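For reference, --guidance_scale in flux_train.py is not a CFG setting: Flux dev consumes the guidance value as an extra conditioning input, and the flag just sets that number during training. A rough sketch of the idea (illustrative only, not the actual flux_train.py code; the model call and tensor names are made up):

```python
import torch

# Illustrative sketch: the distilled-guidance value is broadcast per sample and
# handed to the model as conditioning; it never mixes two noise predictions.
batch_size = 4
guidance_scale = 1.0  # the value passed via --guidance_scale
guidance_vec = torch.full((batch_size,), guidance_scale, dtype=torch.bfloat16)

# Hypothetical model call, for illustration only:
# model_pred = flux(img=noisy_latents, txt=t5_out, y=clip_pooled,
#                   timesteps=timesteps, guidance=guidance_vec)
```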

Here are two samples with Distilled CFG at 3.5 in Forge UI; the first uses CFG at 1 and the second uses CFG at 4.

Sample CFG 1: [image 00005-461397015]

Sample CFG 4: [image 00006-461397015]

Prompt: "A dressed table with a cup of coffee. On the cup of coffee is written "FLUX CFG". Very detailed, professional photography." [grid image xyz_grid-0005-1]

Any help would be appreciated :)

Thanks

kohya-ss commented 2 weeks ago

I don't know about "Distilled CFG at 3.5", but a model trained with guidance_scale 1.0 should require a guidance scale of around 3.5 for inference, just like the original model.
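For example, in a diffusers-style pipeline (assuming the finetuned checkpoint has been converted to a FluxPipeline folder; the path below is hypothetical), that BFL guidance scale is the guidance_scale argument:

```python
# Minimal inference sketch with diffusers (not Forge); "path/to/finetuned-flux"
# is a hypothetical local path to the converted finetuned checkpoint.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("path/to/finetuned-flux", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()

image = pipe(
    prompt="A dressed table with a cup of coffee.",
    guidance_scale=3.5,  # BFL's "guidance scale" (Forge's "Distilled CFG Scale")
    num_inference_steps=28,
    generator=torch.Generator("cpu").manual_seed(42),
).images[0]
image.save("sample.png")
```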

mamad-sd commented 2 weeks ago

Thanks for answering @kohya-ss !

I'm talking about this parameter in Forge: [screenshot: Capture d’écran (168)]

When using the base model or a lora trained using SimpleTuner, I set the "Distilled CFG Scale" to 3.5 and "CFG Scale" to 1.

When using the Kohya finetuned model, the results are not good if "CFG Scale" is kept at 1; I have to set it to 4 or 5 (as shown in my previous grid).

setothegreat commented 2 weeks ago

Can't confirm this with my own finetune (note: it wasn't trained on the latest update) using ComfyUI, as increasing the CFG just results in lower image quality.

That being said, I have noticed something interesting:

Trying to use LoRAs with my finetune has been pretty much impossible: the LoRA often introduces small, subtle changes to the composition during image generation, but not much else, and nothing the LoRA was actually trained on. This is regardless of whether the LoRA was trained on Dev or on the finetune. The base model, Flux Dev, runs the LoRAs just fine by comparison.

However, increasing the CFG to 4 while the LoRA is loaded with the finetune does seem to result in the elements the LoRA was trained on actually starting to appear in the image; nowhere near as prominently as they do with Dev, but they're at least visible now.

mamad-sd commented 2 weeks ago

@setothegreat can you share the training command or parameters you used, please?

setothegreat commented 2 weeks ago

accelerate launch --mixed_precision bf16 --num_cpu_threads_per_process 1 flux_train.py --pretrained_model_name_or_path D:\FluxKohya\sd-scripts\models\flux1-dev.sft --clip_l D:\FluxKohya\sd-scripts\models\clip_l.safetensors --t5xxl D:\FluxKohya\sd-scripts\models\t5xxl_fp16.safetensors --ae D:\FluxKohya\sd-scripts\models\ae.sft --save_model_as safetensors --sdpa --persistent_data_loader_workers --max_data_loader_n_workers 2 --seed 44 --gradient_checkpointing --mixed_precision bf16 --save_precision bf16 --dataset_config D:\FluxKohya\sd-scripts\fluxdataset.toml --output_dir F:\ComfyUI\models\unet\ --output_name Flux-Test3 --learning_rate 25e-6 --max_train_epochs 8 --sdpa --highvram --cache_text_encoder_outputs_to_disk --cache_latents_to_disk --save_every_n_epochs 1 --optimizer_type adafactor --optimizer_args "relative_step=False" "scale_parameter=False" "warmup_init=False" --lr_scheduler constant --timestep_sampling shift --model_prediction_type raw --discrete_flow_shift 3.1582 --guidance_scale 1.0 --loss_type l2 --fused_backward_pass --double_blocks_to_swap 6 --cpu_offload_checkpointing --full_bf16 --masked_loss --t5xxl_max_token_length 512

The dataset TOML was the same as the one provided in the README, only with a lower batch size, increased repeats for certain parts of the dataset, and the masked-training folder specified.

kohya-ss commented 2 weeks ago

> When using the Kohya finetuned model, the results are not good if "CFG Scale" is kept at 1; I have to set it to 4 or 5 (as shown in my previous grid).

Thank you for the clarification. "Distilled CFG Scale" seems to be the "guidance scale" in BFL's definition, so 3.5 should be fine. From my understanding, "CFG Scale" is used for classifier-free guidance, i.e. guidance against the negative prompt.
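In other words, the two knobs do different things. A minimal sketch of what "CFG Scale" does (standard classifier-free guidance, independent of Flux's embedded guidance value):

```python
import torch

def cfg_combine(pred_neg: torch.Tensor, pred_pos: torch.Tensor, cfg_scale: float) -> torch.Tensor:
    # Standard classifier-free guidance: move from the negative-prompt
    # prediction toward the positive-prompt prediction.
    return pred_neg + cfg_scale * (pred_pos - pred_neg)

pred_pos = torch.randn(1, 16, 64, 64)
pred_neg = torch.randn(1, 16, 64, 64)

# At CFG Scale 1 the negative prompt drops out and only the positive
# prediction remains, i.e. the usual single-pass Flux dev setup.
assert torch.allclose(cfg_combine(pred_neg, pred_pos, 1.0), pred_pos)
```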

I don't know why; perhaps the dev guidance distillation is starting to break down. Can you tell me your batch size and total number of steps?