zjysteven / controlnet_tile

Workable training script for ControlNet tile
22 stars · 1 fork

can not replicate the result #3

Open Alan-Han opened 2 weeks ago

Alan-Han commented 2 weeks ago

Hello Steven, first of all, I would like to express my gratitude for your detailed tutorial. I used your training script but cannot replicate the result. Here are the differences in my training setup:

  1. I used LAION-Aesthetics-V2-6.5plus (474k training samples) instead of laion-400m
  2. I trained on 8x V100 GPUs for 4.5 hours

The other training parameters are exactly the same as those shown in the README. Would you care to share some tips or advice? Many thanks. My result is: 20241028-174235

zjysteven commented 2 weeks ago

What are `--condition_resolution` and `--resolution`?

Alan-Han commented 2 weeks ago

> What are `--condition_resolution` and `--resolution`?

`--condition_resolution=64 --resolution=256`. By the way, I use the LAION-Aesthetics-V2-6.5plus dataset's original captions as the input text.

zjysteven commented 2 weeks ago

From the log screenshot it seems that you are using Stable Diffusion 1.5, right? If that's the case, you would need to set `--resolution=512`, which is the output resolution of SD1.5. Letting it output 256x256 images won't work well.
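For context on how the two flags interact: ControlNet tile conditions the model on a blurry, low-resolution copy of the target image, so `--condition_resolution` controls how much detail is destroyed, while `--resolution` must match the base model's native output size (512 for SD1.5, 256 for miniSD). A minimal sketch of building such a conditioning pair with Pillow (illustrative only; the repo's actual dataloader may differ, and `make_tile_condition` is a name I made up):

```python
from PIL import Image

def make_tile_condition(image_path, condition_resolution=64, resolution=256):
    """Build a ControlNet-tile style conditioning image: downsample the
    target to condition_resolution, then upsample back to the training
    resolution, producing a blurry low-detail copy of the same scene."""
    img = Image.open(image_path).convert("RGB")
    # The target the model learns to reconstruct, at the base model's size.
    target = img.resize((resolution, resolution), Image.BICUBIC)
    # Destroy high-frequency detail by round-tripping through a tiny size.
    low = target.resize((condition_resolution, condition_resolution), Image.BICUBIC)
    condition = low.resize((resolution, resolution), Image.BICUBIC)
    return target, condition
```

If `resolution` is smaller than what the base UNet was trained on (e.g. 256 with vanilla SD1.5), the model is asked to generate at a size it never saw, which degrades results regardless of the ControlNet.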

Alan-Han commented 2 weeks ago

> From the log screenshot it seems that you are using Stable Diffusion 1.5, right? If that's the case, you would need to set `--resolution=512`, which is the output resolution of SD1.5. Letting it output 256x256 images won't work well.

Sorry, the confusion might have been caused by the naming of my experiment. I use miniSD, as you did. Here is the complete training script:

```shell
MAX_STEPS=10000
LR=1e-5
BS=32
PROMPT_DROPOUT=0.05
trail_name="sd15_control_tile_${BS}_${LR}_${MAX_STEPS}_dropout${PROMPT_DROPOUT}_v31"
OUTPUT_DIR="exp/controlnet_tile/${trail_name}"
MODEL_NAME="lambdalabs/miniSD-diffusers"

accelerate launch --main_process_port 12346 train_controlnet.py \
    --pretrained_model_name_or_path=$MODEL_NAME \
    --output_dir=$OUTPUT_DIR \
    --condition_resolution=64 \
    --resolution=256 \
    --learning_rate=${LR} \
    --max_train_steps=${MAX_STEPS} \
    --max_train_samples=85000 \
    --dataloader_num_workers=8 \
    --train_shards_path_or_url="/mnt/train_data/laion_6.5plus_tars/laion_{000000..000100..100}.tar" \
    --validation_image \
    "conditioning_image_1.png" \
    "conditioning_image_2.jpeg" \
    --validation_prompt \
    "a dog sitting on the grass" \
    "home office" \
    --validation_steps=500 \
    --checkpointing_steps=1000 --checkpoints_total_limit=10 \
    --train_batch_size=${BS} \
    --gradient_checkpointing --enable_xformers_memory_efficient_attention \
    --gradient_accumulation_steps=1 \
    --use_8bit_adam \
    --resume_from_checkpoint=latest \
    --mixed_precision="fp16" \
    --tracker_project_name="controlnet_sd15_tile_v3" \
    --tracker_project_trail_name ${trail_name} \
    --proportion_empty_prompts ${PROMPT_DROPOUT} \
    --report_to=wandb
```
zjysteven commented 2 weeks ago

I see. That's a bit weird then. Why did you set --max_train_samples=85000 though (meaning that you only trained on 85K samples)?

Alan-Han commented 2 weeks ago

> `max_train_samples`

Sorry, that was a typo. I revised this value in later experiments; it was originally set to 500,000. Did you also use LAION's captions during training? Have you tried leaving the text empty? It seems like ControlNet tile doesn't require text guidance.

zjysteven commented 2 weeks ago

> Did you also use LAION's captions during training? Have you tried leaving the text empty? It seems like ControlNet tile doesn't require text guidance.

No, I didn't try leaving the text empty; I was using LAION's captions. It is true, though, that after training, when I change the prompt for the same input image the output doesn't change much.
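Note that the training script above already blanks out captions occasionally via `--proportion_empty_prompts 0.05`, which is the standard classifier-free-guidance dropout. A minimal sketch of what that flag typically does per sample (`maybe_drop_prompt` is a hypothetical helper, not a function from this repo):

```python
import random

def maybe_drop_prompt(prompt, proportion_empty_prompts=0.05, rng=random):
    """With probability proportion_empty_prompts, train on an empty
    caption instead of the real one. This teaches the model an
    unconditional mode, used later for classifier-free guidance."""
    return "" if rng.random() < proportion_empty_prompts else prompt
```

Setting the proportion to 1.0 would train fully caption-free, which is one way to test the hypothesis that tile conditioning alone carries enough signal.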

Alan-Han commented 2 weeks ago

> Did you also use LAION's captions during training? Have you tried leaving the text empty? It seems like ControlNet tile doesn't require text guidance.
>
> No, I didn't try leaving the text empty; I was using LAION's captions. It is true, though, that after training, when I change the prompt for the same input image the output doesn't change much.

Thanks for your help. Please let me know if anyone else meets the same issue.

zjysteven commented 2 weeks ago

Will do. As of now I haven't heard of anyone else reporting the same issue.