huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

v_prediction in text_to_image weird images #7012

Closed BurgerAndreas closed 7 months ago

BurgerAndreas commented 9 months ago

Describe the bug

Using --prediction_type="v_prediction" with the example train_text_to_image_lora.py script produces very weird images (two sample images attached).

With --prediction_type="epsilon" (the default) the images turn out great (sample image attached).

Reproduction

git clone https://github.com/huggingface/diffusers.git
cd diffusers/examples/text_to_image

Same command as in the provided text_to_image_lora example, but with --mixed_precision="fp16" removed and --prediction_type="v_prediction" added:

export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export DATASET_NAME="lambdalabs/pokemon-blip-captions"
accelerate launch train_text_to_image_lora.py \
    --pretrained_model_name_or_path=$MODEL_NAME \
    --dataset_name=$DATASET_NAME --caption_column="text" \
    --resolution=512 --random_flip \
    --train_batch_size=1 \
    --num_train_epochs=100 --checkpointing_steps=5000 \
    --learning_rate=1e-04 --lr_scheduler="constant" --lr_warmup_steps=0 \
    --seed=42 \
    --output_dir="sd-pokemon-model-lora" \
    --validation_prompt="dragon" --report_to="wandb" \
    --prediction_type="v_prediction" 

Other settings that do not work with --prediction_type="v_prediction"

Logs

No response

System Info

Who can help?

@sayakpaul @patrickvonplaten

sayakpaul commented 9 months ago

I don't think it's an issue candidate. It's better off in the "discussions".

You're trying to fine-tune a model that wasn't trained with v-prediction, so adapting it to that prediction objective would require a bit of experimentation, I imagine.

Ccing @patil-suraj for further advice.
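To make the mismatch concrete, here is a minimal sketch of the two training targets, assuming the standard variance-preserving forward process x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps and the usual definition v = sqrt(alpha_bar) * eps - sqrt(1 - alpha_bar) * x0. The helper names and toy values are hypothetical and written in plain Python for illustration; this is not the code from the training script. It shows that an epsilon-style output read as if it were v gives a wrong estimate of the clean sample, which is roughly the situation at the start of fine-tuning.

```python
import math

def add_noise(x0, eps, alpha_bar):
    """Forward process: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + s * e for x, e in zip(x0, eps)]

def v_target(x0, eps, alpha_bar):
    """v-prediction target: v = sqrt(alpha_bar) * eps - sqrt(1 - alpha_bar) * x0."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * e - s * x for x, e in zip(x0, eps)]

def x0_from_v(x_t, v, alpha_bar):
    """Recover x0 from a v prediction: x0 = sqrt(alpha_bar) * x_t - sqrt(1 - alpha_bar) * v."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * xt - s * vv for xt, vv in zip(x_t, v)]

# Toy values (assumptions for illustration only).
x0 = [0.5, -1.0, 0.25]
eps = [1.2, 0.3, -0.7]
alpha_bar = 0.6

x_t = add_noise(x0, eps, alpha_bar)
v = v_target(x0, eps, alpha_bar)

# A perfect v prediction recovers x0 exactly.
recovered = x0_from_v(x_t, v, alpha_bar)

# A perfect epsilon prediction, misread as v, does not.
mismatched = x0_from_v(x_t, eps, alpha_bar)

print("x0        :", x0)
print("recovered :", recovered)
print("mismatched:", mismatched)
```

A model pretrained on the epsilon target starts fine-tuning with outputs close to eps, so under a v-prediction loss it has to unlearn that mapping before it can produce sensible samples.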

BurgerAndreas commented 8 months ago

I did not know - is changing the prediction objective known to create these issues?

Would love to hear your thoughts @patil-suraj and @sayakpaul

sayakpaul commented 8 months ago

If your model was trained with, say, the Karras-style objective and you then try to fine-tune it with "epsilon prediction", it might have repercussions, because in the former the timesteps are continuous while in the latter they are usually discrete.
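As a rough illustration of the discrete versus continuous distinction, the sketch below contrasts the two sampling styles. It assumes 1000 discrete training timesteps, and for the Karras-style case a log-normal noise-level distribution with mean -1.2 and standard deviation 1.2; these constants and function names are assumptions chosen for illustration, not values taken from this thread or from the training script.

```python
import math
import random

NUM_TRAIN_TIMESTEPS = 1000   # assumed number of discrete timesteps
P_MEAN, P_STD = -1.2, 1.2    # assumed log-normal parameters for ln(sigma)

def sample_discrete_timestep(rng):
    """Discrete style: pick an integer timestep index uniformly at random."""
    return rng.randrange(NUM_TRAIN_TIMESTEPS)

def sample_continuous_sigma(rng):
    """Karras-style: draw a real-valued noise level sigma, with ln(sigma) normally distributed."""
    return math.exp(rng.gauss(P_MEAN, P_STD))

rng = random.Random(0)
timesteps = [sample_discrete_timestep(rng) for _ in range(5)]
sigmas = [sample_continuous_sigma(rng) for _ in range(5)]

print("discrete timesteps:", timesteps)
print("continuous sigmas :", sigmas)
```

A network conditioned on one of these inputs during pretraining sees a differently distributed conditioning signal when fine-tuned with the other, which is one way such a switch can have repercussions.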

github-actions[bot] commented 7 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

sayakpaul commented 7 months ago

Closing due to inactivity.