huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

instruct_pix2pix problem #7665

Closed: mechigonft closed this issue 6 months ago

mechigonft commented 7 months ago

Describe the bug

I'm using the instruct_pix2pix training method to regenerate backgrounds for cut-out food images. However, I've noticed that the generated backgrounds often contain numerous fragmented and distorted cups, plates, and bowls. What could be the reason for this? I've examined my training data, and although it also includes cups, plates, and bowls, there is only one of each, and all are in their normal shape. Could you help me look into this issue?

Cut-out food image: [image attachment]
Result after regenerating the background: [image attachment: result_extend_background]
My training data example:
  input_image: [image attachment]
  edited_image: [image attachment]

Reproduction

training script:

export MODEL_NAME="/models/stable-diffusion-v1-5"
export DATASET_ID=""
export OUTPUT_DIR=""

accelerate launch --mixed_precision="fp16" /ossfs/workspace/diffusers/examples/instruct_pix2pix/train_instruct_pix2pix.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$DATASET_ID \
  --enable_xformers_memory_efficient_attention \
  --resolution=256 --random_flip \
  --train_batch_size=1 --gradient_accumulation_steps=1 --gradient_checkpointing \
  --max_train_steps=5000 \
  --checkpointing_steps=10000 --checkpoints_total_limit=1 \
  --learning_rate=5e-05 --max_grad_norm=1 --lr_warmup_steps=0 \
  --conditioning_dropout_prob=0.05 \
  --mixed_precision=fp16 \
  --seed=42 \
  --output_dir=$OUTPUT_DIR
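For context on the data the script consumes: train_instruct_pix2pix.py reads one original image, one edit prompt, and one edited image per example (the default column names follow the fusing/instructpix2pix-1000-samples dataset and, if I recall the script correctly, can be overridden with --original_image_column, --edit_prompt_column, and --edited_image_column). A minimal sketch of building such a dataset from local pairs; the file paths, prompt, and repository id below are hypothetical:

from datasets import Dataset, Image

# Hypothetical local file pairs; replace with your own cut-out / edited images.
data = {
    "input_image": ["pairs/0001_input.png"],
    "edited_image": ["pairs/0001_edited.png"],
    "edit_prompt": ["extend background"],
}

ds = Dataset.from_dict(data)
ds = ds.cast_column("input_image", Image())   # decode the path strings as images
ds = ds.cast_column("edited_image", Image())
ds.push_to_hub("your-username/food-background-pix2pix")  # then pass this id as --dataset_name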

inference script:

import PIL.Image
import PIL.ImageOps
import requests
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline

model_id = ''  # <- replace this
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
generator = torch.Generator("cuda").manual_seed(0)

image_path = '/ossfs/workspace/result.png'

def download_image(image_path):
    # loads a local image, fixes EXIF orientation, and converts it to RGB
    image = PIL.Image.open(image_path)
    image = PIL.ImageOps.exif_transpose(image)
    image = image.convert("RGB")
    return image

image = download_image(image_path)

# prompt = 'replace the background with a clean and concise background, simple and clean'
# prompt = 'replace the background picture to pure white background'
prompt = 'extend background'
num_inference_steps = 20
image_guidance_scale = 1.5
guidance_scale = 10

edited_image = pipe(prompt,
    # ng_prompt = 'other food and drinks, white empty cups, white empty bowls, white empty plates, cutlery, knives and forks, chopsticks, complex background',
    # ng_prompt = 'cups, bowls, plates',
    image=image,
    num_inference_steps=num_inference_steps,
    image_guidance_scale=image_guidance_scale,
    guidance_scale=guidance_scale,
    generator=generator,
).images[0]
edited_image.save("/ossfs/workspace/result_extend_background.png")
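As an aside, the pipeline call has no ng_prompt keyword; to steer generation away from extra cups, bowls, and plates, StableDiffusionInstructPix2PixPipeline accepts a negative_prompt argument instead. A minimal sketch, reusing the image, guidance values, and generator defined above:

edited_image = pipe(
    prompt,
    image=image,
    negative_prompt="cups, bowls, plates",  # unwanted concepts go here, not in ng_prompt
    num_inference_steps=num_inference_steps,
    image_guidance_scale=image_guidance_scale,
    guidance_scale=guidance_scale,
    generator=generator,
).images[0]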

Logs

No response

System Info

$ diffusers-cli env
Setting ds_accelerator to cuda (auto detect)

Copy-and-paste the text below in your GitHub issue and FILL OUT the two last points.

Who can help?

No response

mechigonft commented 7 months ago

The bold prompt and ng_prompt lines are actually code that should be commented out. A # was added in front of them, which GitHub rendered as bold formatting.

mechigonft commented 7 months ago

I hope to get your suggestions for scenarios like this: background replacement, background generation, and background expansion.

DN6 commented 7 months ago

@mechigonft It would be better to ask this question in the Discussions section.

mechigonft commented 7 months ago

@DN6 OK, thx

github-actions[bot] commented 6 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

yiyixuxu commented 6 months ago

closing this! feel free to open a question in discussion section