[Closed] PromeAIpro closed this issue 3 weeks ago
Inference runs normally when using autocast_ctx, but it produces a black image.
```python
import torch
from diffusers.utils import load_image
from diffusers.pipelines.flux.pipeline_flux_controlnet import FluxControlNetPipeline
from diffusers.models.controlnet_flux import FluxControlNetModel

base_model = 'black-forest-labs/FLUX.1-dev'
controlnet_model = 'promeai/FLUX.1-controlnet-lineart-promeai'

controlnet = FluxControlNetModel.from_pretrained(controlnet_model, torch_dtype=torch.float32)
pipe = FluxControlNetPipeline.from_pretrained(base_model, controlnet=controlnet, torch_dtype=torch.bfloat16)
pipe.to("cuda")

autocast_ctx = torch.autocast('cuda')

control_image = load_image("https://huggingface.co/promeai/FLUX.1-controlnet-lineart-promeai/resolve/main/images/example-control.jpg")
prompt = "cute anime girl with massive fluffy fennec ears and a big fluffy tail blonde messy long hair blue eyes wearing a maid outfit with a long black gold leaf pattern dress and a white apron mouth open holding a fancy black forest cake with candles on top in the kitchen of an old dark Victorian mansion lit by candlelight with a bright window to the foggy forest and very expensive stuff everywhere"

with autocast_ctx:
    image = pipe(
        prompt,
        control_image=control_image,
        controlnet_conditioning_scale=0.6,
        num_inference_steps=28,
        guidance_scale=3.5,
    ).images[0]

image.save("./image.jpg")
```
This is because T5 does not support autocast and will output NaN.
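Not from the thread, but a minimal numpy illustration of the usual mechanism (the specific values are illustrative): fp16 can only represent magnitudes up to 65504, so an activation that overflows to inf can later become NaN, e.g. via inf - inf in a reduction:

```python
import numpy as np

# float16 tops out at 65504; T5's intermediate activations can exceed
# this, which is how fp16 autocast typically turns into NaN outputs.
fp16_max = np.finfo(np.float16).max          # 65504.0

x = np.float32(70000.0)                      # perfectly fine in float32
overflowed = x.astype(np.float16)            # overflows to +inf in float16
nan_result = overflowed - overflowed         # inf - inf -> NaN

print(fp16_max, overflowed, nan_result)
```

torch.autocast("cuda") defaults to fp16 for many ops, so T5 layers whose activations exceed this range produce inf/NaN, while bf16 shares fp32's exponent range and avoids the overflow.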
I think this only becomes relevant when training and running intermediate validation, right?
Yes, it happens when running intermediate validation. In other training scripts we use autocast_ctx, but I don't know why T5 outputs NaN; maybe we should fix this.
Cc: @ArthurZucker from the transformers team. Have you seen this issue, i.e., inference with T5 under autocast being unstable?
On the other hand, if it's a training-only thing (and only for intermediate validation at that), I think we should try to handle it from the training script instead. Could we try that?
Is there a way for diffusers to clone a ControlNet? We're considering cloning a copy and converting it to bf16 for validation, or loading the whole pipeline in fp32 (which costs a lot of memory). @sayakpaul
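There is no dedicated clone API in diffusers that I know of, but one way to sketch this is a deep copy plus cast: the fp32 training weights stay untouched and only the copy is converted to bf16 for validation (the helper name is hypothetical; imports are kept inside the function so the sketch stays self-contained):

```python
def clone_controlnet_for_validation(controlnet):
    """Return a frozen bf16 copy of a ControlNet for validation,
    leaving the original fp32 training weights untouched.
    (Hypothetical helper, not a diffusers API.)"""
    import copy

    import torch

    controlnet_val = copy.deepcopy(controlnet)   # independent copy of the weights
    controlnet_val.to(dtype=torch.bfloat16)      # cast only the copy
    controlnet_val.requires_grad_(False)         # no grads needed for validation
    controlnet_val.eval()
    return controlnet_val
```

The memory cost is one extra bf16 copy of the ControlNet, which is much cheaper than loading the whole pipeline in fp32.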
It can work if we pre-compute the text embeddings and then run under autocast.
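A sketch of that pre-computation approach, assuming `encode_prompt` returns `(prompt_embeds, pooled_prompt_embeds, text_ids)` as in recent diffusers; the helper name and wiring are illustrative. The point is that T5 runs outside autocast in the pipeline's native precision, and only the denoising loop runs under autocast:

```python
def run_with_precomputed_text_embeds(pipe, prompt, control_image):
    """Hypothetical helper: encode the prompt OUTSIDE autocast in the
    pipeline's native precision, then denoise under autocast."""
    import torch

    # 1) T5/CLIP text encoders run here, outside autocast, so they
    #    never see fp16 intermediate activations.
    prompt_embeds, pooled_prompt_embeds, _text_ids = pipe.encode_prompt(
        prompt=prompt, prompt_2=None
    )

    # 2) Only the denoising loop sees autocast, fed the precomputed embeds.
    with torch.autocast("cuda"):
        image = pipe(
            prompt_embeds=prompt_embeds,
            pooled_prompt_embeds=pooled_prompt_embeds,
            control_image=control_image,
            controlnet_conditioning_scale=0.6,
            num_inference_steps=28,
            guidance_scale=3.5,
        ).images[0]
    return image
```

Calling this requires a loaded FluxControlNetPipeline on a CUDA device, as in the reproduction above.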
Yep, training with T5 under autocast / mixed precision is hard; see https://github.com/huggingface/transformers/issues/20287 and the many linked issues about training and post-training being difficult with T5 in particular.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
I guess this issue is now resolved?
### Describe the bug
An error occurs when loading the ControlNet in fp32 and the main pipeline in bf16.
### Reproduction
### Logs
### System Info
### Who can help?
@sayakpaul @yiyixuxu