V100 Tensor Cores don't support bfloat16. Try casting to torch.float16 and try again. (Note: I just made a PR to fix FP16 inference; you may need to install my diffusers fork if it isn't merged yet.)
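A minimal sketch of what the cast might look like, assuming the model path and offload setup from the original post:

```python
import torch
from diffusers import FluxPipeline

# Load in float16 instead of bfloat16, since V100 Tensor Cores
# only accelerate fp16 matmuls
pipe = FluxPipeline.from_pretrained(
    "checkpoints/FLUX.1-dev",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)
pipe.enable_model_cpu_offload()
```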
@latentCall145 Thanks for your PR; the inference time drops to roughly 90s and the image quality looks right. BTW, with the same prompt and seed, the output image differs from the bfloat16 result.
Precision would change the result, I would assume.
For some models it changes the result much more than for others. Sometimes for the better!
> Precision would change the result, I would assume.
It's a dynamic range issue, not a precision issue. It's discussed in my PR, but I'm also stating it here in case they haven't seen it.
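As a quick illustration of the difference (this snippet is mine, not from the PR): bfloat16 keeps float32's 8 exponent bits, while float16 has only 5, so any value above ~65504 overflows in fp16:

```python
import torch

x = torch.tensor(70000.0)    # fits comfortably in bf16's dynamic range
print(x.to(torch.bfloat16))  # tensor(70144., dtype=torch.bfloat16): coarse rounding, but finite
print(x.to(torch.float16))   # tensor(inf, dtype=torch.float16): exceeds fp16's max of 65504
```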
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
I believe this has been addressed, yes? If not, please feel free to re-open
I run FLUX.1-dev on a V100-32G GPU card; the inference code looks like:

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "checkpoints/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
)
pipe.enable_model_cpu_offload()

# prompt and generator defined earlier (omitted here)
image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=3.5,
    output_type="np",
    num_inference_steps=50,
    max_sequence_length=512,
    generator=generator,
).images
```
The inference takes nearly 7 minutes.
Is this normal? Has anyone else run into this?