MonkeeMan1 opened 9 months ago
is this what you're looking for? https://github.com/huggingface/diffusers/discussions/6991#discussioncomment-8491149 for questions like this, can we use discussion in the future? https://github.com/huggingface/diffusers/discussions
Hey, thank you very much for the reply. I apologise that this is the wrong place to put this question; in the future I will definitely use the discussions section.
Unfortunately this isn't quite what I'm looking for. The image previews with this solution are still very noisy. A previous solution I had with sd 1.5 looked like the image attached below:
This would be ideal, as it is really amazing to see how the images improve with no noise in them.
Heya @MonkeeMan1, I think I have an example of what you're asking about here: https://gist.github.com/CoffeeVampir3/610e4627042ac8f36b45da6ec3af776f. This notebook is a bit old, so it may not run, but it should serve as an example of how to do the thing.
Basically there's one extra step where you decode the latents at each step. This can be kind of slow, so this example uses the TAESDXL VAE decoder: https://github.com/madebyollin/taesd
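A condensed sketch of that idea against the current callback_on_step_end API (the prompt, file paths, and the madebyollin/taesdxl checkpoint here are my assumptions, not the gist's exact code):

import torch
from diffusers import AutoencoderTiny, DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
# TAESDXL: a tiny VAE that decodes SDXL latents fast enough to run every step.
tiny_vae = AutoencoderTiny.from_pretrained(
    "madebyollin/taesdxl", torch_dtype=torch.float16
).to("cuda")

def save_preview(pipe, step, timestep, callback_kwargs):
    # Decode the current (still partially noisy) latents with the tiny VAE.
    img = tiny_vae.decode(callback_kwargs["latents"]).sample  # (1, 3, H, W) in [-1, 1]
    pipe.image_processor.postprocess(img, output_type="pil")[0].save(
        f"./imgs/step_{step}.png")
    return callback_kwargs

image = pipe(
    "a photo of an astronaut",
    num_inference_steps=20,
    callback_on_step_end=save_preview,
    callback_on_step_end_tensor_inputs=["latents"],
).images[0]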
Hey, thank you very much for the response. TAESDXL definitely looks like what I'm looking for. However, the implementation you sent doesn't quite seem to do it for me. I may just be making a mistake, so apologies if that's the case. I've taken out the relevant parts (I think), and this should work as far as I understand it.
The output from this code can be seen below:
import io

import numpy as np
import torch
from PIL import Image

from diffusers import AutoencoderTiny, DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16,
).to("cuda")

# NB: SDXL latents need the SDXL decoder "madebyollin/taesdxl";
# "madebyollin/taesd" is trained on SD 1.5 latents and will not decode these.
TINY_AUTOENCODER = AutoencoderTiny.from_pretrained(
    "madebyollin/taesdxl", torch_dtype=torch.float16)
TINY_AUTOENCODER.to("cuda")

prompt = "A capybara holding a sword whilst wearing a knight's costume,"


def to_png_image(img_np):
    """Convert a numpy array in [0, 1] to PNG-encoded bytes."""
    img = Image.fromarray((img_np * 255).astype(np.uint8))
    buf = io.BytesIO()
    img.save(buf, format='png', compress_level=0)
    return buf.getvalue()


def decode_tensors(pipe, step, timestep, callback_kwargs):
    # Decode the intermediate latents to pixel space with the tiny VAE.
    latents = callback_kwargs["latents"]
    img = TINY_AUTOENCODER.decode(latents)
    img_np = img[0].squeeze(0).permute(
        1, 2, 0).cpu().detach().numpy().astype('float32')
    # AutoencoderTiny outputs in [-1, 1]; map to [0, 1] before saving.
    img_np = np.clip((img_np + 1) / 2.0, 0, 1)
    buf = to_png_image(img_np)
    with open(f"./imgs/{step}.png", 'wb') as f:
        f.write(buf)
    return callback_kwargs


image = pipe(
    height=1024,
    width=1024,
    prompt=prompt,
    negative_prompt="",
    guidance_scale=7.5,
    num_inference_steps=20,
    callback_on_step_end=decode_tensors,
    callback_on_step_end_tensor_inputs=["latents"],
).images[0]
image.save("./imgs/final.png")
Output:
The desired output should resemble a blurry image, something like this:
Hi, I am still looking for a solution to this problem if anybody could help :)
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hello,
I'm currently trying to create image previews with SDXL. This works! However, the image outputs are very noisy. A very long time ago I found a solution to this for SD 1.5, but unfortunately it has been lost to time.
How would I go about denoising these images so they are a little more coherent to a human viewer? I know the first couple of iterations are always going to be very noisy, but eventually it should be possible to convert this noise into a blurry image that a human could understand.
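One way this is commonly done (a sketch of the general technique, not code from this thread): decode the scheduler's pred_original_sample, the model's running estimate of the fully denoised latents, instead of the still-noisy latents themselves. The wrapper below assumes the default SDXL Euler-style scheduler, whose step output exposes pred_original_sample, plus the madebyollin/taesdxl decoder; the prompt and file paths are placeholders.

import torch
from diffusers import AutoencoderTiny, DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
tiny_vae = AutoencoderTiny.from_pretrained(
    "madebyollin/taesdxl", torch_dtype=torch.float16
).to("cuda")

original_step = pipe.scheduler.step
preview_index = 0


def step_and_preview(*args, **kwargs):
    """Run the normal scheduler step, then decode its denoised (x0) prediction."""
    global preview_index
    # The pipeline calls step with return_dict=False, which drops
    # pred_original_sample; force the dict output and restore the tuple below.
    return_dict = kwargs.pop("return_dict", True)
    output = original_step(*args, return_dict=True, **kwargs)
    # Euler/DDIM-style scheduler outputs carry pred_original_sample,
    # the model's current guess at the fully denoised latents.
    x0 = getattr(output, "pred_original_sample", None)
    if x0 is not None:
        img = tiny_vae.decode(x0).sample  # (1, 3, H, W) in [-1, 1]
        pipe.image_processor.postprocess(img, output_type="pil")[0].save(
            f"./imgs/preview_{preview_index}.png")
    preview_index += 1
    return output if return_dict else (output.prev_sample,)


pipe.scheduler.step = step_and_preview
image = pipe("A capybara holding a sword", num_inference_steps=20).images[0]

Because the x0 prediction is already a denoised estimate, early previews come out blurry rather than noisy and sharpen as sampling progresses.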