songwoh opened 6 months ago
+1 I have the same question about how to use the encoder.
@songwoh did you make the full pipeline work?
I can't figure out why the transparency decoder is outputting nonsense.
Hi all, we will release a few more models and LayerDiffuse's img2img next week.
This also concerns issues https://github.com/layerdiffusion/sd-forge-layerdiffuse/issues/87 and https://github.com/layerdiffusion/sd-forge-layerdiffuse/issues/84.
Edit April 27:
Hi all, we had some other workloads last week but are still working on it, so the LayerDiffuse updates will be delayed to around next week.
@layerdiffusion Hi, the release of img2img and the encoder patcher has been postponed several times now. Given your busy schedule, could you give a specific release date that won't slip? I've run into issues like #96 in my own experimental replication of the encoder, and I'm eager for an official solution as soon as possible. Thanks!
Hello, I'm trying to reproduce the encoder but have yet to achieve the desired outcome. Here's my code. Could you please help identify any errors or missteps in my implementation?
```python
import numpy as np
import torch
from diffusers import AutoencoderKL
from PIL import Image, ImageFilter
from lib_layerdiffusion.models import TransparentVAEDecoder, TransparentVAEEncoder
from lib_layerdiffusion.utils import (
    get_torch_device,
    load_torch_file,
)

device = get_torch_device()

vae_transparent_decoder = TransparentVAEDecoder(
    load_torch_file("models/layer_sd15_vae_transparent_decoder.safetensors")
)
vae_transparent_encoder = TransparentVAEEncoder(
    load_torch_file("models/layer_sd15_vae_transparent_encoder.safetensors")
).model
sd_vae = AutoencoderKL.from_single_file(
    "./models/vae-ft-mse-840000-ema-pruned.safetensors"
).to(device)


def encode_img(input_img, input_mask):
    # Single image -> single latent in a batch (so size 1, 4, 64, 64)
    if len(input_img.shape) < 4:
        input_img = input_img.unsqueeze(0)  # (1, 3, H, W)
        input_mask = input_mask.unsqueeze(0)  # (1, H, W)
    with torch.no_grad():
        # SD VAE input is scaled from [0, 1] to [-1, 1], then multiplied by the mask
        latent = sd_vae.encode((input_img * 2 - 1) * input_mask)
        # Offset-encoder input: RGB in [-1, 1] followed by the mask -> (1, 4, H, W)
        transparent_image = torch.cat(
            [input_img * 2 - 1, input_mask.unsqueeze(0)], dim=1
        ).half()
        adjusted_latent = vae_transparent_encoder(transparent_image)
    return latent.latent_dist.sample(), adjusted_latent


print("Models loaded")

gaussian_filter = ImageFilter.GaussianBlur(13)

# Read PNG image with alpha channel
image = Image.open("test.png")
image = image.resize((512, 512))

# Split RGB channels, then pad invisible areas by repeatedly pasting the
# original (masked by its alpha) and blurring
image_rgb = image.convert("RGB")
blurred_image = image_rgb.filter(gaussian_filter)
for i in range(127):
    blurred_image.paste(image_rgb, (0, 0), image)
    blurred_image = blurred_image.filter(gaussian_filter)

image = np.asarray(image)
blurred_image = np.asarray(blurred_image)[..., :3]

# Normalize to [0, 1]
image = image / 255.0
blurred_image = blurred_image / 255.0

image_color = image[..., :3]
image_alpha = image[..., 3]
# Fill fully transparent pixels with the padded colors
image_color[image_alpha == 0] = blurred_image[image_alpha == 0]

image_color = (
    torch.as_tensor(image_color, dtype=torch.float32).permute(2, 0, 1).to(device)
)
image_alpha = torch.as_tensor(image_alpha, dtype=torch.float32).to(device)

latent, adjusted_latent = encode_img(image_color, image_alpha)

wrapper = vae_transparent_decoder.decode_wrapper()
vis_list, png_list = wrapper(sd_vae.decode, latent + adjusted_latent)
out = png_list[0]

# Save image
image_out = Image.fromarray(out)
image_out.save("out.png")
```
Here's the `decode_wrapper` I implemented in `TransparentVAEDecoder`:
```python
# Method of TransparentVAEDecoder; relies on that module's torch, numpy (np)
# and cv2 imports and on its checkerboard() helper.
def decode_wrapper(self):
    @torch.no_grad()
    def wrapper(func, latent):
        # func is sd_vae.decode, so the decoded pixels come from `.sample`
        pixel = (
            func(latent).sample.to(device=self.load_device, dtype=self.dtype).half()
        )
        latent = latent.to(device=self.load_device, dtype=self.dtype).half()
        self.model = self.model.to(self.load_device)

        vis_list = []
        png_list = []

        for i in range(int(latent.shape[0])):
            if self.mod_number != 1 and i % self.mod_number != 0:
                vis_list.append(pixel[i : i + 1].movedim(1, -1))
                continue

            y = self.estimate_augmented(pixel[i : i + 1], latent[i : i + 1])
            y = y.clip(0, 1).movedim(1, -1)
            alpha = y[..., :1]  # channel 0: alpha
            fg = y[..., 1:]  # channels 1-3: RGB

            # Low-contrast checkerboard (64-pixel cells) for the preview image
            B, H, W, C = fg.shape
            cb = checkerboard(shape=(H // 64, W // 64))
            cb = cv2.resize(cb, (W, H), interpolation=cv2.INTER_NEAREST)
            cb = (0.5 + (cb - 0.5) * 0.1)[None, ..., None]
            cb = torch.from_numpy(cb).to(fg)

            vis = fg * alpha + cb * (1 - alpha)
            vis_list.append(vis)

            # RGBA uint8 array for saving as PNG
            png = torch.cat([fg, alpha], dim=3)[0]
            png = (
                (png * 255.0)
                .detach()
                .cpu()
                .float()
                .numpy()
                .clip(0, 255)
                .astype(np.uint8)
            )
            # p.extra_result_images.append(png)
            png_list.append(png)

        vis_list = torch.cat(vis_list, dim=0)
        return vis_list, png_list

    return wrapper
```
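A minimal NumPy sketch of two steps in the wrapper above, under stated assumptions. `to_unit_range` reflects that `AutoencoderKL.decode(...).sample` in diffusers is roughly in [-1, 1]; if `estimate_augmented` expects pixels in [0, 1] (an assumption here, based on the WebUI-side VAE handing back [0, 1] images), the decoded pixels would need this rescale before being passed in. `composite_on_checkerboard` mirrors the `vis` preview: the RGB foreground alpha-blended over a checkerboard of 64-pixel cells squeezed into [0.45, 0.55]. The `checkerboard` below is a hypothetical re-implementation, not the repository's helper.

```python
import numpy as np


def to_unit_range(pixel_bchw: np.ndarray) -> np.ndarray:
    """Map a diffusers-style VAE decode output from [-1, 1] to [0, 1] (assumed UNet input range)."""
    return np.clip(pixel_bchw * 0.5 + 0.5, 0.0, 1.0)


def checkerboard(height: int, width: int, cell: int = 64) -> np.ndarray:
    """Hypothetical helper: (H, W) grid of alternating 0/1 squares, each `cell` pixels wide."""
    rows = (np.arange(height) // cell)[:, None]
    cols = (np.arange(width) // cell)[None, :]
    return ((rows + cols) % 2).astype(np.float32)


def composite_on_checkerboard(y_bhwc: np.ndarray) -> np.ndarray:
    """Blend decoder output (channel 0 = alpha, channels 1-3 = RGB) over a checkerboard."""
    alpha = y_bhwc[..., :1]  # (B, H, W, 1)
    fg = y_bhwc[..., 1:]  # (B, H, W, 3)
    _, h, w, _ = fg.shape
    cb = checkerboard(h, w)
    # Same contrast reduction as the wrapper: 0 -> 0.45, 1 -> 0.55
    cb = (0.5 + (cb - 0.5) * 0.1)[None, ..., None]  # (1, H, W, 1)
    return fg * alpha + cb * (1.0 - alpha)
```

Fully opaque pixels keep their RGB unchanged, while fully transparent ones show only the gray checkerboard, which is why a wrong alpha channel is easy to spot in the preview.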
Hello, have you figured it out yet?
Hey people,
The implementation of image encoding was planned here weeks ago, but it was delayed because it is difficult to add logic to the WebUI's img2img codebase (and we had some other workloads).
Given the demand and all the previous delays, we decided to move to a pure diffusers codebase and release the image encoding part there next week.
In the meantime, here is some quick info: the input of `LatentTransparencyOffsetEncoder` follows the same format as the output of the UNet in `TransparentVAEDecoder`. In other words, it has 4 channels: the first channel is alpha, in range [0, 1], and the second to fourth are R, G, B, all in range [0, 1]. The RGB needs to be "padded RGB", with all invisible pixels padded/filled with smooth, continuous colors. Here is an example:
[Example images: RGB | padded RGB]
The resulting offsets are added to the latents. Also, the latent offset is not very strong in the released version, so it should only influence the image when the denoising strength is relatively low (e.g., < 0.25).
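A minimal NumPy sketch of the input format described above, assuming an (H, W, 4) uint8 RGBA image and an already padded (H, W, 3) RGB array in [0, 1]; `offset_encoder_input` is a hypothetical name. Channel 0 is alpha and channels 1 to 3 are the padded R, G, B, all in [0, 1]. By contrast, the script earlier in this thread concatenates RGB scaled to [-1, 1] first and the mask last.

```python
import numpy as np


def offset_encoder_input(rgba_uint8: np.ndarray, padded_rgb: np.ndarray) -> np.ndarray:
    """Build a (1, 4, H, W) float32 array: [alpha, R, G, B], all in [0, 1].

    rgba_uint8: (H, W, 4) uint8 image with an alpha channel.
    padded_rgb: (H, W, 3) float array in [0, 1] whose invisible pixels were filled smoothly.
    """
    alpha = rgba_uint8[..., 3:4].astype(np.float32) / 255.0  # (H, W, 1)
    rgb = np.clip(padded_rgb, 0.0, 1.0).astype(np.float32)  # (H, W, 3)
    hwc = np.concatenate([alpha, rgb], axis=-1)  # alpha first, then RGB
    return np.transpose(hwc, (2, 0, 1))[None]  # HWC -> (1, C, H, W)
```

The array could then be converted with `torch.from_numpy(...)`, moved to the encoder's device and dtype, and the encoder's output offset added to the SD VAE latent (`latent + offset`).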
The padding/filtering method using `ImageFilter.GaussianBlur` written by OedoSoldier looks good and functional, but it differs somewhat from the padding used for the pretrained model (which is not in this repo yet). We recommend using the official padding method once we make it available in the new diffusers codebase.
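Since the official padding is not published in this thread, the following is only a hypothetical stand-in: a NumPy sketch of the same idea as OedoSoldier's GaussianBlur loop, with a box blur swapped in for the Gaussian. It repeatedly blurs the image and restores the visible pixels, so the invisible area fills with smooth colors diffused from the visible edges. `pad_rgb`, `box_blur`, `radius` and `iterations` are illustrative names and values, not from the repo, and the result will not match the pretrained padding exactly.

```python
import numpy as np


def box_blur(img: np.ndarray, radius: int) -> np.ndarray:
    """Mean filter over a (2r+1) x (2r+1) window with edge replication; img is (H, W, C)."""
    k = 2 * radius + 1
    h, w = img.shape[:2]
    padded = np.pad(img, ((radius, radius), (radius, radius), (0, 0)), mode="edge")
    # Summed-area table with a leading zero row/column, so each window sum is 4 lookups
    sat = np.pad(padded.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0), (0, 0)))
    window_sum = sat[k:k + h, k:k + w] - sat[:h, k:k + w] - sat[k:k + h, :w] + sat[:h, :w]
    return window_sum / (k * k)


def pad_rgb(rgb: np.ndarray, alpha: np.ndarray, radius: int = 6, iterations: int = 128) -> np.ndarray:
    """Fill pixels with alpha == 0 by repeated blurring while pinning visible pixels.

    rgb: (H, W, 3) float in [0, 1]; alpha: (H, W) float in [0, 1].
    """
    visible = alpha > 0
    out = rgb.astype(np.float64).copy()
    # Start invisible pixels at the mean visible color so they are not pulled toward black
    if visible.any():
        out[~visible] = out[visible].mean(axis=0)
    for _ in range(iterations):
        out = box_blur(out, radius)
        out[visible] = rgb[visible]  # keep visible colors exact
    return out
```

Because every blurred value is an average of its neighbors, the filled colors stay within the range of the visible colors and vary smoothly between them.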
Hey people, we are moving to https://github.com/lllyasviel/LayerDiffuse_DiffusersCLI (this repo has the I2I mentioned in this issue).
First of all, thanks for your great work. I was very impressed with your approach to handling the transparency channel.
While running some experiments with the released models, I ran into an issue with the latent offset VAE encoder. Specifically, I tested the SD15 latent offset VAE encoder, and adding the original VAE latents to the offset encoder output produces very blurry results.
I know you are working on this part, but could you explain what else needs to be done for the encoder to work properly?
Thank you.