lllyasviel / sd-forge-layerdiffuse

[WIP] Layer Diffusion for WebUI (via Forge)

Latent Offset VAE Encoder #90

Open songwoh opened 6 months ago

songwoh commented 6 months ago

First of all, thanks for your great work. I was very impressed with your approach to handling the transparency channel.

As I was running some experiments with the released model, I ran into an issue with the latent offset VAE encoder. Specifically, in my tests with the SD15 latent offset VAE encoder, adding the offset encoder's output to the original VAE latents produces very blurry results.

I am aware that you are still working on this part, but I was wondering if you could explain what other steps are needed for the encoder to work properly.

Thank you.

OedoSoldier commented 6 months ago

+1 I have the same question about how to use the encoder.

danieltudosiu commented 6 months ago

@songwoh did you make the full pipeline work?

I can't figure out why the transparency decoder is outputting nonsense.

layerdiffusion commented 6 months ago

hi all, we will release a few more models and layerdiffuse’s img2img next week.

this will also address issues https://github.com/layerdiffusion/sd-forge-layerdiffuse/issues/87 and https://github.com/layerdiffusion/sd-forge-layerdiffuse/issues/84

Edit April 27:

hi all, we had some other workloads last week, but we are still working on this, so the updates to project layer diffuse will be delayed to about next week.

AzureRossi commented 5 months ago

@layerdiffusion Hi author, the release of img2img and the encoder patcher has been delayed numerous times over quite a while. Considering your busy schedule, could you provide a specific release date that won't be postponed? I've encountered issues like #96 in my own experimental replication of the encoder, and I'm eager to receive an official resolution as soon as possible. Thanks!

OedoSoldier commented 4 months ago

Hello, I'm trying to reproduce the encoder but have yet to achieve the desired outcome. Here's my code. Could you please help identify any errors or missteps in my implementation?

```python
import numpy as np
import torch
from diffusers import AutoencoderKL
from PIL import Image, ImageFilter
from lib_layerdiffusion.models import TransparentVAEDecoder, TransparentVAEEncoder
from lib_layerdiffusion.utils import (
    get_torch_device,
    load_torch_file,
)

device = get_torch_device()

vae_transparent_decoder = TransparentVAEDecoder(
    load_torch_file("models/layer_sd15_vae_transparent_decoder.safetensors")
)

vae_transparent_encoder = TransparentVAEEncoder(
    load_torch_file("models/layer_sd15_vae_transparent_encoder.safetensors")
).model.to(device)  # move the encoder to the same device as its inputs

sd_vae = AutoencoderKL.from_single_file(
    "./models/vae-ft-mse-840000-ema-pruned.safetensors"
).to(device)


def encode_img(input_img, input_mask):
    # Single image -> single latent in a batch (shape 1, 4, 64, 64)
    if len(input_img.shape) < 4:
        input_img = input_img.unsqueeze(0)
        input_mask = input_mask.unsqueeze(0)
    with torch.no_grad():
        latent = sd_vae.encode((input_img * 2 - 1) * input_mask)  # note the [-1, 1] scaling
        transparent_image = torch.cat(
            [input_img * 2 - 1, input_mask.unsqueeze(0)], dim=1
        ).half()
        adjusted_latent = vae_transparent_encoder(transparent_image)
    return latent.latent_dist.sample(), adjusted_latent


print("Models loaded")

gaussian_filter = ImageFilter.GaussianBlur(13)

# Read a PNG image with an alpha channel
image = Image.open("test.png")
image = image.resize((512, 512))

# Split off the RGB channels
image_rgb = image.convert("RGB")

# Repeatedly paste the foreground back and blur, so fully transparent
# regions get filled with smooth continuous colors
blurred_image = image_rgb.filter(gaussian_filter)
for i in range(127):
    blurred_image.paste(image_rgb, (0, 0), image)
    blurred_image = blurred_image.filter(gaussian_filter)

image = np.asarray(image)
blurred_image = np.asarray(blurred_image)[..., :3]

# Normalize to [0, 1]
image = image / 255.0
blurred_image = blurred_image / 255.0

image_color = image[..., :3]
image_alpha = image[..., 3]

# Replace fully transparent pixels with the blurred padding
image_color[image_alpha == 0] = blurred_image[image_alpha == 0]

image_color = (
    torch.as_tensor(image_color, dtype=torch.float32).permute(2, 0, 1).to(device)
)
image_alpha = torch.as_tensor(image_alpha, dtype=torch.float32).to(device)

latent, adjusted_latent = encode_img(image_color, image_alpha)

wrapper = vae_transparent_decoder.decode_wrapper()
vis_list, png_list = wrapper(sd_vae.decode, latent + adjusted_latent)
out = png_list[0]

# Save the result
image_out = Image.fromarray(out)
image_out.save("out.png")
```

Here's the decode_wrapper I implemented in TransparentVAEDecoder:


```python
def decode_wrapper(self):
    @torch.no_grad()
    def wrapper(func, latent):
        # Decode the latent to pixels with the ordinary SD VAE
        pixel = (
            func(latent).sample.to(device=self.load_device, dtype=self.dtype).half()
        )

        latent = latent.to(device=self.load_device, dtype=self.dtype).half()
        self.model = self.model.to(self.load_device)
        vis_list = []
        png_list = []

        for i in range(int(latent.shape[0])):
            if self.mod_number != 1 and i % self.mod_number != 0:
                vis_list.append(pixel[i : i + 1].movedim(1, -1))
                continue

            # Estimate alpha + foreground from the pixels and the latent
            y = self.estimate_augmented(pixel[i : i + 1], latent[i : i + 1])

            y = y.clip(0, 1).movedim(1, -1)
            alpha = y[..., :1]
            fg = y[..., 1:]

            # Composite the foreground over a faint checkerboard for preview
            B, H, W, C = fg.shape
            cb = checkerboard(shape=(H // 64, W // 64))
            cb = cv2.resize(cb, (W, H), interpolation=cv2.INTER_NEAREST)
            cb = (0.5 + (cb - 0.5) * 0.1)[None, ..., None]
            cb = torch.from_numpy(cb).to(fg)

            vis = fg * alpha + cb * (1 - alpha)
            vis_list.append(vis)

            # RGBA uint8 output
            png = torch.cat([fg, alpha], dim=3)[0]
            png = (
                (png * 255.0)
                .detach()
                .cpu()
                .float()
                .numpy()
                .clip(0, 255)
                .astype(np.uint8)
            )
            png_list.append(png)

        vis_list = torch.cat(vis_list, dim=0)
        return vis_list, png_list

    return wrapper
```
fkcptlst commented 4 months ago

Hello, have you figured out how to use vae_transparent_encoder yet?

layerdiffusion commented 4 months ago

Hey people,

The implementation of image encoding was planned here weeks ago, but it was delayed because webui’s img2img codebase is a bit difficult to add logic to (and we had some other workloads).

Given the demand and all the previous delays, we have decided to move to a pure diffusers codebase and release the image encoding part there in the next week.

Before that happens, some quick info here: the LatentTransparencyOffsetEncoder’s input follows the same format as the output of TransparentVAEDecoder’s UNet. In other words, the format is 4 channels: the first channel is alpha, in range [0, 1], and the second to fourth are R, G, B, all in range [0, 1]. The RGB needs to be “padded RGB”, with all invisible pixels padded/filled with smooth continuous colors. Here is an example:

[Example images: RGB vs. padded RGB]

The offsets will then be added to the latents. Also, the latent offset is not very strong in the released version, so it should only influence the image when the denoising strength is relatively low (e.g. < 0.25).

The padding/filtering method using ImageFilter.GaussianBlur written by OedoSoldier looks good and functional, but it has some differences from the pretrained padding (which is not in this repo yet). We recommend using the official padding method after we make it available in that new diffusers codebase.
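For concreteness, here is a minimal sketch of assembling the encoder input in the format described above. The names alpha, padded_rgb, latent, and offset_encoder are placeholder assumptions for illustration, not this repo's API, and the dummy encoder just has the right input/output shapes:

```python
import torch
import torch.nn.functional as F

H = W = 512

# Placeholder inputs (hypothetical; in practice these come from your
# image and the ordinary SD VAE):
alpha = torch.rand(1, 1, H, W)               # alpha channel in [0, 1]
padded_rgb = torch.rand(1, 3, H, W)          # padded RGB in [0, 1]
latent = torch.randn(1, 4, H // 8, W // 8)   # ordinary SD VAE latent

# Dummy stand-in for the pretrained LatentTransparencyOffsetEncoder,
# which maps (1, 4, H, W) to a latent-shaped offset (1, 4, H/8, W/8).
def offset_encoder(x):
    return F.avg_pool2d(x, kernel_size=8) * 0.0  # replace with the real model

# Channel order per the description above: alpha first, then R, G, B,
# all in [0, 1].
encoder_input = torch.cat([alpha, padded_rgb], dim=1)  # (1, 4, H, W)

with torch.no_grad():
    offset = offset_encoder(encoder_input)

# The offset is simply added to the latent before denoising; it is not
# very strong, so it mainly matters at low denoising strength (< 0.25).
adjusted_latent = latent + offset
```

Note that this channel layout differs from the code earlier in the thread, which concatenated RGB in [-1, 1] first and alpha last.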

lllyasviel commented 4 months ago

hey people, we are moving to

https://github.com/lllyasviel/LayerDiffuse_DiffusersCLI

(this repo has the I2I mentioned in this issue)