mit-han-lab / anycost-gan

[CVPR 2021] Anycost GANs for Interactive Image Synthesis and Editing
https://hanlab.mit.edu/projects/anycost-gan/
MIT License

I want to embed a 256x256 image and generate a 256x256 image as a test. #8

Closed: youngjae-git closed this issue 3 years ago

youngjae-git commented 3 years ago

Hi @tonylins, thank you for your great paper.

In this repo, the encoder only accepts 256x256 images. However, it seems that decoder (generator) weights are only provided for the 1024x1024 and 512x512 resolutions.

What I want to test is encoding and then decoding a 256x256 image to check whether the reconstruction matches the original image.

Could you send me the 256x256 anycost-ffhq decoder weights?

tonylins commented 3 years ago

Hi Youngjae, I'm not fully clear on your question. The anycost generator can generate images at resolutions 128/256/512/1024, so you can use the intermediate output for resolution 256.
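In code, that looks roughly like this (a sketch based on the README's usage example; the latent shape and the forward call's return convention are my assumptions here, so please check `models/anycost_gan.py` and the demo code):

```python
import torch
import models  # repo-local package providing get_pretrained()
from models.dynamic_channel import reset_generator

g = models.get_pretrained('generator', 'anycost-ffhq-config-f')
g.eval()

# The anycost generator exposes intermediate outputs; setting target_res
# selects which resolution the forward pass returns (128/256/512/1024).
g.target_res = 256

with torch.no_grad():
    z = torch.randn(1, 1, 512)  # latent shape assumed from the demo code
    img_256, _ = g(z)           # 256x256 output instead of the full 1024x1024

reset_generator(g)  # restore the full-resolution configuration
```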

To project an image into the latent space, we downsample the target image to resolution 256 when computing LPIPS, following common practice. This means you can also project a higher-resolution image (e.g., 1024x1024).
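Concretely, the projection objective compares images at 256x256; a minimal sketch using the pip-installable lpips package (the repo's own projection script may differ in details):

```python
import torch.nn.functional as F
import lpips  # pip install lpips

lpips_fn = lpips.LPIPS(net='vgg')

def projection_loss(generated, target, res=256):
    """LPIPS between images downsampled to res x res; inputs in [-1, 1]."""
    generated = F.interpolate(generated, size=(res, res),
                              mode='bilinear', align_corners=False)
    target = F.interpolate(target, size=(res, res),
                           mode='bilinear', align_corners=False)
    return lpips_fn(generated, target)
```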

youngjae-git commented 3 years ago

Hi @tonylins, thank you for your fast reply.

[Image: the left is the input FFHQ image; the right is the image generated from the encoder's latent.]

What I'm trying to do is check the projection. However, after resizing a 1024x1024 FFHQ image to 256x256 and projecting it, a completely different person comes out. Please let me know if there is anything I missed. The code I used is below.

```python
import torch
import matplotlib.pyplot as plt
from PIL import Image
from torchvision import transforms

from models.anycost_gan import Generator
from models.encoder import ResNet50Encoder

URL_TEMPLATE = 'https://hanlab.mit.edu/projects/anycost-gan/files/{}_{}.pt'
config = 'anycost-ffhq-config-f'

# Decoder
g_url = URL_TEMPLATE.format('generator', config)

resolution = 1024
channel_multiplier = 2

g_model = Generator(resolution, channel_multiplier=channel_multiplier)
sd = torch.hub.load_state_dict_from_url(g_url)
g_model.load_state_dict(sd['g_ema'])
g_model.eval()

# Encoder
e_url = URL_TEMPLATE.format('encoder', config)

n_style = 18
style_dim = 512

e_model = ResNet50Encoder(n_style=n_style, style_dim=style_dim)
sd = torch.hub.load_state_dict_from_url(e_url)
e_model.load_state_dict(sd['state_dict'])
e_model.eval()

# Convert input image to latent
img_1024 = Image.open('/nas/data/ffhq/images1024x1024/00000.png')
img_256 = img_1024.resize((256, 256))
trans_img = transforms.ToTensor()(img_256)  # range [0, 1]
trans_img = trans_img.view(1, 3, 256, 256)
latent = e_model(trans_img)
latent.shape  # (1, 18, 512)

# Generator
img_out_np = get_4x4_grid(g_model, latent)  # my helper: tiles generator outputs into a grid
plt.figure(figsize=(8, 8))
plt.imshow(img_out_np)
plt.axis('off')
```

tonylins commented 3 years ago

Hi Youngjae, to use the encoder, you need to normalize the image into the range [-1, 1] using transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]). In your current code, the image range is [0, 1].
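For example, a minimal sketch of the fixed preprocessing (the file path and the e_model variable follow the code above):

```python
from PIL import Image
from torchvision import transforms

# ToTensor() maps pixels to [0, 1]; Normalize with mean = std = 0.5
# then maps them to [-1, 1], which is the range the encoder expects.
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),
])

img_256 = Image.open('/nas/data/ffhq/images1024x1024/00000.png').resize((256, 256))
trans_img = preprocess(img_256).unsqueeze(0)  # (1, 3, 256, 256)
latent = e_model(trans_img)                   # (1, 18, 512)
```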

tonylins commented 3 years ago

Hi Youngjae, I will close the issue due to inactivity. Feel free to reopen if the problem is not solved.