oneThousand1000 / EG3D-projector

An unofficial inversion implementation for EG3D.

poor quality of one random image #10

Open · maobenz opened this issue 2 years ago

maobenz commented 2 years ago

I just tried a random image and found that the quality of the inversion is very poor.

Do you have any ideas about it?

image
oneThousand1000 commented 2 years ago

Did you align the input image according to https://github.com/NVlabs/eg3d/blob/main/dataset_preprocessing/ffhq/preprocess_in_the_wild.py ?

luchaoqi commented 1 year ago

Did you align the input image according to https://github.com/NVlabs/eg3d/blob/main/dataset_preprocessing/ffhq/preprocess_in_the_wild.py ?

I am also getting poor results when following the steps in the original eg3d repo (an in-the-wild image, preprocessed as instructed, with the FFHQ pretrained model). The arguments are all defaults, the same as the ones in the README. Do you have any ideas for improving the results? Thanks!

test img:

img00000366

results:

https://user-images.githubusercontent.com/46330265/213952911-8ca9a2a8-b15a-4a5b-9d72-6b12861fdda2.mp4

oneThousand1000 commented 1 year ago

Did you align the input image according to https://github.com/NVlabs/eg3d/blob/main/dataset_preprocessing/ffhq/preprocess_in_the_wild.py ?

I am also getting poor results when following the steps in the original eg3d repo (an in-the-wild image, preprocessed as instructed, with the FFHQ pretrained model). The arguments are all defaults, the same as the ones in the README. Do you have any ideas for improving the results? Thanks!

test img:

img00000366

results:

img00000366_w_plus_pretrained.mp4

Hey, I think this is caused by both the EG3D model itself and my simple inversion project. For the EG3D model, performance on extreme poses is much worse than on frontal faces, due to the imbalanced pose distribution in FFHQ; it is still a challenging problem. For this simple inversion project, the input image only provides information from a single view, so it is hard for PTI to produce good full-view results. I therefore recommend using a better inversion method, e.g., https://github.com/jiaxinxie97/HFGI3D
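
(For context: the rough recipe this projector follows is single-view latent optimization followed by PTI-style generator tuning. The sketch below is illustrative only, not this repo's exact code; `G` is assumed to be a loaded EG3D generator, `target` a preprocessed target image tensor in [-1, 1], and `cam` its 25-dim camera label, with placeholder step counts, learning rates, and no device handling.)

```python
import torch
import torch.nn.functional as F
import lpips  # perceptual loss, commonly used by StyleGAN/EG3D projectors

# Illustrative sketch only. Assumed inputs:
#   G      - a loaded EG3D generator (torch.nn.Module)
#   target - target image tensor, shape [1, 3, H, W], range [-1, 1]
#   cam    - 25-dim camera label, shape [1, 25]
percep = lpips.LPIPS(net='vgg')

def project_w(G, target, cam, num_steps=500, lr=0.01):
    # Initialize from the mean latent of the pose-conditioned mapping network.
    z = torch.randn(1000, G.z_dim)
    w_avg = G.mapping(z, cam.repeat(1000, 1)).mean(dim=0, keepdim=True)
    w = w_avg.clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(num_steps):
        synth = G.synthesis(w, cam)['image']
        loss = F.mse_loss(synth, target) + percep(synth, target).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()

def pivotal_tuning(G, w_pivot, target, cam, num_steps=350, lr=3e-4):
    # PTI: freeze the pivot latent and fine-tune the generator weights so the
    # single observed view is reconstructed more faithfully. Only this one view
    # constrains the 3D volume, which is why occluded or unseen regions stay
    # poorly determined.
    opt = torch.optim.Adam(G.parameters(), lr=lr)
    for _ in range(num_steps):
        synth = G.synthesis(w_pivot, cam)['image']
        loss = F.mse_loss(synth, target) + percep(synth, target).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return G
```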

oneThousand1000 commented 1 year ago

Actually, this repo is just a simple implementation of the projector mentioned in EG3D, not the best choice for projecting an image into EG3D's latent space :)

luchaoqi commented 1 year ago

Thanks! But judging from the official website, it seems they are able to get pretty good results with a single image + PTI.

https://user-images.githubusercontent.com/46330265/213961159-71308bd9-0ffc-4abc-b45c-8925cc5cb0d5.mp4

oneThousand1000 commented 1 year ago

Thanks! But judging from the official website, it seems they are able to get pretty good results with a single image + PTI.

inversion_compressed.mp4

The input image you use has the ears occluded, while the input images in the video contain more complete information. This repo cannot generate regions that are occluded.

You can see the results I generated using my repo: https://github.com/NVlabs/eg3d/issues/28#issuecomment-1159512947, here are the re-aligned input image and the input camera parameters: 01457.zip 01457

luchaoqi commented 1 year ago

Weird that I am getting a slightly different camera matrix than yours after following pytorch_3d_recon:

mine:

            [ 0.9982488751411438,   0.01629943959414959, -0.056863944977521896,  0.14564249100599475,
              0.010219544172286987, -0.9943544864654541,  -0.1056165024638176,    0.2914214260210597,
             -0.05826440826058388,   0.10485044121742249, -0.9927797317504883,    2.6802727132270365,
              0.0,                   0.0,                  0.0,                   1.0,
              4.2647,                0.0,                  0.5,
              0.0,                   4.2647,               0.5,
              0.0,                   0.0,                  1.0 ]

yours:

array([ 0.99852723,  0.01640092, -0.05171374,  0.13343237,  0.01112113,
       -0.9948467 , -0.10077892,  0.27816952, -0.05310011,  0.10005538,
       -0.99356395,  2.6823157 ,  0.        ,  0.        ,  0.        ,
        1.        ,  4.2647    ,  0.        ,  0.5       ,  0.        ,
        4.2647    ,  0.5       ,  0.        ,  0.        ,  1.        ])
oneThousand1000 commented 1 year ago

Weird that I am getting a slightly different camera matrix than yours after following pytorch_3d_recon: [matrices quoted above]

Yes, the matrix I uploaded was obtained directly from the dataset.json (produced by https://github.com/NVlabs/eg3d/blob/main/dataset_preprocessing/ffhq/runme.py), and the image is from the FFHQ dataset. It is OK to use a slightly different matrix.
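
(Side note for anyone comparing these numbers: the 25 values are the flattened 4x4 cam2world extrinsic followed by the flattened 3x3 normalized intrinsic, which is the camera label format EG3D expects. A minimal sketch of how the label is assembled, using the values from the dump above:)

```python
import numpy as np

# EG3D camera label = flattened 4x4 cam2world extrinsic (16 values)
# + flattened 3x3 intrinsic normalized by image size (9 values) = 25 values.
cam2world = np.array([
    [ 0.99852723,  0.01640092, -0.05171374,  0.13343237],
    [ 0.01112113, -0.9948467 , -0.10077892,  0.27816952],
    [-0.05310011,  0.10005538, -0.99356395,  2.6823157 ],
    [ 0.        ,  0.        ,  0.        ,  1.        ],
])
intrinsics = np.array([
    [4.2647, 0.    , 0.5   ],
    [0.    , 4.2647, 0.5   ],
    [0.    , 0.    , 1.    ],
])

c = np.concatenate([cam2world.reshape(-1), intrinsics.reshape(-1)])
assert c.shape == (25,)
# torch.from_numpy(c).float().unsqueeze(0) would then give the [1, 25]
# conditioning label passed to the generator.
```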

luchaoqi commented 1 year ago

Hello, I have a follow-up question regarding your implementation here. You might have also noticed the issue that the optimized latent code is fed directly into the generator without going through the mapping network, i.e., no camera information is included.

I also tried directly using the optimized ws but found that it can produce some artifacts on shalini's example:

https://user-images.githubusercontent.com/46330265/225710735-7ca05898-b49b-41c3-bb67-c463c8eb5265.mp4

There are some artifacts visible here:

example

I went through the issue posts in the original eg3d repo but didn't find any useful information. From your experiments so far, do you have any takeaway about not including camera information in the ws?
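
(To make the two paths in this question concrete, here is a sketch; `G` is assumed to be a loaded EG3D generator, `w_opt` the optimized latent, `c_src` the source-view camera label, and `c_render` a novel rendering camera. Names are illustrative and device handling is omitted.)

```python
import torch

# Path A: the regular EG3D sampling path. The camera enters twice:
# once through the pose-conditioned mapping network, once at render time.
z = torch.randn(1, G.z_dim)
ws = G.mapping(z, c_src)                      # camera-conditioned mapping
img_a = G.synthesis(ws, c_render)['image']    # rendered from c_render

# Path B: what happens when an optimized w/w+ code is fed straight into
# synthesis. The only camera information used at render time is c_render;
# any pose bias baked into w_opt during optimization (it was fitted against
# the single source view c_src) remains, which is one plausible source of
# the view-dependent artifacts described above.
img_b = G.synthesis(w_opt, c_render)['image']
```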

pfeducode commented 1 year ago

Thanks! But judging from the official website, it seems they are able to get pretty good results with a single image + PTI. inversion_compressed.mp4

The input image you use has the ears occluded, while the input images in the video contain more complete information. This repo cannot generate regions that are occluded.

You can see the results I generated using my repo: NVlabs/eg3d#28 (comment), here are the re-aligned input image and the input camera parameters: 01457.zip 01457

Why does the regenerated image look different from the original image? Note that this image is not from the FFHQ dataset.

regenerated image

400

source image

00001

oneThousand1000 commented 1 year ago

Why does the regenerated image look different from the original image? Note that this image is not from the FFHQ dataset.

regenerated image

400

source image

00001

What do you mean by "the regenerated image looks different from the original image"? If you mean that the regenerated image cannot capture some fine-level details of the original image, that is limited by the expressive power of the generative adversarial network. If you want to preserve the details, you can try https://github.com/jiaxinxie97/HFGI3D, which achieves better performance than my simple projector implementation.