xubaixinxbx / 3dheads

[ICCV 2023] Code for "Deformable Model-Driven Neural Rendering for High-Fidelity 3D Reconstruction of Human Heads Under Low-View Settings"

Zero Weights During Phase 1 Training #8

Open HisStar opened 3 months ago

HisStar commented 3 months ago

Hello,

I've been experimenting with your code for this project, and I've encountered an issue I'd like to discuss.

During the first phase of training, I noticed that the network's weights remain at zero throughout the process. This behavior is puzzling, and I'm not certain why it's occurring. I've attached a screenshot that shows the weights remaining at zero.


Interestingly, when I train NeuFace on the same FaceScape dataset, the weights look normal, which suggests the data itself is not the problem.

Could you provide some insights into this? How could we modify the network or training process to address these zero weights? Is this an expected behavior, or have I perhaps misunderstood something?

Looking forward to your advice.

Thank you,

xubaixinxbx commented 3 months ago

Hi :) I just posted the training config for the FaceScape dataset. Please don't forget to change the path. Also, the image resolution in FaceScape is not constant across the dataset, so each step should apply a resolution update like this:

    def init_hw_for_facescape(self, idx):
        # FaceScape images vary in resolution, so refresh the per-image
        # height/width before rendering each step.
        h = self.train_dataset.h[idx]
        w = self.train_dataset.w[idx]
        self.img_res = [h, w]
        # print(f'render image res {self.img_res}')
        self.total_pixels = self.img_res[0] * self.img_res[1]
        self.train_dataset.img_res = self.img_res
        self.train_dataset.total_pixels = self.total_pixels
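
As a rough, self-contained sketch of how this could be called once per step (the trainer and dataset objects below are mock stand-ins, not the repository's actual classes):

from types import SimpleNamespace

class MockTrainer:
    def __init__(self):
        # Per-image heights/widths, mimicking FaceScape's varying resolutions.
        self.train_dataset = SimpleNamespace(h=[512, 480], w=[512, 640],
                                             img_res=None, total_pixels=None)

    def init_hw_for_facescape(self, idx):
        h = self.train_dataset.h[idx]
        w = self.train_dataset.w[idx]
        self.img_res = [h, w]
        self.total_pixels = h * w
        self.train_dataset.img_res = self.img_res
        self.train_dataset.total_pixels = self.total_pixels

trainer = MockTrainer()
for idx in range(2):                     # one training step per image index
    trainer.init_hw_for_facescape(idx)   # refresh resolution before building rays
    print(idx, trainer.img_res, trainer.total_pixels)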

If successful, you should obtain results like those shown in the attached image.

The weights staying at zero suggests that the ray_sampler is struggling to capture the object's geometry, so it is worth examining the distribution of the sampling points.

Since our method is built upon VolSDF, it is recommended to run VolSDF with multi-view images for a single person to verify that the scene bounding and ray sampler are functioning correctly. This can help ensure accurate capture of the object's geometry.
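
If it helps, here is a small, hedged sketch for inspecting that distribution; it assumes you can grab the sampled 3D points as a torch tensor during a forward pass, and the output path and bounding-sphere radius of 3.0 are illustrative, not values from this repo.

import numpy as np

def dump_sample_points(points, path="sample_points.ply", scene_radius=3.0):
    """Save sampled 3D points as ASCII PLY for inspection (e.g. in MeshLab)
    and report how many fall outside an assumed scene bounding sphere."""
    pts = points.reshape(-1, 3).detach().cpu().numpy()
    radii = np.linalg.norm(pts, axis=1)
    outside = (radii > scene_radius).mean()
    print(f"{pts.shape[0]} points, max |p| = {radii.max():.3f}, "
          f"{outside:.1%} outside radius {scene_radius}")
    with open(path, "w") as f:
        f.write("ply\nformat ascii 1.0\n"
                f"element vertex {pts.shape[0]}\n"
                "property float x\nproperty float y\nproperty float z\n"
                "end_header\n")
        np.savetxt(f, pts, fmt="%.6f")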

HisStar commented 3 months ago

@xubaixinxbx Thank you for your response and the valuable information provided. I've looked through the training configuration for the Facescape dataset and also implemented the method for updating the resolution for each step. This has certainly improved my understanding of the training process.

However, I have another question related to a specific file mentioned in the code (please see the attached screenshot). I wasn't able to locate this file in the repository. Could you clarify if this file is supposed to be provided, or if it is generated during the process?

Your guidance is much appreciated!


xubaixinxbx commented 3 months ago

Thank you for patiently pointing out this typo. The dataset name has now been corrected to facescape_dataset in the config.

HisStar commented 3 months ago

Thank you for your previous responses! I have a question regarding the unusually high values of 'points' in the code, which seem to prevent the normals from being generated. I've checked my FaceScape data and it runs fine with NeuFace, so I believe the data isn't the issue. I've also debugged the data-loading code, and both the mask and the sampling points appear correct. I've attached a screenshot for your reference. Could you please help me understand why this issue might be occurring?


xubaixinxbx commented 3 months ago

Hi,

Normal maps that contain no geometry values indicate that the reconstruction is probably not accurate. You can visualize the sampled points together with the face model reconstructed by NeuFace to verify whether these points lie near the surface. Additionally, running VolSDF on a single face can help determine whether the normal map is generated correctly, since the sampler in our method is based on VolSDF.
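
One hedged way to do that check, assuming you have exported the NeuFace reconstruction as a mesh and dumped the sampled points to a .npy file (both file names below are placeholders), is to measure point-to-surface distances with trimesh:

import numpy as np
import trimesh

# Mesh exported from the NeuFace reconstruction and sampled points dumped
# during a forward pass (placeholder paths).
mesh = trimesh.load("neuface_reconstruction.ply", process=False)
points = np.load("sampled_points.npy").reshape(-1, 3)

# Unsigned distance from each sampled point to the mesh surface; most points
# should lie close to it if the sampler is capturing the head geometry.
_, dists, _ = trimesh.proximity.closest_point(mesh, points)
print(f"median distance to surface: {np.median(dists):.4f}, "
      f"95th percentile: {np.percentile(dists, 95):.4f}")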

HisStar commented 2 months ago

Hello @xubaixinxbx ,

First of all, I would like to extend my sincerest apologies if this question has been addressed elsewhere; I have searched through the existing issues but couldn't find a solution to my specific problem.

Recently, I have been conducting multiple tests using NeuFace and VolSDF with their respective datasets, namely FaceScape for NeuFace and DTU for VolSDF. Both of these tests were successful. However, when attempting to apply the techniques to 3DHeads, I've encountered a persistent issue where I am unable to generate the normal files necessary for proceeding with the reconstruction.

I've tried numerous methods to troubleshoot this issue on my own, yet, unfortunately, I haven't been able to resolve it. Given this, I was hoping to kindly request your assistance. Would it be possible for you to share a snippet of training code and dataset format specifically tailored for an individual from the FaceScape dataset? This would greatly help me understand the correct setup and potentially resolve the issue I'm facing.

Your work on "Deformable Model-Driven Neural Rendering for High-Fidelity 3D Reconstruction of Human Heads Under Low-View Settings" has been incredibly inspiring, and I am eager to apply your methodologies correctly. I would greatly appreciate any guidance or resources you could share to help me overcome this hurdle.

Thank you very much for your time and consideration!

xubaixinxbx commented 2 months ago

Hi, sorry for the late response. For the FaceScape dataset, you can replace the camera-loading function in rend_util.py with the following.

import torch
import torch.nn.functional as F

def get_camera_params(uv, pose, intrinsics):
    # uv:         (batch_size, num_samples, 2) pixel coordinates
    # pose:       (batch_size, 4, 4) camera-to-world matrices
    # intrinsics: (batch_size, 3, 3) or larger intrinsic matrices
    cam_loc = pose[:, :3, 3]
    p = pose

    batch_size, num_samples, _ = uv.shape
    intrinsics = intrinsics.cuda()
    fx = intrinsics[:, 0, 0]
    fy = intrinsics[:, 1, 1]
    cx = intrinsics[:, 0, 2]
    cy = intrinsics[:, 1, 2]
    sk = intrinsics[:, 0, 1]  # skew, typically zero for FaceScape

    x_cam = uv[:, :, 0].view(batch_size, -1)
    y_cam = uv[:, :, 1].view(batch_size, -1)
    # Pixel coordinates -> camera-space ray directions (note fy for the y axis).
    directions = torch.stack(
        [(x_cam - cx.unsqueeze(-1)) / fx.unsqueeze(-1),
         -(y_cam - cy.unsqueeze(-1)) / fy.unsqueeze(-1),
         -torch.ones_like(x_cam)], -1)  # (batch_size, num_samples, 3)

    # Rotate directions into world space and normalize.
    rays_d = torch.bmm(directions, p[:, :3, :3].transpose(1, 2))  # (batch_size, num_samples, 3)
    ray_dirs = F.normalize(rays_d, dim=2)
    return ray_dirs, cam_loc
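
For reference, a quick shape check with dummy tensors (arbitrary sizes, and a CUDA device is assumed since the function moves the intrinsics to the GPU):

# One camera, four sampled pixels; identity pose and intrinsics.
uv = torch.zeros(1, 4, 2).cuda()
pose = torch.eye(4).unsqueeze(0).cuda()
intrinsics = torch.eye(3).unsqueeze(0)
ray_dirs, cam_loc = get_camera_params(uv, pose, intrinsics)
print(ray_dirs.shape, cam_loc.shape)  # torch.Size([1, 4, 3]) torch.Size([1, 3])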

Best,