hongsukchoi / Pose2Mesh_RELEASE

Official Pytorch implementation of "Pose2Mesh: Graph Convolutional Network for 3D Human Pose and Mesh Recovery from a 2D Human Pose", ECCV 2020

The problem of camera parameters #5

Open · moondabaojian opened this issue 3 years ago

moondabaojian commented 3 years ago

Hello, thank you for your excellent work! I am a beginner in 3D reconstruction, and I have some doubts after reading your code:

import torch
import torch.nn as nn

class OptimzeCamLayer(nn.Module):
    """Weak-perspective camera layer with a learnable scale and x/y translation."""

    def __init__(self):
        super(OptimzeCamLayer, self).__init__()

        self.img_res = 500 / 2  # half of the 500x500 input image resolution
        # cam_param = (scale, trans_x, trans_y), randomly initialized
        self.cam_param = nn.Parameter(torch.rand((1, 3)))

    def forward(self, pose3d):
        # translate the x, y coordinates, then scale and map to pixel space
        output = pose3d[:, :, :2] + self.cam_param[None, :, 1:]
        output = output * self.cam_param[None, :, :1] * self.img_res + self.img_res
        return output

def get_model():
    model = OptimzeCamLayer()
    return model

Why are the camera parameters initialized randomly, and how can they work correctly? I hope to get your answer, thank you!

hongsukchoi commented 3 years ago

Hi @moondabaojian,

First, Pose2Mesh does not use that module (OptimizeCamLayer). Pose2Mesh is fully supervised by 3D ground truth, and there is no 2D loss.

Second, I tried to use the module for visualization purposes, but I noticed that it tends to degrade the performance, so I removed it. Now the visualization (the mesh overlay on images) is done by iterative fitting, as you can see in the demo code.

Last, the module (OptimizeCamLayer) has learnable parameters, and the randomness is just the initialization; the parameters are updated during optimization.
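
For concreteness, here is a minimal sketch (my own assumptions, not code from this repository) of how the randomly initialized parameters become meaningful: the layer is optimized so that the projected 3D pose matches a given 2D pose. The tensors pose3d and pose2d below are hypothetical stand-ins for an estimated 3D pose and its 2D detection.

import torch
import torch.nn.functional as F

# Hypothetical inputs: an estimated 3D pose and its 2D detection (17 joints here).
pose3d = torch.randn(1, 17, 3)       # (batch, joints, xyz)
pose2d = torch.rand(1, 17, 2) * 500  # (batch, joints, xy) in a 500x500 image

cam_layer = get_model()  # the OptimzeCamLayer defined above
optimizer = torch.optim.Adam(cam_layer.parameters(), lr=1e-2)

# Fit the scale/translation by minimizing the 2D reprojection error;
# the random initialization only sets the starting point of this fit.
for step in range(500):
    optimizer.zero_grad()
    loss = F.mse_loss(cam_layer(pose3d), pose2d)
    loss.backward()
    optimizer.step()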

moondabaojian commented 3 years ago

Thank you for your quick reply! I've got it. In addition, I have a puzzle about camera parameters. When I try to reconstruct a 3D human body based on graph convolution (like GraphCMR), I found that some dataset networks can learn the correct camera parameters, but some can't, especially the scale in the camera parameters. It confuses me. Can you give me some advice? Thank you!

hongsukchoi commented 3 years ago

Hmm... Could you clarify what 'dataset networks' means?

Anyway, I think learning camera parameters by graph convolution is not a good idea. Graph convolution exploits the topology of a mesh or a skeleton, but the camera parameters have nothing to do with the topology. Simple fully connected layers will be enough.
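
To illustrate that suggestion, here is a minimal sketch of such a fully connected camera regressor; the joint count, layer sizes, and the (scale, trans_x, trans_y) output layout are assumptions, not Pose2Mesh code.

import torch
import torch.nn as nn

class CamRegressor(nn.Module):
    """Regress weak-perspective camera parameters (scale, trans_x, trans_y) with an MLP."""

    def __init__(self, num_joints=17, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_joints * 2, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, pose2d):
        # pose2d: (batch, joints, 2) -> flatten joints and coordinates
        return self.mlp(pose2d.flatten(1))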

moondabaojian commented 3 years ago

Thank you for your reply. There were some mistakes in my statement. I mean that, when learning camera parameters, my network works on some datasets but not on others. In addition, what kind of data do you think should be input into the fully connected layer to learn camera parameters?

hongsukchoi commented 3 years ago

Hi @moondabaojian

  1. If the scale or the location of the human in the cropped image is inconsistent, the network may fail. In general, most 3D pose estimation methods learn 'pseudo' camera parameters. Regardless of whether they are based on weak-perspective or pinhole projection, the target camera parameters just fit an estimated 3D pose to a 2D pose in the cropped image (e.g., 224x224); see the sketch after this list. In other words, I think the network is trying to memorize(?) or learn some kind of 3D pose and 2D pose pairs, so similar 3D poses should have similar 2D poses in the cropped image.

  2. The data can be either a 2D pose or an image feature. For the 2D pose case, you can check RepNet. For the image feature case, you can check HMR or SPIN.
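
For reference, here is a minimal sketch of the weak-perspective fitting described in point 1; the 224x224 crop size comes from the example above, while the function name and the (scale, trans_x, trans_y) layout are assumptions for illustration.

import torch

def weak_perspective_project(pose3d, cam_param, crop_res=224):
    """Project a 3D pose into a crop_res x crop_res crop.

    pose3d:    (batch, joints, 3)
    cam_param: (batch, 3) as (scale, trans_x, trans_y)
    """
    s = cam_param[:, None, :1]  # (batch, 1, 1)
    t = cam_param[:, None, 1:]  # (batch, 1, 2)
    half = crop_res / 2
    # drop depth, translate, scale, and map to pixel coordinates
    return (pose3d[:, :, :2] + t) * s * half + half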

moondabaojian commented 3 years ago

Thank you for your reply. Your suggestion is very helpful to me. The following is a screenshot of the training process: [screenshot of the training log]. Could this be caused by improper data preprocessing? (Although sc has been increasing, it is very slow.)

hongsukchoi commented 3 years ago

Hi @moondabaojian,

Happy New Year!

I am not sure, but that kind of result can occur even with proper data preprocessing.

I observed a similar phenomenon in which the 2D loss kept increasing while the 3D loss kept decreasing.

The average 3D test error (mm) was reasonable, but the scale of the projection tended to look wrong when the target pose was difficult, as in your example.

moondabaojian commented 3 years ago

Happy New Year! Thank you for your answer. I added your 'loss_edge_length' to my program, but training breaks in 'U, S, V = torch.svd(A[i])' with 'RuntimeError: Lapack Error gesdd : 2 superdiagonals failed to converge'. There is no error when computing 'loss_edge_length' by itself; the error is only reported when it is added to the total loss.

hongsukchoi commented 3 years ago

Hi @moondabaojian,

Using the edge loss at the beginning of training can lead to extreme local optima; in fact, I applied that loss only after several epochs for this reason.

I recommend that you also add the edge loss after sufficient training.
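
As a rough sketch of that schedule (my assumptions, not Pose2Mesh's exact implementation), the edge term can simply be gated by the epoch count:

import torch

def edge_length_loss(pred_verts, gt_verts, edges):
    """Penalize differences in edge lengths between the predicted and GT meshes.

    pred_verts, gt_verts: (batch, vertices, 3)
    edges: (num_edges, 2) long tensor of vertex index pairs
    """
    pred_len = (pred_verts[:, edges[:, 0]] - pred_verts[:, edges[:, 1]]).norm(dim=-1)
    gt_len = (gt_verts[:, edges[:, 0]] - gt_verts[:, edges[:, 1]]).norm(dim=-1)
    return (pred_len - gt_len).abs().mean()

WARMUP_EPOCHS = 7  # assumed threshold; tune for your own training

def total_loss(loss_3d, pred_verts, gt_verts, edges, epoch):
    # enable the edge term only after the network has trained for a while
    loss = loss_3d
    if epoch >= WARMUP_EPOCHS:
        loss = loss + edge_length_loss(pred_verts, gt_verts, edges)
    return loss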