Open moondabaojian opened 3 years ago

Hello, thank you for your excellent work! I am a beginner in 3D reconstruction, and I have some doubts after reading your code.
Why are the camera parameters random, and how can they work correctly? I hope to get your answer, thank you!
Hi @moondabaojian,
First, Pose2Mesh does not use that module (OptimizeCamLayer). Pose2Mesh is fully supervised by 3D ground truth, and there is no 2D loss.
Second, I tried to use the module for visualization purposes, but I noticed that it tended to degrade the performance, so I removed it. Now, the visualization (mesh overlay on images) is done by iterative fitting, as you can see in the demo code.
Last, the module (OptimizeCamLayer) has learnable parameters, and the randomness is just the initialization.
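For reference, here is a minimal sketch of what such a layer could look like; the class name, shapes, and weak-perspective parameterization (a scale plus a 2D translation) are my assumptions, not the repository's exact code:

```python
import torch
import torch.nn as nn

class CamParamLayer(nn.Module):
    # Stand-in for a layer like OptimizeCamLayer: the weak-perspective
    # parameters are nn.Parameters, so torch.rand only sets the starting
    # point and backpropagation updates them like any other weight.
    def __init__(self):
        super().__init__()
        self.scale = nn.Parameter(torch.rand(1))  # random init, then learned
        self.trans = nn.Parameter(torch.rand(2))  # random init, then learned

    def forward(self, joints_3d):
        # joints_3d: (J, 3) -> projected (J, 2) in the image plane
        return self.scale * joints_3d[:, :2] + self.trans
```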
Thank you for your quick reply! I've got it. In addition, I have a question about camera parameters. When I try to reconstruct a 3D human body based on graph convolution (like GraphCMR), I find that the network can learn the correct camera parameters on some datasets but not on others, especially the scale parameter. It makes me confused. Can you give me some advice? Thank you!
Hmm... Could you clarify what 'dataset networks' means?
Anyway, I don't think learning camera parameters by graph convolution is a good idea. Graph convolution exploits the topology of a mesh or a skeleton, but the camera parameters have nothing to do with that topology. Simple fully connected layers will be enough.
Thank you for your reply. There were some mistakes in my statement. I mean that when learning camera parameters, my network works on some datasets but not on others. In addition, what kind of data do you think should be fed into the fully connected layers to learn camera parameters?
Hi @moondabaojian
If the scale or the location of the human in the cropped image is inconsistent, the network may fail. In general, most 3D pose estimation methods learn 'pseudo' camera parameters. Regardless of whether they are based on weak-perspective or pin-hole projection, the target camera parameters just fit an estimated 3D pose to a 2D pose in the cropped image (e.g. 224x224). In other words, I think the network is trying to memorize(?) or learn some kind of 3D pose and 2D pose pairing, so similar 3D poses should have similar 2D poses in the cropped image.
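To make the 'pseudo' part concrete, here is a hedged sketch of weak-perspective fitting in crop space; the function names and the MSE objective are illustrative assumptions:

```python
import torch

def weak_perspective_project(pose_3d, s, t):
    # pose_3d: (J, 3); s: scalar scale; t: (2,) translation in crop space.
    # Depth is dropped, so s and t jointly absorb the true focal length,
    # the bounding-box scale, and the subject's distance from the camera.
    return s * pose_3d[:, :2] + t

def reprojection_loss(pose_3d, pose_2d, s, t):
    # the only supervision: projected joints should match the 2D pose
    # detected in the cropped (e.g. 224x224) image
    return torch.mean((weak_perspective_project(pose_3d, s, t) - pose_2d) ** 2)
```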
The data can be either a 2D pose or an image feature. For the 2D pose case, you can check RepNet. For the image feature case, you can check HMR or SPIN.
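As an illustration of the fully-connected option, here is a rough RepNet/HMR-style regression head; the input size (a flattened 17-joint 2D pose), hidden width, and output parameterization are all assumptions:

```python
import torch
import torch.nn as nn

class CamRegressor(nn.Module):
    # Rough RepNet/HMR-style head: plain fully connected layers map a
    # flattened 2D pose (or an image feature vector) to the three
    # weak-perspective parameters (s, tx, ty).
    def __init__(self, in_dim=2 * 17, hidden=256):  # 17-joint 2D pose assumed
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, 3),  # -> (s, tx, ty)
        )

    def forward(self, x):  # x: (B, in_dim)
        cam = self.net(x)
        return cam[:, :1], cam[:, 1:]  # scale (B, 1), translation (B, 2)

# usage: s, t = CamRegressor()(pose_2d.reshape(batch_size, -1))
```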
Thank you for your reply. Your suggestion is very helpful to me. The following is a screenshot of the training process. Can this be caused by improper data preprocessing? (Although sc has been increasing, it is very slow.)
Hi @moondabaojian,
Happy New Year!
I am not sure, but that kind of result can occur even with the proper data preprocessing.
I observed a similar phenomenon, where the 2D loss kept increasing while the 3D loss kept decreasing.
The average 3D test error (mm) was reasonable, but the scale of the projection tended to look wrong when the target pose was difficult, as in your example.
Happy New Year! Thank you for your answer. I added your 'loss_edge_length' to my program, but training breaks at 'U, S, V = torch.svd(A[i])' with 'RuntimeError: Lapack Error gesdd : 2 superdiagonals failed to converge'. Computing 'loss_edge_length' by itself raises no error; the error is only reported when it is added to the total loss.
Hi @moondabaojian,
Using the edge loss from the beginning of training can lead to extreme local optima; that is actually why I only used the loss after several epochs.
I recommend that you also add the edge loss only after sufficient training.
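If it helps, here is a minimal sketch of an edge-length term gated by a warm-up epoch; the loss definition and the epoch threshold are illustrative assumptions, not Pose2Mesh's exact schedule:

```python
import torch

def edge_length_loss(pred_verts, gt_verts, edges):
    # edges: (E, 2) LongTensor of vertex-index pairs of the mesh
    pred_len = torch.norm(pred_verts[edges[:, 0]] - pred_verts[edges[:, 1]], dim=1)
    gt_len = torch.norm(gt_verts[edges[:, 0]] - gt_verts[edges[:, 1]], dim=1)
    return torch.abs(pred_len - gt_len).mean()

def total_loss(coord_loss, pred_verts, gt_verts, edges, epoch, edge_start_epoch=7):
    # warm up on the coordinate loss alone, then switch on the edge term;
    # edge_start_epoch=7 is an assumed threshold, not the paper's schedule
    loss = coord_loss
    if epoch >= edge_start_epoch:
        loss = loss + edge_length_loss(pred_verts, gt_verts, edges)
    return loss
```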