Closed · sinewy333 · closed 4 years ago
Hi, I am not sure what you changed in order to run the code in TensorFlow, since it is originally written for Theano. But as far as I can see from the information you mentioned, there are at least two problems:
Is the data used for training "depth_1" in the NYU dataset, or "synthdepth"? When I used "depth_1", I found that the code did not detect the position of the hand very well. When I used "synthdepth", the code detected the position of the hand well. Thank you very much.
Yes, that is correct. The data used for training is the "depth". The "synthdepth" is simply a rendering of a 3D hand model, thus it does not work well for real camera data.
The hand detector code is based on gtorig[13], which comes from the label data. But there is no such data in practice. How do we crop out images that contain only the hand? Thank you very much.
If I understand your question correctly, we use gtorig[13] for training the localizer. During testing, we first use the center of mass for hand detection and the trained localizer for refining this location.
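For reference, a minimal sketch of the center-of-mass step; the depth thresholds and names here are illustrative, not the repository's actual implementation:

```python
import numpy as np

def center_of_mass(dpt, min_depth=100., max_depth=1500.):
    """Initial hand localization: center of mass of the foreground
    depth pixels. dpt is a depth map in mm; min_depth/max_depth are
    hypothetical thresholds that mask out the background."""
    mask = (dpt > min_depth) & (dpt < max_depth)
    if not mask.any():
        return None                    # no foreground pixels found
    v, u = np.nonzero(mask)            # pixel coordinates of the hand
    z = dpt[mask]                      # their depth values
    return np.array([u.mean(), v.mean(), z.mean()])  # (u, v, d) center

# The crop around this center is then refined by the trained localizer
# (main_nyu_com_refine) before pose estimation.
```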
Is main_nyu_com_refine used for refining the location of the hand? Then is main_nyu_posereg_embedding used for training the locations of the joints, and finally ORRef for refining the joint locations?
Yes, main_nyu_com_refine is for refining the location, and main_nyu_posereg_embedding for pose prediction. ORRef is not published in this repository, but you can optionally add it yourself.
I understand. Thanks a lot!
The test results of the model I trained came out. The results on test2 were much worse than those on test1. Is it because the subjects in the two sets are different? Does the palm size of different people affect the accuracy of the test? Are the results in your report from test1 or test2? I look forward to your reply. Thank you!
| distance (mm) | train | test1 | test2 |
|---|---|---|---|
| 10 | 25.84% | 15.82% | 0.12% |
| 20 | 74.57% | 52.01% | 11.44% |
| 30 | 88.89% | 73.44% | 32.71% |
| 40 | 94.54% | 82.38% | 48.61% |
| 50 | 97.43% | 89.18% | 64.52% |
| 60 | 98.85% | 93.81% | 76.74% |
| 70 | 99.43% | 97.01% | 85.31% |
| 80 | 99.76% | 98.44% | 90.31% |
This is the result of my test (the fraction of frames whose maximum joint error is within the given distance).
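For anyone reproducing this table, the metric can be computed as below; a minimal sketch assuming `gt` and `pred` are arrays of shape (frames, joints, 3) in millimeters (the names are illustrative):

```python
import numpy as np

def frames_within_distance(gt, pred, thresholds=(10, 20, 30, 40, 50, 60, 70, 80)):
    """Fraction of frames whose *maximum* per-joint error is below
    each threshold. gt, pred: (num_frames, num_joints, 3) in mm."""
    per_joint_err = np.linalg.norm(gt - pred, axis=2)  # (frames, joints)
    max_err = per_joint_err.max(axis=1)                # worst joint per frame
    return {t: float((max_err <= t).mean()) for t in thresholds}
```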
You are correct that the results for test2 are worse. This is because test2 is a different user with a different hand size than the training user. Therefore, you are encouraged to adjust the crop size of the hand cube accordingly. The evaluation in the report is from the combined set of test1+test2.
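A sketch of what such an adjustment might look like; the 300 mm default edge length and the scale factor are illustrative values, not the repository's actual configuration:

```python
# Hypothetical per-user scaling of the hand crop cube (values illustrative).
DEFAULT_CUBE = (300., 300., 300.)  # mm, edge lengths used for the training user

def scaled_cube(hand_scale):
    """hand_scale ~ ratio of the test user's hand size to the training
    user's, e.g. 0.87 for a smaller hand; this keeps the hand at a similar
    relative size inside the normalized crop."""
    return tuple(hand_scale * c for c in DEFAULT_CUBE)

cube_test2 = scaled_cube(0.87)  # e.g. crop a smaller cube for a smaller hand
```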
I was using the main_nyu_posereg_embedding code in TensorFlow. The output is:

```
Training epoch 100, batch_num 1135, Minibatch Loss = 1.9201
Testing ...
Mean error: 483.2774520085452 mm, max error: 682.6990653506009 mm
Testing baseline
Mean error: 33.98014831542969 mm
Mean error: 615.936767578125 mm
```

I have tried changing all of the parameters, but I cannot reduce the error value. This problem has been bothering me for a long time, so I hope you can tell me: is this result correct? Thank you very much!
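One common cause of mean errors in the hundreds of millimeters is evaluating the raw network output, which in DeepPrior-style pipelines is typically normalized to the crop cube, without transforming it back to camera coordinates in mm. A minimal sketch of that denormalization, assuming predictions in [-1, 1] relative to the crop center; the cube size and all names here are illustrative:

```python
import numpy as np

def denormalize_joints(pred_norm, com, cube=(300., 300., 300.)):
    """Hypothetical denormalization: map network output in [-1, 1],
    relative to the crop cube centered at the CoM, back to mm.
    pred_norm: (joints, 3); com: (3,) in camera coordinates (mm)."""
    return pred_norm * (np.asarray(cube) / 2.) + com

# Errors of several hundred mm often mean this step (or the inverse of
# whatever normalization was used during training) was skipped.
```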