mks0601 / V2V-PoseNet_RELEASE

Official Torch7 implementation of "V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map", CVPR 2018
https://arxiv.org/abs/1711.07399
MIT License

The Detail of mAP #27

Closed SLIiii closed 5 years ago

SLIiii commented 5 years ago

Hi. The network outputs 'xyzOutput', which represents the predicted coordinates of the key points, and I extracted the corresponding 'joint_real' from the ITOP dataset. I'm writing code to evaluate this data but have run into some questions. Section 8.2 of the paper says: "For 3D human pose estimation, we used mean average precision (mAP) that is defined as the detected ratio of all human body joints based on 10 cm rule following". Does that mean counting the ratio of frames where the errors between X_predict and X_real, Y_predict and Y_real, and Z_predict and Z_real are each less than 10 cm? Or the ratio of frames where the Euclidean distance between Predict_XYZ and Key_XYZ is less than 10 cm? Or is it calculated some other way? I tried both of these but cannot get close to the result in the paper.
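The two readings in the question are not equivalent, which a single joint can show. This is only a minimal sketch: the function names and the sample coordinates are hypothetical, and it assumes the coordinates are in meters, so that 10 cm is written as 0.1.

```python
import math

def within_per_axis(pred_xyz, gt_xyz, thr):
    # Reading 1: every axis error (|dx|, |dy|, |dz|) is below the threshold.
    return all(abs(p - g) < thr for p, g in zip(pred_xyz, gt_xyz))

def within_euclidean(pred_xyz, gt_xyz, thr):
    # Reading 2: the straight-line (L2) distance is below the threshold.
    return math.dist(pred_xyz, gt_xyz) < thr

# Hypothetical joint: 8 cm off on each axis (assumed units: meters).
gt_joint = (0.0, 0.0, 1.0)
pred_joint = (0.08, 0.08, 1.08)

per_axis_ok = within_per_axis(pred_joint, gt_joint, 0.1)
euclidean_ok = within_euclidean(pred_joint, gt_joint, 0.1)
```

Here the per-axis test passes while the Euclidean test fails, because the L2 distance is about 0.139. The per-axis reading can therefore only report an equal or higher detection ratio than the Euclidean one.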

mks0601 commented 5 years ago

Distance = L2Dist(pred, gt); % (# of frame, jointNum) shape
AP = Mean(Distance < 10 cm) along the (# of frame) axis; % (jointNum) shape
mAP = Mean(AP); % Scalar

Note that distThr has to be 0.1 because the coordinates are in meters, not millimeters (10 cm = 0.1 m).
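The pseudocode above could be sketched in plain Python roughly as follows. This is an illustration rather than the repository's evaluation code: the function name `mean_average_precision`, the nested-list layout (frames, joints, xyz) and the toy data are assumptions, and `dist_thr` must be expressed in the same units as the coordinates.

```python
import math

def mean_average_precision(pred, gt, dist_thr=0.1):
    """pred, gt: nested lists shaped (num_frames, num_joints, 3)."""
    num_frames = len(pred)
    num_joints = len(pred[0])
    ap = []
    for j in range(num_joints):
        # Count the frames in which joint j lies within dist_thr of the ground truth.
        hits = sum(
            1
            for f in range(num_frames)
            if math.dist(pred[f][j], gt[f][j]) < dist_thr
        )
        ap.append(hits / num_frames)  # per-joint AP: a ratio of frames
    return ap, sum(ap) / num_joints   # mAP: mean of the per-joint APs

# Toy data (hypothetical): 2 frames, 2 joints.
gt = [
    [[0.0, 0.0, 1.0], [0.5, 0.0, 1.0]],
    [[0.0, 0.0, 1.0], [0.5, 0.0, 1.0]],
]
pred = [
    [[0.05, 0.0, 1.0], [0.5, 0.2, 1.0]],   # joint 1 is 0.2 away: a miss
    [[0.0, 0.02, 1.0], [0.5, 0.0, 1.05]],  # both joints within 0.1
]
ap, m_ap = mean_average_precision(pred, gt)
```

With this toy data the first joint is detected in both frames and the second in one of two, so `ap` is `[1.0, 0.5]` and `m_ap` is `0.75`.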

SLIiii commented 5 years ago

Thanks, I will try it :)

SLIiii commented 5 years ago

Sorry, I'm still unsure about the AP. Can you give some more details? For example, I have processed '# frame' frames, and for each one I have "pred" with shape (15, 3) and "gt" with shape (15, 3). To calculate the mAP of the head, I extracted the 'Head' coordinates (the first row of pred and gt), calculated "L2Dist(pred[Head], gt[Head])", and built a new array "Heads" with shape (# frame, 1). Now I have a matrix that contains the 'Head' distance between pred and gt for every frame. What should I do after this step? How do I calculate the "Mean(Distance < 10 cm) along the (# of frame) axis"?

SLIiii commented 5 years ago

Hi, I reproduced your experiment but cannot reproduce the evaluation metric. In my evaluation I compute "number of frames with distance < 10 cm / # of frame", and the result is about 7%-15%, so something must be wrong. I still don't understand your description of AP: what does "Mean(Distance < 10 cm)" mean? Does "L2dist" mean exactly the Euclidean distance? Could you give some more details? Thank you sincerely for your answer. :)

mks0601 commented 5 years ago

Just take the mean of the array along the # of frame axis. L2dist is the Euclidean distance.

mks0601 commented 5 years ago

You have to visualize it first to check whether the estimated values are correct.

SLIiii commented 5 years ago

Hmm, does this "mean" stand for
1: "count the frames whose distance is < 10 cm along the # of frame axis / # of frame", or 2: "collect the distances that are < 10 cm and then average them", or 3: "express"

In my experiment, "pred" is xyzOutput in line 55 of test.lua and "gt" is joint_real in the ITOP data. After checking "function extract_coord_from_output(output,xyzOutput)" in line 73 of util.lua, I think "pred" and "gt" are in the same space, the world coordinate system. Is that right?

The Euclidean distance between pred and gt mostly ranges from 0 to 0.3, so I think I may have run the experiment correctly. However, the ratio of distance < 0.1 is just 7%-15%. My research is at a critical stage and I urgently need to verify these data. If I have omitted some details, I would be grateful for your correction.

mks0601 commented 5 years ago

The AP is not an averaged distance. It is a kind of ratio of successful estimations.

For each joint, count the # of frames whose distance is < 10 cm. Divide it by the total # of frames. Average it over all joints.
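Written as a formula, the recipe above might read as follows. The notation is an assumption introduced here, not taken from the paper: $N$ is the number of frames, $J$ the number of joints, $\hat{p}_{f,j}$ and $p_{f,j}$ the predicted and ground-truth 3D positions of joint $j$ in frame $f$, $\tau$ the 10 cm threshold expressed in the units of the coordinates, and $\mathbb{1}[\cdot]$ the indicator function (1 if the condition holds, 0 otherwise).

$$
\mathrm{AP}_j = \frac{1}{N} \sum_{f=1}^{N} \mathbb{1}\!\left[\, \lVert \hat{p}_{f,j} - p_{f,j} \rVert_2 < \tau \,\right],
\qquad
\mathrm{mAP} = \frac{1}{J} \sum_{j=1}^{J} \mathrm{AP}_j
$$

For example, if a joint is within the threshold in 900 of 1000 frames, its AP is 0.9, regardless of how large the individual distances are.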

SLIiii commented 5 years ago

Thank you for your patience. I have rebuilt it again.