Improving Fast And Accurate 3D Hand Pose Estimation
Pretrained ICVL and NYU accuracy #42

mahdinobar commented 4 years ago

When I evaluate the pre-trained models for the ICVL and NYU datasets, I get mean errors of 6.41 mm and 4.1 mm, whereas your paper reports mean errors of 8.1 mm and 12.3 mm, respectively. Has the estimator been improved since the paper, or is there another explanation for the difference?

PS: I use the following code:

rng = np.random.RandomState(23455)

print("create data")
aug_modes = ['com', 'rot', 'none']  # 'sc',

comref = None  # "./eval/ICVL_COM_AUGMENT/net_ICVL_COM_AUGMENT.pkl"
docom = False

di = ICVLImporter('../data/ICVL/', refineNet=comref, cacheDir='/home/mahdi/HVR/git_repos/deep-prior-pp/src/cache')

Seq2 = di.loadSequence('test_seq_2', docom=docom)
testSeqs = [Seq2]

testDataSet = ICVLDataset(testSeqs)
test_data, test_gt3D = testDataSet.imgStackDepthOnly('test_seq_2')

print("create network")
batchSize = 64

poseNetParams = ResNetParams(type=4, nChan=1, wIn=128, hIn=128, batchSize=64, numJoints=16, nDims=3)
poseNetParams.loadFile = "./eval/{}/{}_network_prior.pkl".format(eval_prefix, eval_prefix)
poseNet = ResNet(rng, cfgParams=poseNetParams)

#  test
print("Testing ...")
gt3D = [j.gt3Dorig for j in testSeqs[0].data]
jts_embed = poseNet.computeOutput(test_data)
jts = jts_embed
joints = []
for i in xrange(test_data.shape[0]):
    joints.append(jts[i].reshape((-1, 3)) * (testSeqs[0].config['cube'][2] / 2.) + testSeqs[0].data[i].com)

joints = np.array(joints)

hpe = ICVLHandposeEvaluation(gt3D, joints)
hpe.subfolder += '/' + eval_prefix + '/'
print("Mean error: {}mm, max error: {}mm".format(hpe.getMeanError(), hpe.getMaxError()))
print("{}".format([hpe.getJointMeanError(j) for j in range(joints[0].shape[0])]))
print("{}".format([hpe.getJointMaxError(j) for j in range(joints[0].shape[0])]))

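# save results (the enclosing cPickle.dump call is assumed; only its open(...) argument survived in the post)
cPickle.dump(joints,
             open("./eval/{}/result_{}_{}.pkl".format(eval_prefix, os.path.split(__file__)[1], eval_prefix), "wb"),
             protocol=cPickle.HIGHEST_PROTOCOL)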
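For readers following the evaluation, here is a minimal, dependency-free sketch of its two numeric steps: mapping the normalized network output back to millimetres (the `jts[i].reshape((-1, 3)) * (cube[2] / 2.) + com` expression in the loop) and averaging the per-joint Euclidean distance. The names `denormalize` and `mean_joint_error` are hypothetical, the toy values are invented for illustration, and treating the reported mean error as a plain average over all joints and frames is an assumption about what `hpe.getMeanError()` computes, not a copy of the repository's code.

```python
import math


def denormalize(pred, cube_depth_mm, com_mm):
    """Map normalized joint offsets back to millimetres.

    Same arithmetic as the loop in the post: pred * (cube_z / 2) + com.
    """
    half = cube_depth_mm / 2.0
    return [[p[k] * half + com_mm[k] for k in range(3)] for p in pred]


def mean_joint_error(gt_frames, pred_frames):
    """Average Euclidean distance (mm) over every joint of every frame.

    Assumption: this is how the mean error is aggregated.
    """
    dists = [
        math.sqrt(sum((g[k] - p[k]) ** 2 for k in range(3)))
        for gt, pred in zip(gt_frames, pred_frames)
        for g, p in zip(gt, pred)
    ]
    return sum(dists) / len(dists)


# Toy example: one frame, two joints, a 250 mm cube centred at `com`.
com = [10.0, 20.0, 300.0]
normalized = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0]]
pred_mm = denormalize(normalized, 250.0, com)

# Hypothetical ground truth: the second joint is 3 mm deeper than predicted.
gt_mm = [[10.0, 20.0, 300.0], [22.5, 20.0, 303.0]]
error = mean_joint_error([gt_mm], [pred_mm])
```

Because the center of mass is added back during denormalization, any error in `com` carries over directly into every joint position of that frame.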


moberweger commented 4 years ago

The pretrained model should give results similar to those reported in the paper. Your code uses docom=False, which skips the detection step and crops the hand using the ground-truth location. Please try docom=True with the comref network set, and check whether the results are closer to the reported ones.
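To illustrate why the source of the crop center matters, the sketch below decodes the same normalized prediction twice: once around a ground-truth center and once around a detected center that is a few millimetres off. It is a simplified, self-contained model with hypothetical names (`decode`, `mean_error`) and invented toy numbers. It only captures the offset added back when decoding; in the real pipeline a shifted crop also changes the image fed to the network, so the actual effect can differ.

```python
import math


def decode(pred, cube_depth_mm, com_mm):
    """Normalized joint offsets -> millimetres around the crop center."""
    half = cube_depth_mm / 2.0
    return [[p[k] * half + com_mm[k] for k in range(3)] for p in pred]


def mean_error(gt, est):
    """Mean Euclidean distance (mm) over the joints of one frame."""
    dists = [
        math.sqrt(sum((g[k] - e[k]) ** 2 for k in range(3)))
        for g, e in zip(gt, est)
    ]
    return sum(dists) / len(dists)


# Toy normalized prediction for two joints (illustrative values only).
pred = [[0.0, 0.0, 0.0], [0.2, -0.1, 0.4]]

com_gt = [0.0, 0.0, 400.0]    # ground-truth crop center
com_det = [3.0, 4.0, 400.0]   # hypothetical detected center, 5 mm away

gt_joints = decode(pred, 250.0, com_gt)

err_gt_crop = mean_error(gt_joints, decode(pred, 250.0, com_gt))
err_detected = mean_error(gt_joints, decode(pred, 250.0, com_det))
# With a perfect normalized prediction, the ground-truth center gives zero
# error, while the detected center shifts every joint by the 5 mm offset.
```

In this toy model the localization error adds to the pose error one-to-one, which is consistent with an evaluation on ground-truth crops reporting lower numbers than one that includes the detection step.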