lmb-freiburg / hand3d

Network estimating 3D Handpose from single color images
GNU General Public License v2.0

A question about training on STB #35

Open wlcosta opened 4 years ago

wlcosta commented 4 years ago

Hello! I'm trying to reproduce the results you describe in your paper for the posenet stage when adding the STB dataset. However, my results are far from what you report, and I cannot find the reason why. I was hoping you could enlighten me on this step.

After training with the RHD dataset using the pipeline you've published in posenet_training.py, I load BinaryDbReaderSTB with the following parameters:

dataset = BinaryDbReaderSTB(mode='training', batch_size=train_para['BATCH_SIZE'], shuffle=True, coord_uv_noise=True, hand_crop=True, crop_center_noise=True, use_wrist_coord=True)
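(For anyone reading along: as I understand it, crop_center_noise=True jitters the hand-crop center during training as augmentation. A rough, illustrative NumPy sketch of that idea, not the repo's actual implementation; the function name and sigma are my own assumptions:)

```python
import numpy as np

def jitter_crop_center(center_uv, sigma=20.0, rng=None):
    # Hypothetical sketch: add Gaussian pixel noise to the hand-crop
    # center, as crop_center_noise=True presumably does for augmentation.
    rng = np.random.default_rng(0) if rng is None else rng
    return np.asarray(center_uv, dtype=float) + rng.normal(0.0, sigma, size=2)
```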

And proceed to run the session passing the tensors:

_, loss_v = sess.run([train_op, loss])

The BinaryDbReaderSTB class was not modified and I've processed the data using the scripts you provided.

I then proceed to evaluate the training, using:

dataset = BinaryDbReaderSTB(mode='evaluation', shuffle=False, use_wrist_coord=True)

When executing with USE_RETRAINED=False, the metrics are as expected (Average mean EPE: 18.581 pixels). However, when using my model trained on RHD+STB, the lowest mean EPE I got was ~40 pixels. Could you please point me to what I am missing?
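(To be explicit about the metric: mean EPE here is just the average Euclidean distance between predicted and ground-truth 2D keypoints, in pixels. A minimal NumPy sketch of that computation, not the repo's evaluation code:)

```python
import numpy as np

def mean_epe(pred_uv, gt_uv):
    # Mean end-point error in pixels: average Euclidean distance
    # between predicted and ground-truth keypoint coordinates.
    pred_uv = np.asarray(pred_uv, dtype=float)
    gt_uv = np.asarray(gt_uv, dtype=float)
    return float(np.mean(np.linalg.norm(pred_uv - gt_uv, axis=-1)))
```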

I tried several ideas, such as different epoch combinations, tweaking the learning-rate decay, and different configurations of the data loader, but none had an effect.
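(By "tweaking the learning-rate decay" I mean varying a piecewise-constant / staircase schedule like the one in the training scripts; a toy sketch with made-up boundary and value numbers, just to show what I was adjusting:)

```python
def piecewise_lr(step, boundaries, lrs):
    # Piecewise-constant schedule: lrs has one more entry than
    # boundaries, and the rate drops at each boundary step.
    i = sum(step >= b for b in boundaries)
    return lrs[i]
```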

Thank you for your attention.