spurra / vae-hands-3d

Code to evaluate model of paper "Cross-modal Deep Variational Hand Pose Estimation"
https://ait.ethz.ch/projects/2018/vae_hands/
GNU General Public License v3.0
121 stars, 23 forks

How to test on real images #11

Closed: momo1986 closed this issue 5 years ago

momo1986 commented 5 years ago

Hello, if I want to test on a real RGB image, is this the correct way to do it?

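    import cv2 as cv
    import numpy as np
    from torch.autograd import Variable

    # end_size, hand_side, hand_side_invariance and np2pyt are assumed to be defined as in the repo's evaluation script
    img_crop = cv.imread(filename)  # OpenCV loads images in BGR order
    img_crop = cv.cvtColor(img_crop, cv.COLOR_BGR2RGB)
    # Get VAE prediction
    img_res = cv.resize(img_crop, end_size)  # cv.resize takes dsize as (width, height)
    # HWC -> NCHW; adding the batch axis this way keeps H and W in the right order even for non-square sizes
    img_pyt = img_res.transpose(2, 0, 1)[np.newaxis]
    # volatile=True is the pre-0.4 PyTorch way to disable autograd at inference (newer versions use torch.no_grad())
    img_crop_var = Variable(np2pyt(img_pyt), volatile=True).cuda()
    if hand_side_invariance:
        hand_side_var = Variable(np2pyt(hand_side), volatile=True).cuda()
    else:
        hand_side_var = None
    # Then run the prediction.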
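For reference, here is a minimal, self-contained sketch of the layout conversion in the snippet above, using only NumPy and assuming the image has already been resized to the network input size. The helper name `to_nchw_rgb` is hypothetical; it does not reproduce the repo's `np2pyt` helper or any intensity normalization the model may expect. It shows the BGR to RGB channel swap and the HWC to NCHW transpose, with the shape taken from the array itself rather than from `end_size`, since `cv.resize` takes `(width, height)` but returns an array of shape `(height, width, 3)`.

```python
import numpy as np


def to_nchw_rgb(img_bgr: np.ndarray) -> np.ndarray:
    """Convert an OpenCV-style HxWx3 BGR uint8 image into a 1x3xHxW float32 RGB batch.

    Hypothetical helper: it only mirrors cv.cvtColor(..., COLOR_BGR2RGB) plus the
    transpose in the snippet above; it does not normalize pixel values.
    """
    if img_bgr.ndim != 3 or img_bgr.shape[2] != 3:
        raise ValueError("expected an HxWx3 image")
    img_rgb = img_bgr[:, :, ::-1]       # reverse channel order: BGR -> RGB
    chw = img_rgb.transpose(2, 0, 1)    # HWC -> CHW, keeps H and W in order
    # Add the batch axis and make the memory contiguous for torch.from_numpy
    return np.ascontiguousarray(chw[np.newaxis], dtype=np.float32)


if __name__ == "__main__":
    # Non-square dummy image: 2 rows (H) x 4 columns (W)
    img = np.zeros((2, 4, 3), dtype=np.uint8)
    img[..., 0] = 10   # blue channel in BGR order
    img[..., 2] = 200  # red channel in BGR order
    batch = to_nchw_rgb(img)
    print(batch.shape)        # (1, 3, 2, 4)
    print(batch[0, 0, 0, 0])  # 200.0 (red is now channel 0)
```

With a real image you would call `cv.resize` first and could then wrap the result with `torch.from_numpy`; whether the model expects raw `[0, 255]` values or normalized inputs should be checked against the repo's own evaluation code.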

Are there any other parameters I need to provide?

Thanks & Regards!

momo1986 commented 5 years ago

In Section 4.7, you write:

Our model is guided to learn a manifold of hand poses. In this section, we demonstrate the smoothness and consistency of it. To this end, we perform a walk on one dimension of the latent space by embedding two RGB images of separate hand poses into the latent space and obtain two corresponding samples z1 and z2. We then decode the latent space samples that reside on the interpolation line between them using our models for RGB and 3D joint decoding. Fig. 6 shows the resulting reconstructions, demonstrating consistency between both decoders. The fingers move in synchrony and the generated synthetic samples are both physically plausible and consistent across modalities. This demonstrates that the learned latent space is indeed smooth and represents a valid statistical model of hand poses.
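The interpolation described in this passage can be sketched as follows. This is an illustration, not the authors' code: it assumes plain linear interpolation between the two latent codes, z(t) = (1 - t) z1 + t z2 for t in [0, 1], and uses random vectors (with an assumed latent dimension of 64) in place of real encoder outputs. The function name `interpolate_latents` is hypothetical.

```python
import numpy as np


def interpolate_latents(z1: np.ndarray, z2: np.ndarray, num_steps: int) -> np.ndarray:
    """Return num_steps latent codes evenly spaced on the line from z1 to z2 (inclusive).

    Assumes z1 and z2 are 1-D vectors of the same length.
    """
    if z1.shape != z2.shape:
        raise ValueError("z1 and z2 must have the same shape")
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    t = np.linspace(0.0, 1.0, num_steps)[:, np.newaxis]  # shape (num_steps, 1)
    # Broadcast (num_steps, 1) against (1, dim) to get one row per step
    return (1.0 - t) * z1[np.newaxis, :] + t * z2[np.newaxis, :]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z1 = rng.standard_normal(64)  # stand-in for the code of image 1 (dimension assumed)
    z2 = rng.standard_normal(64)  # stand-in for the code of image 2
    path = interpolate_latents(z1, z2, num_steps=5)
    print(path.shape)  # (5, 64)
    # Each row of `path` would then be decoded by the RGB and 3D joint decoders.
```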

Does this mean I need to provide two RGB images, rather than one, for 3D estimation?

Thanks & Regards!

liuq99 commented 5 years ago

Hello, did you manage to test on real RGB images successfully?

spurra commented 5 years ago

Hi momo1986, please excuse my delayed response. Yes, this looks like the correct way to predict on real images. You do not need to supply two RGB images; one is enough. Let me know if anything else is unclear.