karfly / learnable-triangulation-pytorch

This repository is the official PyTorch implementation of the paper "Learnable Triangulation of Human Pose" (ICCV 2019, oral). The proposed method achieves state-of-the-art results in multi-view 3D human pose estimation!

Predict pose using my images #120

Open · agenthong opened this issue 3 years ago

agenthong commented 3 years ago

Hi @karfly, thanks for sharing this great repo. I've trained the model on the Human3.6M dataset. After that, I took 2D heatmaps of other images, unprojected them with my own calibration, and fed them to the trained model. But I get a result like this: [image] Is this the 3D pose? I also think this result may be in a different coordinate system. If so, how can I get the corresponding poses in my images' coordinate system?

karfly commented 3 years ago

Hey, @agenthong! Yes, this tensor consists of the 3D locations of the joints. You can project it back to the image using your camera's calibration matrix.
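
For reference, a minimal sketch of that back-projection (illustrative only, not code from this repo; keypoints_3d and proj_matrix are hypothetical names, with proj_matrix being the full 3x4 matrix K @ [R | t] of your camera):

import numpy as np

def project_to_image(keypoints_3d, proj_matrix):
    # keypoints_3d: (n_joints, 3) array of 3D joint positions
    # proj_matrix: (3, 4) camera projection matrix K @ [R | t]
    n_joints = keypoints_3d.shape[0]
    points_hom = np.hstack([keypoints_3d, np.ones((n_joints, 1))])  # homogeneous coords, (n_joints, 4)
    projected = points_hom @ proj_matrix.T                          # (n_joints, 3)
    return projected[:, :2] / projected[:, 2:3]                     # divide by depth -> pixel coords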

agenthong commented 3 years ago

> Hey, @agenthong! Yes, this tensor consists of the 3D locations of the joints. You can project it back to the image using your camera's calibration matrix.

Thanks for replying. Is this step somewhere in the code? Could you point out where it is?

karfly commented 3 years ago

You can find an example here.

chaisheng-dawnlight commented 3 years ago

Hi @agenthong @karfly, I also used my own data to predict the 3D pose, and I get a result like this:

[Screenshot 2020-12-17, 10:52:02 AM]

My question is how to visualize the result like this:

[Screenshot 2020-12-17, 10:55:28 AM]

I generate the result as follows:

[Screenshot 2020-12-17, 10:59:46 AM]

Does the predicted result need further post-processing?

agenthong commented 3 years ago

> You can find an example here.

Yeah, but I want to get the 3D joints; that example projects the tensor onto the 2D images.

karfly commented 3 years ago

@agenthong To convert 3D points to your camera coordinate system, you need to apply rotation (R) and translation (t) to these 3D points.
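
A minimal sketch of that transform, assuming R (3x3) and t (3x1) are the world-to-camera rotation and translation of your camera (illustrative names, not code from this repo):

import numpy as np

def world_to_camera(keypoints_3d_world, R, t):
    # keypoints_3d_world: (n_joints, 3) joints in the world (first-camera) frame
    # X_cam = R @ X_world + t, applied row-wise to every joint
    return keypoints_3d_world @ R.T + t.reshape(1, 3)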

karfly commented 3 years ago

@chaisheng-dawnlight Try to visualize these 3D points without any processing, using matplotlib's 3D scatter function (https://www.geeksforgeeks.org/3d-scatter-plotting-in-python-using-matplotlib/).
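
Something along these lines should be enough (a minimal sketch; keypoints_3d_pred is assumed to be an (n_joints, 3) NumPy array):

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older matplotlib

# keypoints_3d_pred: (n_joints, 3) predicted joint positions
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(keypoints_3d_pred[:, 0], keypoints_3d_pred[:, 1], keypoints_3d_pred[:, 2])
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
plt.show()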

agenthong commented 3 years ago

> @agenthong To convert 3D points to your camera coordinate system, you need to apply rotation (R) and translation (t) to these 3D points.

Thanks a lot! So does that mean this tensor contains the 3D joints in the world coordinate system?

chaisheng-dawnlight commented 3 years ago

> @chaisheng-dawnlight Try to visualize these 3D points without any processing, using matplotlib's 3D scatter function (https://www.geeksforgeeks.org/3d-scatter-plotting-in-python-using-matplotlib/).

@karfly Hi, thanks for your reply. I used the scatter function to draw the 3D pose, but it still doesn't work. This is my visualization code:

[Screenshot 2020-12-18, 7:36:49 PM]

karfly commented 3 years ago

@agenthong Actually, they are in the coordinates of the 1st camera, which usually coincides with the world coordinate system.

karfly commented 3 years ago

@chaisheng-dawnlight What plot do you get from this code?

agenthong commented 3 years ago

> @agenthong Actually, they are in the coordinates of the 1st camera, which usually coincides with the world coordinate system.

I have GT 3D keypoints like this: [image] It's quite different from my result: [image] To sum up, what is the format of the output? And how can I compare my prediction with the GT in the same format?

roselidev commented 3 years ago

Hi, I'm also experiencing a similar problem. I followed the suggestion @karfly gave:

> @chaisheng-dawnlight Try to visualize these 3D points without any processing, using matplotlib's 3D scatter function (https://www.geeksforgeeks.org/3d-scatter-plotting-in-python-using-matplotlib/).

The ground-truth points look like this (I used draw_3d_pose from this repository):

And the pretrained model's prediction looks like this:

I've plotted the 3D points like you said, and it looks like this:

I currently have no idea what error caused this result.

I used 4 views of a single pose with the corresponding camera parameters. Here's the code showing how I got the predicted 3D points.

import numpy as np
import torch

# load_config and VolumetricTriangulationNet come from this repo;
# process_annotation_json and process_images_batch are my own helpers
# for parsing my annotation JSONs and preprocessing the input images.

# MODEL LOAD
config = load_config('./model/pretrained/learnable_triangulation_volumetric/human36m_vol_softmax.yaml')
model = VolumetricTriangulationNet(config)

# OUTPUT:
# Loading pretrained weights from: ./model/pretrained/resnet/pose_resnet_4.5_pixels_human36m.pth
# Reiniting final layer filters: module.final_layer.weight
# Reiniting final layer biases: module.final_layer.bias
# Successfully loaded pretrained weights for backbone

# PROCESSING MODEL INPUT
annotations_path = [
    './data/anno/16-1_001-C01_3D.json',
    './data/anno/16-1_001-C02_3D.json',
    './data/anno/16-1_001-C03_3D.json',
    './data/anno/16-1_001-C04_3D.json',
]
device = torch.device('cpu')
batch_keypoints_3d = []
cameras = []
for path in annotations_path:
    _, _, _, keypoints_3d, camera = process_annotation_json(path)
    batch_keypoints_3d.append(keypoints_3d)
    cameras.append(camera)

batch = {'cameras': cameras, 'pred_keypoints_3d': batch_keypoints_3d}
# images_data holds the raw images for the 4 views (loaded earlier)
images_batch = process_images_batch(np.array(images_data))
# stack the per-view 3x4 projection matrices into one tensor
proj_matricies_batch = torch.stack([torch.stack([torch.from_numpy(cam.projection) for cam in c]) for c in cameras])
proj_matricies_batch = proj_matricies_batch.float().to(device)

# FORWARD MODEL
keypoints_3d_pred, heatmaps_pred, volumes_pred, confidences_pred, cuboids_pred, coord_volumes_pred, base_points_pred = model(images_batch, proj_matricies_batch, batch)
roselidev commented 3 years ago

I found that I hadn't loaded the pretrained weights, so I added the code below. Before:

# MODEL LOAD
config = load_config('./model/pretrained/learnable_triangulation_volumetric/human36m_vol_softmax.yaml')
model = VolumetricTriangulationNet(config)

After:

# MODEL LOAD
config = load_config('./model/pretrained/learnable_triangulation_volumetric/human36m_vol_softmax.yaml')
model = VolumetricTriangulationNet(config)
if config.model.init_weights:
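    # strip the "module." prefix from the checkpoint keys so they
    # match the keys of the un-wrapped model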
    state_dict = torch.load(config.model.checkpoint)
    for key in list(state_dict.keys()):
        new_key = key.replace("module.", "")
        state_dict[new_key] = state_dict.pop(key)
    model.load_state_dict(state_dict, strict=True)
    print("Successfully loaded pretrained weights for whole model")

I've used the code in train.py.

And the result is still basically the same as before.

What I suspect is that keypoints_3d_pred is in a different coordinate scale than the Human3.6M GT data. I'd appreciate any help on how I should process my 3D ground-truth points.