karfly / learnable-triangulation-pytorch

This repository is an official PyTorch implementation of the paper "Learnable Triangulation of Human Pose" (ICCV 2019, oral). The proposed method achieves state-of-the-art results in multi-view 3D human pose estimation!
MIT License

Have some questions #135

Closed: pervin0527 closed this issue 3 years ago

pervin0527 commented 3 years ago

Hi, I've been looking at human36m_batch_128.pkl,

but I don't understand a few points.

  1. When I read the pkl file, ['images'] has shape (128, 4, 384, 384, 3). What does the 4 mean? The number of cameras?

  2. ['detections'] has shape (128, 4, 5). Are the bbox coordinates in the second dimension or the third? And what does the 5 mean?

  3. The last key of the dict is ['indexes']. What does this mean? A file index number? Why is it necessary?

code:

import pickle
import numpy as np

with open('./learnable-triangulation-pytorch/data/human36m_batch_128.pkl', 'rb') as f:
    data = pickle.load(f)
# print(type(data))
key_list = list(data.keys())
# print(key_list)
# ['images', 'detections', 'cameras', 'keypoints_3d', 'indexes']

images = np.array(data['images'])
print("IMAGE_SHAPE : ", images.shape)

detections = np.array(data['detections'])
print("BOUNDING_BOX : ", detections.shape)

cameras = np.array(data['cameras'])
print("CAMERA_CALIBRATION : ", cameras.shape)

keypoints_3d = np.array(data['keypoints_3d'])
print("JOINTS_COORDINATES : ", keypoints_3d.shape)

indexes = np.array(data['indexes'])
print(indexes.shape)

outputs:

IMAGE_SHAPE :  (128, 4, 384, 384, 3)
BOUNDING_BOX :  (128, 4, 5)
CAMERA_CALIBRATION :  (4, 128)
JOINTS_COORDINATES :  (128, 17, 4)
(128,)
shrubb commented 3 years ago

Hi,

> What does the 4 mean? The number of cameras?

Yes.

> in the second dimension or the third?

Yes, the third. The 5 is four bounding-box coordinates plus a detection confidence (as in any object detector; the range is 0.0 to 1.0; you likely won't need it).
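Roughly, a quick sketch based on the shapes printed above (the coordinates-first, confidence-last ordering of the 5 values is an assumption here):

import numpy as np

# using `data` loaded from human36m_batch_128.pkl as in the snippet above
detections = np.array(data['detections'])   # (128, 4, 5): (batch, n_views, 5)
# Assumed layout of the last axis: 4 bbox coordinates followed by the confidence.
bboxes = detections[..., :4]                # (128, 4, 4) per-view bounding boxes
confidences = detections[..., 4]            # (128, 4) detector confidence, 0.0 to 1.0
print(bboxes.shape, confidences.shape)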

> ['indexes']. What does this mean? A file index number?

It's the index of the sample in the dataset (e.g. Human36MMultiViewDataset). I don't remember why we needed it, probably just for debugging or visualization, or maybe for proper alignment with ground truth.
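For illustration, a hypothetical use (not the exact code in the repo):

indexes = np.array(data['indexes'])         # (128,): one dataset index per batch sample

# Hypothetical: trace batch samples back to their dataset items, e.g. for
# visualization or to fetch the matching ground truth.
# dataset = Human36MMultiViewDataset(...)   # built as in the training script
# item = dataset[indexes[0]]                # the original item behind batch sample 0
for batch_pos, dataset_idx in enumerate(indexes[:5]):
    print(f"batch sample {batch_pos} came from dataset index {dataset_idx}")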

pervin0527 commented 3 years ago

@shrubb

I really appreciate you answering my questions. Thank you.

I couldn't get the Human3.6M dataset, so I'm currently working on training your model on a custom dataset I have.
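For reference, this is the batch layout I'm assuming for my own data (just a sketch mirroring the keys and shapes printed above; the camera entries are placeholders, since the real batches come from the repo's dataset and collate code):

import numpy as np

batch_size, n_views, n_joints = 128, 4, 17

batch = {
    # (batch, n_views, H, W, C) cropped images, one per camera view
    'images': np.zeros((batch_size, n_views, 384, 384, 3), dtype=np.float32),
    # (batch, n_views, 5): 4 bbox coordinates + detection confidence per view
    'detections': np.zeros((batch_size, n_views, 5), dtype=np.float32),
    # (n_views, batch): per-view camera objects; placeholders here
    'cameras': [[None] * batch_size for _ in range(n_views)],
    # (batch, n_joints, 4): 3D joint coordinates plus one extra value per joint
    'keypoints_3d': np.zeros((batch_size, n_joints, 4), dtype=np.float32),
    # (batch,): index of each sample in the source dataset
    'indexes': np.arange(batch_size),
}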