hongsukchoi / TCMR_RELEASE

Official Pytorch implementation of "Beyond Static Features for Temporally Consistent 3D Human Pose and Shape from a Video", CVPR 2021
MIT License

About the processing of Human3.6M dataset #14

Open Ironbrotherstyle opened 3 years ago

Ironbrotherstyle commented 3 years ago

Hi, thank you for your great work. I have a question about processing the Human3.6M dataset. It came up when I visualized the joints_2d values of one frame. For example, when I ran

img = cv2.imread(img_paths[0])
temp = draw_skeleton(img, j2ds[0], dataset='spin', unnormalize=False, thickness=2)
cv2.imshow('img', temp)
cv2.waitKey(0)
cv2.destroyAllWindows()
cv2.waitKey(1)

inside lib/data_utils/h36m_utils.py, I saw no points of the human body. The calculated joints_img values are all much larger than the actual image size of 1002 * 1000, and the same goes for the bbox coordinates. Please tell me if I did something wrong or missed some detail. Here are the values at each stage, with a quick bounds check at the end:

img_path = '/s_01_act_02_subact_01_ca_01/s_01_act_02_subact_01_ca_01_000001.jpg'
joint_world = np.array([-91.67900,154.40401,907.26099
39.87789,145.00247,923.98785
-188.47031,14.07711,475.16879
-261.84055,186.55286,61.43892
-223.23566,163.80551,890.53418
-11.67599,160.89919,484.39148
-51.55030,220.14624,35.83440
-132.34781,215.73018,1128.83960
-97.16740,202.34435,1383.14661
-112.97073,127.96946,1477.44568
-120.03289,190.96477,1573.40002
25.89546,192.35947,1296.15710
107.10581,116.05029,1040.50623
129.83810,-48.02492,850.94806
-230.36955,203.17923,1311.96387
-315.40536,164.55284,1049.17468
-350.77136,43.44213,831.34729])

joint_cam = [2010.42700,4087.25537,1292.84644
1886.65796,4075.91113,1245.64563
1928.91736,4507.91309,1333.64587
1977.67346,4957.13770,1379.72803
2134.19580,4098.59961,1340.04712
2031.51624,4481.37549,1537.77136
2157.32617,4915.08838,1489.10266
2078.00024,3878.57568,1212.85498
2046.96667,3628.18262,1163.56909
2033.97595,3521.32764,1219.12964
2068.22290,3438.07544,1147.56335
1928.06787,3718.17041,1139.55640
1816.48694,3959.70459,1223.14160
1724.87122,4117.47461,1396.59180
2167.39746,3691.38574,1229.23682
2222.94946,3938.15894,1346.70288
2201.05029,4128.45068,1510.03296]

joint_img = [2293.13818,4131.44580,1292.84644
2246.83618,4258.04932,1245.64563
2168.68262,4381.59473,1333.64587
2153.83154,4624.87061,1379.72803
2336.17871,4013.76196,1340.04712
2025.24133,3848.66016,1537.77136
2171.42334,4290.73535,1489.10266
2474.36963,4173.13672,1212.85498
2526.92822,4081.93237,1163.56909
2422.92334,3819.14282,1219.12964
2576.23364,3942.19556,1147.56335
2449.90332,4247.40625,1139.55640
2213.05371,4218.24072,1223.14160
1926.74341,3887.58179,1396.59180
2531.49927,3950.21460,1229.23682
2402.62939,3860.20703,1346.70288
2181.58545,3642.56445,1510.03296]
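
As a quick bounds check (count_inside is just a hypothetical helper, not from the repo), none of the joint_img points above land inside the image, which is why draw_skeleton draws nothing:

import numpy as np

def count_inside(joint_img, img_h, img_w):
    # joint_img: (J, 3) array of (x, y, depth) in pixel space
    x, y = joint_img[:, 0], joint_img[:, 1]
    inside = (x >= 0) & (x < img_w) & (y >= 0) & (y < img_h)
    return int(inside.sum())

# e.g. count_inside(np.array(joint_img), *img.shape[:2]) returns 0 for the values above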

Thank you.

hongsukchoi commented 3 years ago

Hi @Ironbrotherstyle ,

sorry for the late reply. I somehow missed the notification.

I checked my code and didn't find any problem. In your data, the joint_cam values look weird. Did you use our h36m camera annotation?

Ironbrotherstyle commented 3 years ago

How can I get your h36m camera annotation? Thank you very much.

hongsukchoi commented 3 years ago

Hi @Ironbrotherstyle ,

You can find our preprocessed annotations below

https://drive.google.com/drive/folders/1kgVH-GugrLoc9XyvP6nRoaFpw3TmM5xK

Ironbrotherstyle commented 2 years ago

Hi @Ironbrotherstyle ,

You can find our preprocessed annotations below

https://drive.google.com/drive/folders/1kgVH-GugrLoc9XyvP6nRoaFpw3TmM5xK

Sorry to bother you again.

I am sure that I downloaded your data from the link: link. And I have unzipped annotation.zip and smpl_param.zip, but I cannot reproduce the results in the h36m_train_25fps_nosmpl_db.pt that you provided. Take /s_01_act_02_subact_01_ca_01/s_01_act_02_subact_01_ca_01_000001.jpg as an example (mentioned in the former questions). The joints3D stored in your h36m_train_25fps_nosmpl_db.pt for s_01_act_02_subact_01_ca_01_000001.jpg is

joints3D = 
[[ 0.17673077  0.32104865 -5.2038817 ]
 [ 0.17673077  0.32104865 -5.2038817 ]
 [ 0.17673077  0.32104865 -5.2038817 ]
 [ 0.17673077  0.32104865 -5.2038817 ]
 [ 0.17673077  0.32104865 -5.2038817 ]
 [ 0.17673077  0.32104865 -5.2038817 ]
 [ 0.17673077  0.32104865 -5.2038817 ]
 [ 0.17673077  0.32104865 -5.2038817 ]
 [ 0.17673077  0.32104865 -5.2038817 ]
 [ 0.17673077  0.32104865 -5.2038817 ]
 [ 0.17673077  0.32104865 -5.2038817 ]
 [ 0.17673077  0.32104865 -5.2038817 ]
 [ 0.17673077  0.32104865 -5.2038817 ]
 [ 0.17673077  0.32104865 -5.2038817 ]
 [ 0.17673077  0.32104865 -5.2038817 ]
 [ 0.17673077  0.32104865 -5.2038817 ]
 [ 0.17673077  0.32104865 -5.2038817 ]
 [ 0.17673077  0.32104865 -5.2038817 ]
 [ 0.17673077  0.32104865 -5.2038817 ]
 [ 0.17673077  0.32104865 -5.2038817 ]
 [ 0.17673077  0.32104865 -5.2038817 ]
 [ 0.17673077  0.32104865 -5.2038817 ]
 [ 0.17673077  0.32104865 -5.2038817 ]
 [ 0.17673077  0.32104865 -5.2038817 ]
 [ 0.17673077  0.32104865 -5.2038817 ]
 [ 0.1468992   0.82783306  0.19625664]
 [ 0.02108921  0.39412037  0.2449255 ]
 [ 0.12376885  0.01134411  0.04720068]
 [-0.12376909 -0.01134416 -0.04720068]
 [-0.08150972  0.42065766  0.04079962]
 [-0.03275359  0.86988246  0.08688211]
 [ 0.19062322  0.04119563  0.21718645]
 [ 0.21252236 -0.14909631  0.05385637]
 [ 0.15697043 -0.39586952 -0.06360912]
 [-0.08235921 -0.36908498 -0.1532898 ]
 [-0.19394009 -0.12755072 -0.06970453]
 [-0.28555584  0.03021911  0.10374594]
 [ 0.03653957 -0.45907274 -0.12927723]
 [ 0.05779592 -0.6491798  -0.14528322]
 [ 0.          0.          0.        ]
 [ 0.17673077  0.32104865 -5.2038817 ]
 [ 0.06757315 -0.20867959 -0.07999134]
 [ 0.17673077  0.32104865 -5.2038817 ]
 [ 0.02354887 -0.5659276  -0.07371664]
 [ 0.17673077  0.32104865 -5.2038817 ]
 [ 0.17673077  0.32104865 -5.2038817 ]
 [ 0.17673077  0.32104865 -5.2038817 ]
 [ 0.17673077  0.32104865 -5.2038817 ]
 [ 0.17673077  0.32104865 -5.2038817 ]]

The joints2D stored in h36m_train_25fps_nosmpl_db.pt for s_01_act_02_subact_01_ca_01_000001.jpg is

[[  0.        0.        0.     ]
 [  0.        0.        0.     ]
 [  0.        0.        0.     ]
 [  0.        0.        0.     ]
 [  0.        0.        0.     ]
 [  0.        0.        0.     ]
 [  0.        0.        0.     ]
 [  0.        0.        0.     ]
 [  0.        0.        0.     ]
 [  0.        0.        0.     ]
 [  0.        0.        0.     ]
 [  0.        0.        0.     ]
 [  0.        0.        0.     ]
 [  0.        0.        0.     ]
 [  0.        0.        0.     ]
 [  0.        0.        0.     ]
 [  0.        0.        0.     ]
 [  0.        0.        0.     ]
 [  0.        0.        0.     ]
 [  0.        0.        0.     ]
 [  0.        0.        0.     ]
 [  0.        0.        0.     ]
 [  0.        0.        0.     ]
 [  0.        0.        0.     ]
 [  0.        0.        0.     ]
 [506.216   622.7914    1.     ]
 [479.83392 530.7903    1.     ]
 [500.99265 447.99222   1.     ]
 [445.81503 441.72485   1.     ]
 [456.16092 537.1746    1.     ]
 [467.204   634.1008    1.     ]
 [515.4759  456.40582   1.     ]
 [520.3363  413.175     1.     ]
 [508.13968 355.92737   1.     ]
 [453.8017  359.16052   1.     ]
 [429.87268 415.51346   1.     ]
 [412.80936 452.7784    1.     ]
 [480.90833 339.61743   1.     ]
 [485.61975 296.0767    1.     ]
 [473.6541  444.88696   1.     ]
 [  0.        0.        0.     ]
 [488.14777 397.20282   1.     ]
 [  0.        0.        0.     ]
 [478.3514  317.69824   1.     ]
 [  0.        0.        0.     ]
 [  0.        0.        0.     ]
 [  0.        0.        0.     ]
 [  0.        0.        0.     ]
 [  0.        0.        0.     ]]

Both the joints2D and joints3D above look reasonable. However, the result produced from your annotation.zip by lib/data_utils/h36m_utils.py is quite abnormal. Do you know how to solve this problem? Many thanks!

hongsukchoi commented 2 years ago

Nothing to be sorry about. I appreciate your patience and interest.

The joint_world above is correct, so maybe the camera parameters are the problem.

Camera parameters for Subject 1 should be like below:

{'1': {'R': [[-0.9153617321513369, 0.40180836633680234, 0.02574754463350265], [0.051548117060134555, 0.1803735689384521, -0.9822464900705729], [-0.399319034032262, -0.8977836111057917, -0.185819527201491]], 't': [-346.05078140028075, 546.9807793144001, 5474.481087434061], 'f': [1145.04940458804, 1143.78109572365], 'c': [512.541504956548, 515.4514869776]},
 '2': {'R': [[0.9281683400814921, 0.3721538354721445, 0.002248380248018696], [0.08166409428175585, -0.1977722953267526, -0.976840363061605], [-0.3630902204349604, 0.9068559102440475, -0.21395758897485287]], 't': [251.42516271750836, 420.9422103702068, 5588.195881837821], 'f': [1149.67569986785, 1147.59161666764], 'c': [508.848621645943, 508.064917088557]},
 '3': {'R': [[-0.9141549520542256, -0.40277802228118775, -0.045722952682337906], [-0.04562341383935874, 0.21430849526487267, -0.9756999400261069], [0.4027893093720077, -0.889854894701693, -0.214287280609606]], 't': [480.482559565337, 253.83237471361554, 5704.207679370455], 'f': [1149.14071676148, 1148.7989685676], 'c': [519.815837182153, 501.402658888552]},
 '4': {'R': [[0.9141562410494211, -0.40060705854636447, 0.061905989962380774], [-0.05641000739510571, -0.2769531972942539, -0.9592261660183036], [0.40141783470104664, 0.8733904688919611, -0.2757767409202658]], 't': [51.88347637559197, 378.4208425426766, 4406.149140878431], 'f': [1145.51133842318, 1144.77392807652], 'c': [514.968197319863, 501.882018537695]}}

Are they the same as yours?

and did you use this function for world-to-camera transformation?

def world2cam(world_coord, R, t):
    cam_coord = np.dot(R, world_coord.transpose(1,0)).transpose(1,0) + t.reshape(1,3)
    return cam_coord
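
For completeness, the projection to image space is a plain pinhole projection. A minimal sketch of cam2pixel (it may differ slightly from the exact function in the repo):

import numpy as np

def cam2pixel(cam_coord, f, c):
    # perspective projection: divide by depth, then apply focal length and principal point
    x = cam_coord[:, 0] / cam_coord[:, 2] * f[0] + c[0]
    y = cam_coord[:, 1] / cam_coord[:, 2] * f[1] + c[1]
    z = cam_coord[:, 2]  # keep the depth in the last column
    return np.stack((x, y, z), axis=1)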

Ironbrotherstyle commented 2 years ago

Are they the same as yours?

Thank you for your reply. Yes, they are the same. My camera parameters are:

{'1': {'R': [[-0.9153617321513369, 0.40180836633680234, 0.02574754463350265], [0.051548117060134555, 0.1803735689384521, -0.9822464900705729], [-0.399319034032262, -0.8977836111057917, -0.185819527201491]], 't': [1841.10702774543, 4955.28462344526, 1563.4453958977], 'f': [1145.04940458804, 1143.78109572365], 'c': [512.541504956548, 515.4514869776]},
 '2': {'R': [[0.9281683400814921, 0.3721538354721445, 0.002248380248018696], [0.08166409428175585, -0.1977722953267526, -0.976840363061605], [-0.3630902204349604, 0.9068559102440475, -0.21395758897485287]], 't': [1761.27853428116, -5078.00659454077, 1606.2649598335], 'f': [1149.67569986785, 1147.59161666764], 'c': [508.848621645943, 508.064917088557]},
 '3': {'R': [[-0.9141549520542256, -0.40277802228118775, -0.045722952682337906], [-0.04562341383935874, 0.21430849526487267, -0.9756999400261069], [0.4027893093720077, -0.889854894701693, -0.214287280609606]], 't': [-1846.7776610084, 5215.04650469073, 1491.97246576518], 'f': [1149.14071676148, 1148.7989685676], 'c': [519.815837182153, 501.402658888552]},
 '4': {'R': [[0.9141562410494211, -0.40060705854636447, 0.061905989962380774], [-0.05641000739510571, -0.2769531972942539, -0.9592261660183036], [0.40141783470104664, 0.8733904688919611, -0.2757767409202658]], 't': [-1794.78972871109, -3722.69891503676, 1574.89272604599], 'f': [1145.51133842318, 1144.77392807652], 'c': [514.968197319863, 501.882018537695]}}

and the process of obtaining joints_img is borrowed from your code,

  joints_world = np.array(joints_3d[str(int(action))][str(int(subaction))][str(index)])    # [-91.67900,154.40401,907.26099  ...
  joints_cam = world2cam(joints_world, R, t)                                                              # [2010.42700,4087.25537,1292.84644 ...
  joints_img = cam2pixel(joints_cam, f, c)                                                                   # [2293.13818,4131.44580,1292.84644 ...
  joints_valid = np.ones((h36m_joint_num, 1))

that is what confused me.

hongsukchoi commented 2 years ago

The translation parameters are different from the one I downloaded.
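
For what it's worth, your t values look like camera centers in world coordinates, while the annotation stores the translation for X_cam = R * X_world + t (i.e. t = -R * C). A quick check with the Subject 1 / camera 1 numbers from this thread (a sketch, not code from the repo):

import numpy as np

# R and the two different 't' vectors for Subject 1, camera 1, copied from this thread
R = np.array([[-0.9153617321513369, 0.40180836633680234, 0.02574754463350265],
              [0.051548117060134555, 0.1803735689384521, -0.9822464900705729],
              [-0.399319034032262, -0.8977836111057917, -0.185819527201491]])
t_yours = np.array([1841.10702774543, 4955.28462344526, 1563.4453958977])        # looks like the camera center C (world, mm)
t_annot = np.array([-346.05078140028075, 546.9807793144001, 5474.481087434061])  # translation in X_cam = R @ X_world + t

# if t_yours is the camera center, then -R @ C reproduces the annotation's translation
print(-R @ t_yours)                                  # approximately [-346.05, 546.98, 5474.48]
print(np.allclose(-R @ t_yours, t_annot, atol=0.1))  # True (up to rounding)

So converting with t = -R @ C should reconcile the two.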

Ironbrotherstyle commented 2 years ago

The translation parameters are different from the one I downloaded.

So weird. I re-downloaded your data, and now they are the same as yours. Thank you so much.

Wuchuq commented 2 years ago

Hi! I noticed that in the h36m pre-processing, all the code for loading SMPL parameters is commented out. When should we use it?

hongsukchoi commented 2 years ago

You can use it. Just uncomment them :)

Mirandl commented 1 year ago

You can find our preprocessed annotations below

https://drive.google.com/drive/folders/1kgVH-GugrLoc9XyvP6nRoaFpw3TmM5xK

@hongsukchoi Hi, it seems that this link is no longer valid. Could you please share it again?

The reason I am asking for the preprocessed annotations is that there is something wrong with my 'annotations'. My 'images' structure is:

[screenshot of the images directory structure]

However, my annotations for Subject 1 are:

[screenshot of the Subject 1 annotation files]

It seems that the annotations don't include cam = 2/3/4, only ca = 1, and the number of entries in the dict is one less than the number of images. Is there any solution for this? Thank you very much.

hongsukchoi commented 1 year ago

Hi!

I think my former colleague changed something. Let me check.

And for the second question, which file are you using? Human36M has two evaluation protocols and one protocol only uses the frontal camera data (cam4).

Mirandl commented 1 year ago

Hi! Thank you for checking. I think I need the annotations for all four cameras for my processed data. Could you please provide the full version of the preprocessed annotations? It would be a great help to me. Thanks.

Wuchuq commented 1 year ago

Hi! This link is invalid now, could you please share it again?

hongsukchoi commented 1 year ago

Hi, I updated the link.

Check out here: https://github.com/hongsukchoi/Pose2Mesh_RELEASE#data

Dipankar1997161 commented 1 year ago

Hi, I updated the link.

Check out here: https://github.com/hongsukchoi/Pose2Mesh_RELEASE#data

Hello @hongsukchoi,

Could you tell me how you generated the SMPL parameters for H36M? Did you do it by passing the videos/images to SMPLify-X along with the camera values, or with some other method (repo)?

I wish to generate the SMPL parameters directly from the 3D ground-truth keypoints, if possible.

Since Human3.6M has two sets of 3D ground-truth keypoint CDF files:
1. Original coordinate system -- Positions_3D CDF files
2. Transformed coordinate system (camera-specific) -- Positions_3D_mono CDF files

Which files should we consider using for SMPL parameter generation?

I appreciate your response in advance.

hongsukchoi commented 1 year ago

We used the camera coordinate values. Check out this repo: https://github.com/mks0601/NeuralAnnot_RELEASE/blob/main/Human3.6M/demo_smplx.py. You will find what you want there.
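
Roughly, the per-frame SMPL parameters in the NeuralAnnot JSON can be turned into a mesh with the smplx package like below (a minimal sketch: the action/subaction/frame keys are placeholders, and the world-to-camera step follows the same R/t convention as the world2cam above):

import json
import numpy as np
import torch
import smplx

# per-frame SMPL parameters (file names as in the Human3.6M annotation set)
with open('Human36M_subject1_SMPL_NeuralAnnot.json') as f:
    smpl_params = json.load(f)
param = smpl_params['2']['1']['0']  # action / subaction / frame indices (placeholders)

smpl_layer = smplx.create('./data', 'smpl')  # path to the SMPL model files
pose = torch.FloatTensor(param['pose']).view(1, -1)
shape = torch.FloatTensor(param['shape']).view(1, -1)
trans = torch.FloatTensor(param['trans']).view(1, -1)
out = smpl_layer(global_orient=pose[:, :3], body_pose=pose[:, 3:], betas=shape, transl=trans)
verts_world = out.vertices[0].detach().numpy()  # SMPL mesh in world coordinates (meters)

# move the mesh into one camera's coordinate frame using the camera.json parameters (t is in mm)
with open('Human36M_subject1_camera.json') as f:
    cam_param = json.load(f)['1']
R = np.array(cam_param['R'], dtype=np.float32)
t = np.array(cam_param['t'], dtype=np.float32) / 1000.  # mm -> m
verts_cam = verts_world @ R.T + t.reshape(1, 3)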

Dipankar1997161 commented 1 year ago

@hongsukchoi, thanks for your response. I saw from this issue that you ran Neural Body on Human3.6M: https://github.com/zju3dv/neuralbody/issues/27#issue-925883488

I am also planning to do the same, but I cannot figure out how to get accurate SMPL parameters in the Neural Body format. I tried ROMP and VIBE, but the rendering was not accurate at all; since they use a weak-perspective camera, I could not render it properly. [screenshot of the inaccurate rendering]

Could you tell me how you actually generated the SMPL parameters for Neural Body? I have sorted out the segmentation masks and processed them precisely, but I have not been able to get accurate SMPL parameters.

I hope to hear from you on this matter.

Thank you

hongsukchoi commented 1 year ago

Hi, refer to this code. It is a Neural Body-style Dataset class for Human3.6M that loads the images, masks, camera parameters, and NeuralAnnot SMPL parameters.

import torch.utils.data as data
from lib.utils import base_utils
from PIL import Image
import numpy as np
import json
import glob
import os
import imageio
import cv2
from lib.config import cfg
from lib.utils.if_nerf import if_nerf_data_utils as if_nerf_dutils
from lib.utils.feat_utils import *
from plyfile import PlyData
import os.path as osp
import random
import smplx
from pycocotools.coco import COCO
import torch

class Dataset(data.Dataset):
    def __init__(self, split):
        super(Dataset, self).__init__()
        self.root_path = osp.join('data', 'h36m')
        self.img_path = osp.join(self.root_path, 'images')
        self.mask_path = osp.join(self.root_path, 'masks')
        self.annot_path = osp.join(self.root_path, 'annotations')
        self.preprocessed_path = osp.join(self.root_path, 'preprocessed')

        self.split = split
        self.smpl_layer = smplx.create('./data', 'smpl')
        if self.split == 'train':
            subject_list = [1, 5, 6, 7, 8]
            sampling_ratio = 50
            input_cam_idxs = ['1', '2', '3', '4']
            render_cam_idxs = ['1', '2', '3', '4']
        else:
            subject_list = [9, 11]
            sampling_ratio = 500
            input_cam_idxs = ['1', '2', '3', '4']
            render_cam_idxs = ['1', '2', '3', '4']

        # aggregate annotations from each subject
        db = COCO()
        cameras = {}
        smpl_params = {}
        for subject in subject_list:
            # data load
            with open(osp.join(self.annot_path, 'Human36M_subject' + str(subject) + '_data.json'), 'r') as f:
                annot = json.load(f)
            if len(db.dataset) == 0:
                for k, v in annot.items():
                    db.dataset[k] = v
            else:
                for k, v in annot.items():
                    db.dataset[k] += v
            # camera load
            with open(osp.join(self.annot_path, 'Human36M_subject' + str(subject) + '_camera.json'), 'r') as f:
                cameras[str(subject)] = json.load(f)
            # smpl parameter load
            with open(osp.join(self.annot_path, 'Human36M_subject' + str(subject) + '_SMPL_NeuralAnnot.json'), 'r') as f:
                smpl_params[str(subject)] = json.load(f)
        db.createIndex()

        self.cam_info = {}
        self.datalist = {}
        self.data_idx = []
        for aid in db.anns.keys():
            ann = db.anns[aid]
            image_id = ann['image_id']
            img = db.loadImgs(image_id)[0]
            img_path = osp.join(self.img_path, img['file_name'])
            mask_path = osp.join(self.mask_path, img['file_name'][:-4] + '.png')
            img_shape = (img['height'], img['width'])

            if not osp.isfile(mask_path):
                continue

            # check subject and frame_idx
            frame_idx = img['frame_idx']
            if frame_idx % sampling_ratio != 0:
                continue

            # check smpl parameter exist
            subject = img['subject']
            action_idx = img['action_idx']
            subaction_idx = img['subaction_idx']
            frame_idx = img['frame_idx']
            cam_idx = img['cam_idx']
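            # (S11, action 2, subaction 2) is excluded below; this Human3.6M sequence is commonly skipped because its video is damaged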

            if subject == 11 and action_idx == 2 and subaction_idx == 2:
                continue
            try:
                smpl_param = smpl_params[str(subject)][str(action_idx)][str(subaction_idx)][str(frame_idx)]
            except KeyError:
                continue

            # camera parameter
            cam_param = cameras[str(subject)][str(cam_idx)]
            R, t, f, c = np.array(cam_param['R'], dtype=np.float32), np.array(cam_param['t'], dtype=np.float32).reshape(3, 1), np.array(cam_param['f'], dtype=np.float32), np.array(cam_param['c'], dtype=np.float32)
            K = np.array([[f[0], 0, c[0]], [0, f[1], c[1]], [0, 0, 1]], dtype=np.float32).reshape(3, 3)

            # camera
            if str(subject) not in self.cam_info:
                self.cam_info[str(subject)] = {}
            if str(cam_idx) not in self.cam_info[str(subject)]:
                self.cam_info[str(subject)][str(cam_idx)] = {'R': R, 't': t, 'K': K}

            # path and smpl parameters
            if str(subject) not in self.datalist:
                self.datalist[str(subject)] = {}
            if str(action_idx) not in self.datalist[str(subject)]:
                self.datalist[str(subject)][str(action_idx)] = {}
            if str(subaction_idx) not in self.datalist[str(subject)][str(action_idx)]:
                self.datalist[str(subject)][str(action_idx)][str(subaction_idx)] = {}
            if str(frame_idx) not in self.datalist[str(subject)][str(action_idx)][str(subaction_idx)]:
                self.datalist[str(subject)][str(action_idx)][str(subaction_idx)][str(frame_idx)] = {'img_path': {}, 'mask_path': {}, 'smpl_param': smpl_param}
                seq_name = f's_{subject:02d}_act_{action_idx:02d}_subact_{subaction_idx:02d}'
                filename = f'{frame_idx + 1:06d}'
                vertex_path = osp.join(self.preprocessed_path, 'vertices', seq_name, filename + '.npy')
                vertex_rgb_path = osp.join(self.preprocessed_path, 'vertices_rgb', seq_name, filename + '.npy')
                self.datalist[str(subject)][str(action_idx)][str(subaction_idx)][str(frame_idx)]['vertices_path'] = vertex_path
                self.datalist[str(subject)][str(action_idx)][str(subaction_idx)][str(frame_idx)]['vertices_rgb_path'] = vertex_rgb_path

            if str(cam_idx) not in self.datalist[str(subject)][str(action_idx)][str(subaction_idx)][str(frame_idx)]['img_path']:
                self.datalist[str(subject)][str(action_idx)][str(subaction_idx)][str(frame_idx)]['img_path'][str(cam_idx)] = img_path
                self.datalist[str(subject)][str(action_idx)][str(subaction_idx)][str(frame_idx)]['mask_path'][str(cam_idx)] = mask_path

            if self.split == 'train':
                if str(cam_idx) in input_cam_idxs:
                    valid_render_cam_idxs = []
                    for render_cam_idx in render_cam_idxs:
                        render_mask_path = mask_path.replace('ca_0' + str(cam_idx), 'ca_0' + str(render_cam_idx))
                        if osp.isfile(render_mask_path) and osp.getsize(render_mask_path) > 1500:
                            valid_render_cam_idxs.append(render_cam_idx)
                    if len(valid_render_cam_idxs) == 0:
                        continue
                    self.data_idx.append(
                        {'subject': str(subject), 'action_idx': str(action_idx), 'subaction_idx': str(subaction_idx), 'frame_idx': str(frame_idx), 'input_cam_idx': str(cam_idx), 'render_cam_idxs': valid_render_cam_idxs})
            else:
                if str(cam_idx) in input_cam_idxs:
                    for render_cam_idx in render_cam_idxs:
                        render_mask_path = mask_path.replace('ca_0' + str(cam_idx), 'ca_0' + str(render_cam_idx))
                        if not osp.isfile(render_mask_path) or osp.getsize(render_mask_path) < 1500:
                            continue
                        self.data_idx.append(
                            {'subject': str(subject), 'action_idx': str(action_idx), 'subaction_idx': str(subaction_idx), 'frame_idx': str(frame_idx), 'input_cam_idx': str(cam_idx), 'render_cam_idxs': [render_cam_idx]})

    def load_3d_data(self, smpl_param, subject, cam_idx):
        pose = torch.FloatTensor(smpl_param['pose']).float().view(1, -1)
        shape = torch.FloatTensor(smpl_param['shape']).float().view(1, -1)
        trans = torch.FloatTensor(smpl_param['trans']).float().view(1, -1)
        output = self.smpl_layer(global_orient=pose[:, :3], body_pose=pose[:, 3:], betas=shape, transl=trans)
        xyz = output.vertices[0].detach().numpy()

        # obtain the original bounds for point sampling
        min_xyz = np.min(xyz, axis=0)
        max_xyz = np.max(xyz, axis=0)
        min_xyz -= 0.05
        max_xyz += 0.05
        bounds_world = np.stack([min_xyz, max_xyz], axis=0)
        mesh = xyz
        joint = np.dot(self.smpl_layer.J_regressor, mesh)

        # transform smpl from the world corodinate to the camera coordinate
        R_input = np.array(self.cam_info[subject][cam_idx]['R'], dtype=np.float32)
        T_input = np.array(self.cam_info[subject][cam_idx]['t'], dtype=np.float32) / 1000.
        xyz = np.dot(R_input, xyz.transpose(1, 0)).transpose(1, 0) + T_input.reshape(1, 3)

        # obtain the bounds for coord construction
        min_xyz = np.min(xyz, axis=0)
        max_xyz = np.max(xyz, axis=0)
        min_xyz -= 0.05
        max_xyz += 0.05
        bounds = np.stack([min_xyz, max_xyz], axis=0)

        # construct the coordinate
        dhw = xyz[:, [2, 1, 0]]
        min_dhw = min_xyz[[2, 1, 0]]
        max_dhw = max_xyz[[2, 1, 0]]
        voxel_size = np.array(cfg.voxel_size)
        coord = np.round((dhw - min_dhw) / voxel_size).astype(np.int32)

        # construct the output shape
        out_sh = np.ceil((max_dhw - min_dhw) / voxel_size).astype(np.int32)
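        # round each dimension of the volume up to a multiple of x (= 32)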
        x = 32
        out_sh = (out_sh | (x - 1)) + 1
        return coord, out_sh, bounds_world, bounds, mesh, joint

    def affine_transform(self, img, mask, out_shape):
        bbox = cv2.boundingRect(mask.astype(np.uint8))  # x, y, w, h
        bbox = process_bbox(bbox, img.shape[1], img.shape[0], out_shape)
        trans = get_affine_trans_mat(bbox, out_shape)

        img = cv2.warpAffine(img, trans, (int(out_shape[1]), int(out_shape[0])), flags=cv2.INTER_LINEAR)
        mask = cv2.warpAffine(mask, trans, (int(out_shape[1]), int(out_shape[0])), flags=cv2.INTER_NEAREST)
        img[mask == 0] = 0
        return img, trans

    def load_mask(self, mask_path, img_shape):
        mask_cihp_cropped_resized = imageio.imread(mask_path)

        # restore mask to the original image space
        height, width, _ = img_shape
        mask_cihp = cv2.resize(mask_cihp_cropped_resized, (width, height), interpolation=cv2.INTER_NEAREST)

        mask = (mask_cihp != 0).astype(np.uint8)
        border = 5
        kernel = np.ones((border, border), np.uint8)
        mask_erode = cv2.erode(mask.copy(), kernel)
        mask_dilate = cv2.dilate(mask.copy(), kernel)
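        # mark the thin boundary band (dilated minus eroded) with label 100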
        mask[(mask_dilate - mask_erode) == 1] = 100
        return mask

    def load_2d_input_view(self, img_path, mask_path, subject, cam_idx, mesh):
        img = imageio.imread(img_path).astype(np.float32) / 255.
        # img = cv2.resize(img, (cfg.mask_shape[1], cfg.mask_shape[0]))
        mask = self.load_mask(mask_path, img.shape)
        assert img.shape[:2] == mask.shape[:2], print(img.shape, mask.shape)
        orig_img_shape = img.shape

        K = np.array(self.cam_info[subject][cam_idx]['K'], dtype=np.float32)
        R = np.array(self.cam_info[subject][cam_idx]['R'], dtype=np.float32)
        T = np.array(self.cam_info[subject][cam_idx]['t'], dtype=np.float32) / 1000.

        # affine transform for feature extraction
        img, affine_trans_mat = self.affine_transform(img, mask, cfg.input_img_shape)
        return img, R, T, K, affine_trans_mat

    def load_2d_render_view(self, img_path, mask_path, subject, cam_idx, bounds_world):
        img = imageio.imread(img_path).astype(np.float32) / 255.
        # img = cv2.resize(img, (cfg.mask_shape[1], cfg.mask_shape[0]))
        mask = self.load_mask(mask_path, img.shape)
        assert img.shape[:2] == mask.shape[:2], print(img.shape, mask.shape)
        orig_img_shape = img.shape

        K = np.array(self.cam_info[subject][cam_idx]['K'], dtype=np.float32)
        R = np.array(self.cam_info[subject][cam_idx]['R'], dtype=np.float32)
        T = np.array(self.cam_info[subject][cam_idx]['t'], dtype=np.float32) / 1000.

        H, W = cfg.render_img_shape[0], cfg.render_img_shape[1]
        img = cv2.resize(img, (W, H), interpolation=cv2.INTER_LINEAR)
        mask = cv2.resize(mask, (W, H), interpolation=cv2.INTER_NEAREST)
        img[mask == 0] = 0
        K[0] = K[0] / orig_img_shape[1] * cfg.render_img_shape[1]
        K[1] = K[1] / orig_img_shape[0] * cfg.render_img_shape[0]

        rgb, ray_o, ray_d, near, far, coord_, mask_at_box = if_nerf_dutils.sample_ray_h36m(img, mask, K, R, T, bounds_world, cfg.N_rand, self.split)
        return rgb, ray_o, ray_d, near, far, coord_, mask_at_box

    def __len__(self):
        return len(self.data_idx)

    def __getitem__(self, index):
        if self.split == 'train':
            subject, action_idx, subaction_idx, frame_idx, input_cam_idx, render_cam_idxs = self.data_idx[index]['subject'], self.data_idx[index]['action_idx'], self.data_idx[index]['subaction_idx'], \
                                                                                            self.data_idx[index]['frame_idx'], self.data_idx[index]['input_cam_idx'], self.data_idx[index]['render_cam_idxs']
        else:
            subject, action_idx, subaction_idx, frame_idx, input_cam_idx, render_cam_idxs = self.data_idx[index]['subject'], self.data_idx[index]['action_idx'], self.data_idx[index]['subaction_idx'], \
                                                                                            self.data_idx[index]['frame_idx'], self.data_idx[index]['input_cam_idx'], self.data_idx[index]['render_cam_idxs']
        data = self.datalist[subject][action_idx][subaction_idx][frame_idx]

        # load mesh
        coord, out_sh, bounds_world, bounds, mesh, joint = self.load_3d_data(data['smpl_param'], subject, input_cam_idx)

        # prepare input view data
        img, R, T, K, affine = self.load_2d_input_view(data['img_path'][input_cam_idx], data['mask_path'][input_cam_idx], subject, input_cam_idx, mesh)

        # prepare render view data
        rgb_list, ray_o_list, ray_d_list, near_list, far_list, mask_at_box_list = [], [], [], [], [], []
        for cam_idx in render_cam_idxs:
            rgb, ray_o, ray_d, near, far, _, mask_at_box = self.load_2d_render_view(data['img_path'][cam_idx], data['mask_path'][cam_idx], subject, cam_idx, bounds_world)
            rgb_list.append(rgb)
            ray_o_list.append(ray_o)
            ray_d_list.append(ray_d)
            near_list.append(near)
            far_list.append(far)
            mask_at_box_list.append(mask_at_box)
        rgb, ray_o, ray_d, near, far, mask_at_box = np.concatenate(rgb_list), np.concatenate(ray_o_list), np.concatenate(ray_d_list), np.concatenate(near_list), np.concatenate(far_list), np.concatenate(mask_at_box_list)

        """
        # for debug
        filename = str(random.randint(1,500))
        vis = img.copy() * 255
        cv2.imwrite(filename + '.jpg', vis)
        _mesh = np.dot(R, mesh.transpose(1,0)).transpose(1,0) + T.reshape(1,3)
        x = _mesh[:,0] / _mesh[:,2] * K[0][0] + K[0][2]
        y = _mesh[:,1] / _mesh[:,2] * K[1][1] + K[1][2]
        xy1 = np.stack((x,y,np.ones_like(x)),1)
        xy = np.dot(affine, xy1.transpose(1,0)).transpose(1,0)
        vis = img.copy()*255
        for v in range(len(xy)):
            vis = cv2.circle(vis, (int(xy[v][0]), int(xy[v][1])), 3, (255,0,0) ,-1)
        cv2.imwrite(filename + '_mesh.jpg', vis)
        """

        # intermediate supervision
        verts_rgb = np.load(data['vertices_rgb_path']).astype(np.float32)
        verts_mask = np.zeros(6890, dtype=np.float32)
        verts_mask[verts_rgb[:, 0] != 0] = 1
        verts_mask = verts_mask.astype(bool)

        ret = {
            'verts_rgb': verts_rgb, 'verts_mask': verts_mask,
            'img': img, 'R': R, 'T': T, 'K': K, 'affine': affine, 'coord': coord, 'out_sh': out_sh, 'bounds': bounds, 'mesh': mesh, 'joint': joint, 'rgb': rgb, 'ray_o': ray_o, 'ray_d': ray_d, 'near': near, 'far': far,
            'mask_at_box': mask_at_box, 'subject_idx': int(subject), 'action_idx': int(action_idx), 'subaction_idx': int(subaction_idx), 'frame_idx': int(frame_idx), 'input_cam_idx': int(input_cam_idx),
            'render_cam_idx': int(render_cam_idxs[0])}
        return ret

Dipankar1997161 commented 1 year ago

@hongsukchoi

Thank you for the code.

So you used the NeuralAnnot JSON and the camera.json file that you referred me to in your previous response.

Is there any other data I need to pass, or any other code I need to run apart from these?

If so, do let me know.