zerchen / AlignSDF

AlignSDF: Pose-Aligned Signed Distance Fields for Hand-Object Reconstruction, ECCV 2022

Question Regarding the SDF Scale Factor #10

Closed Eric-Gty closed 1 year ago

Eric-Gty commented 1 year ago

Hi Zerui,

Thanks for your work. I have a question regarding the SdfScaleFactor in the config files of the two datasets. I would like to know what this refers to and how to calculate it for different personalized datasets.

Thanks Eric

zerchen commented 1 year ago

Hi Eric,

Thanks for your interest. The scale factor is used to scale the hand-object meshes into a unit cube; marching cubes then operates on grids of points inside this unit cube. To compute the scale, you first need to create your SDF training data. Then, over the training dataset, compute the max distance from any negative point (a point inside the mesh) to the origin of your defined coordinate system. The inverse of this max distance can be used as the desired scale factor. Hope it helps.
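
In code, the idea is roughly the following (a minimal sketch, not the exact pipeline: the directory layout, the 'neg' key, and the per-sample wrist position lookup are illustrative, and it assumes the negative samples are already expressed in your chosen coordinate system):

import os
import numpy as np

def compute_sdf_scale_factor(sdf_dir, wrist_xyz_per_sample):
    # Sketch: inverse of the max distance from any negative (inside-mesh)
    # SDF sample to the chosen origin (here, the wrist joint).
    max_dist = 0.0
    for name in os.listdir(sdf_dir):
        data = np.load(os.path.join(sdf_dir, name))
        neg_xyz = data['neg'][:, :3]                    # negative samples: points inside the mesh
        neg_xyz = neg_xyz - wrist_xyz_per_sample[name]  # express points relative to the wrist
        max_dist = max(max_dist, float(np.linalg.norm(neg_xyz, axis=1).max()))
    return 1.0 / max_dist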

Best, Zerui

Eric-Gty commented 1 year ago

Hi Zerui,

Thanks a lot for your reply. However, I'm still a bit confused about how to calculate the fixed scale number for different datasets.

I have already finished constructing the SDF training data. You mentioned computing "the max distance from any negative points to the origin of your defined coordinate system". So, within your dataloader, after this line of code https://github.com/zerchen/AlignSDF/blob/5dcb6cf7565c369545b680b6deced76a9480346a/utils/data.py#L177, the SDF samples should be recovered to the original scale of the mesh that was used to create the SDF training data. To me, this process looks like this answer: https://github.com/marian42/mesh_to_sdf/issues/23#issuecomment-779287297. In this case, if we jointly visualize the SDF samples with the mesh, it should look like the following:

After recovering the SDF scale and transforming the samples into root-relative coordinates (suppose we define the wrist joint as the origin), I think the SDF samples are already in the "defined coordinate system" you mentioned. I would like to know if I understand this correctly.

If this is correct, then based on your description the calculation should be: iterate over all negative samples, compute the L2 norm of their recovered (x, y, z) coordinates, and finally take the inverse of the maximum as the SdfScaleFactor of the dataset.

If the above description is wrong, could you provide your script for calculating the SdfScaleFactor for any of the datasets as a reference?

Hope this won't bother you too much.

Best regards, Eric

zerchen commented 1 year ago

Hi Eric,

Thanks for your detailed description. Your understanding is correct! I also attach my code (it may not be compatible with this codebase) for computing this as a reference. In the code, scale_hand is the value you want.

Best, Zerui

import os
import pickle

import numpy as np
from tqdm import tqdm
from fire import Fire

def data_analysis(dataset):
    data_dir = f'data/{dataset}/train/'
    norm_dir = data_dir + 'norm/'
    meta_dir = data_dir + 'meta/'
    hand_dir = data_dir + 'sdf_hand/'
    obj_dir = data_dir + 'sdf_obj/'

    if 'obman' in dataset or 'ho3d' in dataset:
        cam_extr = np.array([[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]])
    else:
        cam_extr = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])

    dist_hand_points = []
    dist_obj_points = []
    sample_idx = []
    filenames = os.listdir(norm_dir)
    for idx, filename in tqdm(enumerate(filenames)):
        sample_idx.append(filename.split('.')[0])
        # per-sample normalization used when the SDF samples were generated
        norm_data = np.load(os.path.join(norm_dir, filename))
        scale = norm_data['scale']
        offset = norm_data['offset']

        hand_data = np.load(os.path.join(hand_dir, filename))
        hand_pos_xyz = hand_data['pos'][:, :3]
        hand_neg_xyz = hand_data['neg'][:, :3]

        obj_data = np.load(os.path.join(obj_dir, filename))
        obj_pos_xyz = obj_data['pos'][:, :3]
        obj_neg_xyz = obj_data['neg'][:, :3]

        # transform all points into camera space
        hand_pos_xyz_cam = hand_pos_xyz / scale - offset
        hand_neg_xyz_cam = hand_neg_xyz / scale - offset
        obj_pos_xyz_cam = obj_pos_xyz / scale - offset
        obj_neg_xyz_cam = obj_neg_xyz / scale - offset

        with open(os.path.join(meta_dir, filename.replace('npz', 'pkl')), 'rb') as f:
            meta_data = pickle.load(f)

        # joint 0 is the wrist, i.e. the origin of the hand coordinate frame
        cam_joints = np.dot(cam_extr, meta_data['coords_3d'].transpose(1, 0)).transpose(1, 0)
        hand_neg_dist_wrist = np.linalg.norm(hand_neg_xyz_cam - cam_joints[0], axis=1)
        obj_neg_dist_wrist = np.linalg.norm(obj_neg_xyz_cam - cam_joints[0], axis=1)

        dist_hand_points.append(np.max(hand_neg_dist_wrist))
        dist_obj_points.append(np.max(obj_neg_dist_wrist))

        # per-sample hand scale: inverse of the max distance from negative hand points to the wrist
        np.savez(os.path.join(norm_dir, filename), scale=scale, offset=offset, scale_hand=1 / np.max(hand_neg_dist_wrist))


if __name__ == '__main__':
    # assumed CLI entry point (Fire is imported above for this purpose)
    Fire(data_analysis)

Eric-Gty commented 1 year ago

Hi Zerui,

Thanks for your detailed reply; it's a great help to me. I'll go check whether the samples are properly bounded on my personalized dataset.

Thanks again :)))

Best regards, Eric

zerchen commented 1 year ago

You are welcome!

Eric-Gty commented 1 year ago

Hi Zerui,

Sorry to disturb you again after a few days. After finishing the SDF training data, I tried to train the SDF network on my personalized dataset but failed to reconstruct the hand shape.

So I set up the ObMan dataset and ran your codebase to compare its training process with mine. However, after visualizing the results, I observed some problems, and I hope to get some feedback from you.

I followed the original experimental setup defined in https://github.com/zerchen/AlignSDF/blob/master/experiments/obman/30k_1e2d_mlp5.json. Since I only care about the hand part, I simply ignore the object branch; I think this shouldn't have a major influence on the hand reconstruction quality. I then ran two experiments on ObMan: one using only the hand SDF decoder, and the other also including the MANO decoder, corresponding to the (a) and (b) ablation experiments defined in your paper as below:

To save time, I ran the reconstruction on the test set after 110 epochs of training. As a result, all reconstructed samples look very similar to each other, with implausible hands, as below:

I would like to know whether this is caused by the limited number of training epochs, or by the lack of the MANO prior (even with ablation (b), the MANO prior is not embedded into the SDF decoder; it only adds another three loss terms: 3D joints, beta, and theta). I even tried to overfit the training samples to see whether the model could produce a reasonable hand shape, but it still failed and the shape is similar to the one above.

During your experiments, did you observe a relatively satisfactory hand shape with the SDF decoder only? My guess is that the problem is caused either by a wrong experimental setup or by the lack of the MANO prior. I hope to get an empirical answer from you.

Another question is about the data construction. After multiplying by the SdfScaleFactor, the SDF samples are further divided by 2, as follows: https://github.com/zerchen/AlignSDF/blob/a944dd0cac847bb61dd80f058b8c71d90ed56831/utils/data.py#L198

This is a very minor question, but I'm just curious why we should do this. Is it because we want to further bound the xyz coordinates within (-0.5, 0.5) to form a proper unit cube around the origin?

Thanks a lot for your time :)

zerchen commented 1 year ago

Hi,

I think the main issue behind such a blurry hand is the limited number of training epochs. Using the MANO prior could alleviate this issue and may produce a relatively clearer hand at an early training stage (but 110 epochs may still not be enough). In my experiments, it is also difficult to produce a clear hand without the MANO prior. Yes, you are right: the reason I divide the SDF samples by 2 is to scale the points into the unit cube (since I want all object negative points to fall inside the unit cube). If you only want to reconstruct hands, I think you don't need to divide hand_samples by 2; you could give it a try. Hope this answers your question.
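
To make the bound concrete, here is a minimal sketch (the scale factor value and the way the points are generated are purely illustrative):

import numpy as np

sdf_scale_factor = 0.18                           # hypothetical SdfScaleFactor for a dataset
neg_xyz = np.random.randn(1000, 3)                # illustrative wrist-centered negative samples
neg_xyz /= np.linalg.norm(neg_xyz, axis=1).max()  # put the farthest point at distance 1 ...
neg_xyz /= sdf_scale_factor                       # ... i.e. at 1 / SdfScaleFactor, the max the data can reach

scaled = neg_xyz * sdf_scale_factor               # now inside a ball of radius 1
halved = scaled / 2                               # now inside a ball of radius 0.5
assert np.linalg.norm(halved, axis=1).max() <= 0.5 + 1e-6   # fits the (-0.5, 0.5) cube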

Best, Zerui

Eric-Gty commented 1 year ago

Hi Zerui,

Thanks a lot for your answer. The reason I asked about the training setup is that I currently have limited computational resources. After receiving your feedback, I continued training and obtained promising results. Thanks again for this.

Regarding reconstructing the hand only: on my personal dataset, I set the wrist as the root joint and bounded the samples within the unit cube. However, whether or not I divide by 2, the reconstruction just fails; the result is not even a hand. The augmentation I use, as sketched below, simply rotates the hand samples around the wrist joint in the x-y plane (z kept unchanged), so they should always stay within the unit cube regardless of the rotation angle.
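
For reference, a minimal sketch of that augmentation (the point generation is illustrative; it assumes the samples are wrist-centered and the rotation is about the z-axis):

import numpy as np

def rotate_about_wrist_z(points_xyz, angle_rad):
    # rotate wrist-centered points in the x-y plane; z is unchanged
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rot = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]])
    return points_xyz @ rot.T

pts = np.random.uniform(-0.5, 0.5, size=(1000, 3))  # hypothetical wrist-centered samples
pts *= 0.5 / np.linalg.norm(pts, axis=1).max()      # keep every sample within a ball of radius 0.5
rotated = rotate_about_wrist_z(pts, np.deg2rad(30.0))
# the rotation preserves the distance to the wrist, so points inside a ball of
# radius 0.5 (and hence inside the unit cube) stay inside for any rotation angle
assert np.allclose(np.linalg.norm(pts, axis=1), np.linalg.norm(rotated, axis=1))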

Just curious whether you met similar problems when first dealing with SDF reconstruction (sometimes the reconstruction result is even a flat plane). I've verified that the data should all be correct, so the problem may be that the model fails to learn the shape information.

I'll continue debugging this, but if you have met similar problems before, please kindly let me know about possible mistakes :))

Thanks again for your valuable time in this conversation; it helps me a lot!

Best regards, Eric