google / lasr

Code for "LASR: Learning Articulated Shape Reconstruction from a Monocular Video". CVPR 2021.
https://lasr-google.github.io/
Apache License 2.0

LASR fails for sequence of a person #7

ecmjohnson closed this issue 3 years ago

ecmjohnson commented 3 years ago

Hello, I am looking to run LASR on a couple of different scenes, each showing a single person. For the first, LASR works very well using the default parameters with symmetry disabled:

RGB sequence: https://user-images.githubusercontent.com/6766142/126760093-b96c19ae-8e15-4cb6-8942-8ad0a420a2e5.mp4

LASR results: https://user-images.githubusercontent.com/6766142/126760220-8ceff0c3-03bd-432e-8d7a-0b1789112dc7.mp4

However, for the other scene the method runs to completion but produces invalid results. The RGB sequence is:

https://user-images.githubusercontent.com/6766142/126758853-57390ec1-966d-4488-979e-a1f92632bfb5.mp4

The results using default values (symmetry enabled) show a phantom copy and the mesh doesn't deform to match the mask:

https://user-images.githubusercontent.com/6766142/126759304-3866ae49-64d7-4413-8e38-0405c1e33765.mp4

I disabled the symmetry and now the resulting mesh is an amorphous blob that doesn't even overlap the mask:

https://user-images.githubusercontent.com/6766142/126759364-007f64e3-4cba-4370-9da3-57a8d24ef592.mp4

The trends in TensorBoard suggest that everything proceeded well until the end of the first epoch, so I ran the method for only a single epoch, which gives the best results so far (although the mesh is somewhat reminiscent of a tadpole):

https://user-images.githubusercontent.com/6766142/126759766-c3cf477a-507d-4af3-b0f9-2ef447de0309.mp4

I also tried larger batch sizes as suggested in the README (6 and 10), but this didn't seem to make any difference in the results. I verified that the masks and flow fields didn't look vastly incorrect. I'm wondering whether this is a known issue, or whether you might have an idea of what has gone wrong for this scene. Thanks!

gengshan-y commented 3 years ago

Hi Erik, thanks for sharing the details, which help a lot in understanding your problem. Here are my thoughts on the two scenarios: (1) with symmetry and (2) without symmetry.

  1. In short, when symmetry is enabled for your video, LASR ended up finding a wrong symmetry plane, which imposes a wrong constraint on the shape via the soft-symmetry loss, Eq. (11) (a conceptual sketch of such a reflection penalty follows this list). Specifically, at the first stage, LASR hypothesizes 16 symmetry planes whose normals are evenly distributed on a hemisphere. At the second stage, LASR selects the symmetry plane with the lowest accumulated reconstruction error and uses it in the subsequent stages to construct the symmetry loss, Eq. (11).
  2. When symmetry is disabled, the problem encountered is different. Based on my experience, the misalignment of the GT silhouette and the renderings is sometimes caused by the re-initialization of bones here, where the rest bone positions are re-initialized but the CNN predictor for the bone transforms, Eq. (13), is not. Sometimes a large, misaligned bone transformation can drag the whole mesh off screen.
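
To make point 1 concrete, here is a conceptual sketch of a reflection-based soft-symmetry penalty (my own simplification for illustration, not the exact implementation of Eq. (11)): reflect the vertices across the hypothesized plane and penalize the distance between the reflected and original vertex sets. With a wrong plane, minimizing such a term pulls the shape toward a spurious mirrored copy.

import torch

def soft_symmetry_penalty(verts, plane_normal):
    # Conceptual sketch only (not LASR's exact Eq. (11)).
    # verts: (V, 3) mesh vertices; plane_normal: (3,) normal of a plane through the origin.
    n = plane_normal / plane_normal.norm()
    reflected = verts - 2.0 * (verts @ n)[:, None] * n  # Householder reflection of each vertex
    d = torch.cdist(reflected, verts)                   # (V, V) pairwise distances
    # symmetric chamfer distance between the reflected and original vertex sets
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()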

To solve 1, you may manually select the best symmetry plane by modifying the optim_cam variable here. I don't have the exact piece of code at hand, but I usually save the reconstructed meshes corresponding to all hypotheses and visually inspect them, roughly along the lines of the untested sketch below. Vertices are in states['mean_v'], faces are in states['faces'], and the texture is in states['tex'].
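
For reference, an untested sketch of such an export (the checkpoint path and the exact tensor layout of states are assumptions on my part; adjust them to whatever your run actually saves):

import numpy as np
import torch

def to_np(x):
    # convert a tensor or array-like to a plain numpy array
    return x.detach().cpu().numpy() if torch.is_tensor(x) else np.asarray(x)

# Untested sketch: write one OBJ per symmetry hypothesis for visual inspection.
# I take states['mean_v'] to stack the rest-shape vertices per hypothesis and
# states['faces'] to hold the shared triangle indices (states['tex'] is skipped here).
states = torch.load('path/to/saved_states.pth', map_location='cpu')
mean_v = to_np(states['mean_v'])
verts_all = mean_v.reshape(-1, mean_v.shape[-2], 3)        # assumed (n_hypo, V, 3)
faces = to_np(states['faces']).reshape(-1, 3).astype(int)

for h, verts in enumerate(verts_all):
    with open(f'hypo_{h}.obj', 'w') as f:
        for x, y, z in verts:
            f.write(f'v {x} {y} {z}\n')
        for a, b, c in faces + 1:                          # OBJ indices are 1-based
            f.write(f'f {a} {b} {c}\n')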

To solve 2, re-initializing the corresponding weights of the CNN predictor for the bone transformations should work. This is not implemented yet; in case you want to implement it, these lines could help. Basically, you want to keep the weights for the root body pose (the first set of parameters for quat, trans and depth) and assign random values to the weights of the second through last sets of parameters.
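
A rough, untested sketch of that idea (it assumes pred_layer is a plain nn.Linear whose output rows are grouped bone-by-bone with the root first; editing the rows in place also leaves the existing Parameter object, and any optimizer references to it, intact):

import torch

def reinit_non_root_rows(pred_layer, n_bones, rows_per_bone):
    # Keep the first `rows_per_bone` output rows (root body pose) and
    # re-draw the remaining rows (per-bone transforms) at random.
    # Assumes pred_layer is a torch.nn.Linear with n_bones * rows_per_bone output rows.
    with torch.no_grad():
        w = pred_layer.weight.view(n_bones, rows_per_bone, -1)
        torch.nn.init.normal_(w[1:], std=0.01)  # randomize every bone except the root (scale is a guess)
        if pred_layer.bias is not None:
            pred_layer.bias.view(n_bones, rows_per_bone)[1:].zero_()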

ecmjohnson commented 3 years ago

Hey Gengshan, I went about addressing the case without symmetry, and you were exactly correct! I re-initialized the CNN weights for the non-root bones when re-initializing the bones (code included below) and got the following results:

https://user-images.githubusercontent.com/6766142/126960570-150b95fd-f652-4f1f-97bf-25026a577135.mp4

The code added when re-initializing the bones (inside the if statement you pointed to) was:

# Quat weights: (4*n_hypo) per bone
quat_weights = self.model.module.code_predictor.quat_predictor.pred_layer.weight
quat_weights_shape = quat_weights.shape
nfeat = quat_weights_shape[-1] # same for all predictors
quat_weights = torch.cat(
    [
        quat_weights.view(self.opts.n_bones, self.opts.n_hypo*4, nfeat)[:1], # leave root bone estimator unchanged
        torch.randn_like(quat_weights.view(self.opts.n_bones, self.opts.n_hypo*4, nfeat)[1:]) # reinit other weights
    ],
    dim=0
).view(quat_weights_shape)
self.model.module.code_predictor.quat_predictor.pred_layer.weight = torch.nn.Parameter(quat_weights)

# Trans weights: 2 per bone
trans_weights = self.model.module.code_predictor.trans_predictor.pred_layer.weight
trans_weights_shape = trans_weights.shape
trans_weights = torch.cat(
    [
        trans_weights.view(self.opts.n_bones, 2, nfeat)[:1], # leave root bone estimator unchanged
        torch.randn_like(trans_weights.view(self.opts.n_bones, 2, nfeat)[1:]) # reinit other weights
    ],
    dim=0
).view(trans_weights_shape)
self.model.module.code_predictor.trans_predictor.pred_layer.weight = torch.nn.Parameter(trans_weights)

# Depth weights: 1 per bone
depth_weights = self.model.module.code_predictor.depth_predictor.pred_layer.weight
depth_weights_shape = depth_weights.shape
depth_weights = torch.cat(
    [
        depth_weights.view(self.opts.n_bones, 1, nfeat)[:1], # leave root bone estimator unchanged
        torch.randn_like(depth_weights.view(self.opts.n_bones, 1, nfeat)[1:]) # reinit other weights
    ],
    dim=0
).view(depth_weights_shape)
self.model.module.code_predictor.depth_predictor.pred_layer.weight = torch.nn.Parameter(depth_weights)

This seems to address the problem of the mesh being dragged off screen, although I am implicitly assuming that the root body's quaternion predictor weights for every hypothesis come before those of any other bone. Thanks a lot for the excellent help!