Triangle walk is not working

willydjhuang commented 1 week ago

Thanks for your amazing work! While the training results look decent, I noticed that the triangle walking algorithm does not seem to be working in the implementation.

To be more specific, in walking_on_triangles() from ./SplattingAvatar/model/splatting_avatar_model.py, my understanding is that delta = self._xyz[..., :2].detach().cpu().numpy().astype(np.double) should hold the updates to the uv coordinates. However, I found that delta is always zero, so no triangle walking happens. I traced the code and found that self._xyz[..., :2] never receives a gradient: the uv coordinates are computed from self.sample_bary and the mesh vertices, not from self._xyz[..., :2]. As a result, self._xyz[..., :2] is never updated and always stays at zero.
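To illustrate the reasoning above, here is a minimal pure-Python sketch (a toy stand-in, not the repository's code). It assumes a single triangle and a hypothetical parameter triple (delta_u, delta_v, d), where the position is built only from fixed barycentric weights plus a displacement d along the normal. A finite-difference gradient shows that a parameter the forward computation never reads gets a gradient of exactly zero.

```python
# Toy model (hypothetical names, not the repository's code): a point on one
# triangle, placed from fixed barycentric weights plus a displacement d
# along the triangle normal.
V = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]  # triangle vertices
NORMAL = (0.0, 0.0, 1.0)
SAMPLE_BARY = (0.2, 0.3, 0.5)  # plays the role of sample_bary
TARGET = (0.6, 0.1, 0.4)       # arbitrary point the loss pulls towards


def position(xyz_param):
    """xyz_param = (delta_u, delta_v, d); delta_u and delta_v are never read."""
    _du, _dv, d = xyz_param
    return tuple(
        sum(w * v[k] for w, v in zip(SAMPLE_BARY, V)) + d * NORMAL[k]
        for k in range(3)
    )


def loss(xyz_param):
    """Squared distance between the placed point and the target."""
    return sum((p - t) ** 2 for p, t in zip(position(xyz_param), TARGET))


def numeric_grad(f, x, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    grad = []
    for i in range(len(x)):
        hi = list(x)
        lo = list(x)
        hi[i] += eps
        lo[i] -= eps
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad


grad = numeric_grad(loss, [0.0, 0.0, 0.0])
print(grad)  # the first two entries are zero; only the normal displacement gets a signal
```

With a gradient of zero, a plain gradient-based optimizer has nothing to apply to those two entries, which is consistent with delta staying at zero.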

Could you please provide some hint about how to correctly activate triangle walking?

byfron commented 3 days ago

I noticed the same problem. The deltas on the barycentric coords are always zero. No gradient is propagated, and xyz[..., :2] is not updated by any other means.

willydjhuang commented 3 days ago

> I noticed the same problem. The deltas on the barycentric coords are always zero. There's no gradient propagated, nor is xyz[..., :2] updated by other means

Yeah, I got the same. My current workaround is to integrate self._xyz[..., :2] into ./splatting_avatar_model/get_xyz() for the xyz calculation, so that self._xyz[..., :2] is included in the gradient flow and gets updated. I made the following modification in ./splatting_avatar_model/base_xyz():

```
@property
def base_xyz(self):
    # add self._xyz[..., :2] to obtain gradient for uv coordinates
    bary_coords = self.sample_bary.clone()
    bary_coords[..., :2] += self._xyz[..., :2]
    bary_coords[..., -1] = 1.0 - bary_coords[..., 0] - bary_coords[..., 1]
    return retrieve_verts_barycentric(self.mesh_verts, self.cano_faces,
                                      self.sample_fidxs, bary_coords)
```

By doing this, self._xyz[..., :2] is updated by the optimizer for triangle_walk_interval iterations, and then walking_on_triangles() is called to update the uv coordinates.

However, although triangle walking can now be triggered, I found that most Gaussians move only a small amount throughout training, and the proportion of Gaussians that move to other triangles is very low.
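The arithmetic of this workaround can be sketched in plain Python for a single triangle. The helper names below (offset_bary, interpolate, inside_triangle) are hypothetical and only illustrate the idea: shift the first two barycentric weights by (delta_u, delta_v), recompute the third so the weights still sum to one, and interpolate the vertices. A negative weight means the shifted point has left its triangle, which is presumably the case a walking step has to resolve.

```python
def offset_bary(sample_bary, delta_uv):
    """Shift (u, v) by delta_uv and recompute the third weight so the sum is 1."""
    u = sample_bary[0] + delta_uv[0]
    v = sample_bary[1] + delta_uv[1]
    return (u, v, 1.0 - u - v)


def interpolate(verts, bary):
    """Barycentric interpolation of three 3D vertices."""
    return tuple(sum(w * p[k] for w, p in zip(bary, verts)) for k in range(3))


def inside_triangle(bary):
    """A point lies inside (or on the edge of) the triangle iff no weight is negative."""
    return all(w >= 0.0 for w in bary)


verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
sample_bary = (0.2, 0.3, 0.5)

small = offset_bary(sample_bary, (0.1, 0.0))  # stays on the same triangle
large = offset_bary(sample_bary, (0.9, 0.0))  # third weight becomes negative

print(interpolate(verts, sample_bary), interpolate(verts, small))
print(inside_triangle(small), inside_triangle(large))
```

Because the interpolated position now depends on the offset, a loss on that position produces a non-zero gradient for the two uv entries.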

willydjhuang commented 3 days ago

@initialneil Could you comment on the triangle walking implementation?

Another issue I found is in ./splatting_avatar_optim/reset_optimizer_uv(). After triangle walking, it resets delta_u and delta_v (that is, self._xyz[..., :2]) by re-assigning group['params'][0] (the entry in optimizer.param_groups) and updating the corresponding optimizer.state. However, it doesn't update the model's xyz parameter, so the optimizer loses its reference to the model's xyz parameter and no update reaches it for the rest of training. (The optimizer stores references to the actual model parameters, so the reference and the referred parameter normally have to be kept in sync.)

The visualization below is trained with the repo's default configuration on the nf_01 avatar (image: nf_01). One can see that the Gaussians' xyz are stuck to the mesh surface, failing to fit the hair geometry. I think the reason is that the xyz parameters aren't updated because of the optimizer reference issue mentioned above.

What I currently do is add one line in ./splatting_avatar_optim/reset_optimizer_uv():

```
# update the model parameter as well
self.gs_model._xyz = group['params'][0]
```

This seems to work: the Gaussians can now fit the geometry freely.
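The reference problem can be reproduced with a toy model in plain Python. Everything below (Param, ToySGD, ToyModel, reset_param) is a hypothetical stand-in written for illustration, not PyTorch and not the repository's code. It only mimics one property: an optimizer holds references to parameter objects, so swapping the object inside param_groups without rebinding the model attribute leaves the model training a tensor the optimizer no longer sees.

```python
class Param:
    """Stand-in for a parameter: a mutable box holding a value and a gradient."""

    def __init__(self, value):
        self.value = value
        self.grad = 0.0


class ToySGD:
    """Stand-in for an optimizer: it stores references to params, not copies."""

    def __init__(self, params, lr=0.1):
        self.param_groups = [{"params": list(params)}]
        self.lr = lr

    def step(self):
        for group in self.param_groups:
            for p in group["params"]:
                p.value -= self.lr * p.grad


class ToyModel:
    def __init__(self):
        self._xyz = Param(1.0)


def reset_param(model, optimizer, rebind):
    """Replace the optimizer's parameter object; optionally rebind the model too."""
    new_param = Param(model._xyz.value)
    optimizer.param_groups[0]["params"][0] = new_param
    if rebind:
        model._xyz = new_param  # the one-line fix: keep both references in sync


def run(rebind):
    model = ToyModel()
    optimizer = ToySGD([model._xyz])
    reset_param(model, optimizer, rebind)
    model._xyz.grad = 1.0  # pretend a backward pass wrote a gradient onto the model's tensor
    optimizer.step()
    return model._xyz.value


print(run(rebind=False))  # model value is unchanged: the optimizer stepped a different object
print(run(rebind=True))   # model value moves: both names point at the same object
```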

initialneil commented 2 days ago

@willydjhuang Thanks very much for digging into this. I checked Figure A6 here https://arxiv.org/pdf/2403.05087, and it used to work. This part of the code was written by mimicking the Gaussian clone and split code; could you please check whether there is any difference? I'm currently caught up with something else and will take a look later.

byfron commented 2 days ago

> One can see that the Gaussians' xyz are stuck to the mesh surface, failing to fit the hair geometry. I think the reason is that the xyz parameters aren't updated because of the optimizer reference issue mentioned above.

@willydjhuang could it be that the displacements across the mesh normals are always zero as well? These should be optimized by the scalars in self._xyz[..., -1]

willydjhuang commented 2 days ago

> @willydjhuang Thanks very much for digging into this. I checked Figure A6 here https://arxiv.org/pdf/2403.05087, and it used to work. This part of the code was written by mimicking the Gaussian clone and split code; could you please check whether there is any difference? I'm currently caught up with something else and will take a look later.

@initialneil Thanks very much for replying! My observations are as follows.

In ./splatting_avatar_optim/reset_optimizer_uv():
```
group['params'][0] = nn.Parameter(torch.cat((torch.zeros_like(_xyz[..., :2]), _xyz[..., 2:]), dim=-1).requires_grad_(True))
self.optimizer.state[group['params'][0]] = stored_state
```
This resets the delta value (actually _xyz[:,:2]). However, it doesn't update the actual model parameter, so after the first call to ./splatting_avatar_optim/reset_optimizer_uv() the optimizer loses its reference to the actual model parameter, leading to the result shown in my earlier comment. My current solution is to add:
```
self.gs_model._xyz = group['params'][0]
```
In this way, the optimizer's reference points to the actual model parameter, so the model parameters (especially the displacement along the normal direction) can be updated.

Once I could update the model parameters, I ran into the other issue: triangle walking isn't working properly. My speculation about the cause and my current solution are both described in my earlier comments: the change includes _xyz[:,:2] in the gradient flow of the coordinate calculation, so _xyz[:,:2] is no longer stuck at zero and is updated by the optimizer, and it can then be used as the delta for triangle walking. With this change, triangle walking can be activated. However, I found that only a few Gaussians walk a large distance, while most Gaussians keep their (u, v) coordinates on their triangle unchanged throughout training, leading to the result in subfigure (b) of Figure A6 here https://arxiv.org/pdf/2403.05087 (Gaussians piling up).

Feel free to point out anything I misunderstood about your repo. It would be great if you could provide some examples (or maybe a visualization of the Gaussian points) showing how to activate triangle walking properly.

initialneil / SplattingAvatar

Triangle walk is not working #43