I am not sure why you are receiving NaNs; I have rerun a few of my examples and I do not see them. To help debug, can you provide the following:
Sorry for the late reply. I am using my own mesh and have updated the config file. What you need is pasted here:
Our consistency loss is implemented by mapping each visible vertex to a pixel in the 224x224 images passed to CLIP; visibility is computed by the nvdiffrast rasterizer. It then compares corresponding visible vertices across the batched views.
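For intuition, here is a minimal, simplified sketch of that idea, not the repository's actual code; the tensor names, shapes, and the plain MSE comparison are all assumptions made for illustration:

```python
import torch

def consistency_loss_sketch(vertex_pixels, visibility):
    """Toy multi-view consistency loss.

    vertex_pixels: (B, V, C) features sampled at each vertex's projected
                   pixel in the 224x224 image rendered for each view.
    visibility:    (B, V) boolean mask from the rasterizer; True where the
                   vertex is visible in that view.
    """
    B = vertex_pixels.shape[0]
    pair_losses = []
    for i in range(B):
        for j in range(i + 1, B):
            shared = visibility[i] & visibility[j]        # vertices visible in both views
            diff = vertex_pixels[i][shared] - vertex_pixels[j][shared]
            pair_losses.append((diff ** 2).mean())        # mean over an empty set -> nan
    return torch.stack(pair_losses).mean()
```

Note that if `shared` is empty for every pair that survives filtering, the returned loss is `nan`, which is the situation described below.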
I took a look at your configuration. Because your mesh has 100k+ vertices, rendering at a low resolution like 224x224 produces a large number of occlusions in our method. Since your batch size is relatively small (5), these occlusions mean that, even though it may look like shared vertices are being rendered between the batched views, the vertices actually computed as visible can differ slightly from view to view. Our method also filters view-pairs based on how close their viewing angles are; with your batch size, it may be the case that all pairs get filtered out. When the loss is `nan`, it means that our method did not find any shared vertices among the unfiltered view-pairs.
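The `nan` value itself is just standard PyTorch behaviour rather than anything specific to this repository: reducing an empty tensor returns `nan` instead of raising an error.

```python
import torch

# Mean over an empty tensor yields nan, which is what gets logged when
# no shared visible vertices survive the view-pair filtering.
print(torch.empty(0).mean())  # tensor(nan)
```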
There are a few ways you can address this, in my recommended order:

1. Increase the batch size, if your GPU memory permits; to enable this, you could try parallelizing our method over multiple GPUs. Increasing the batch size may also generally improve your results.
2. Increase the permissible difference between viewing angles (the `consistency_elev_filter` and `consistency_azim_filter` parameters).
3. Simplify the input mesh (our face results came from a mesh with only around 10k vertices).
4. Alternatively, ignore these `nan`s, since no gradients are passed (the `nan` results from the mean of an empty tensor); a small guard for this is sketched below.
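If you go with option 4, something like the following hypothetical helper (not part of this repository; `clip_loss`, `consistency_loss`, and `weight` are placeholder names) keeps the `nan` from contaminating logging or any total you accumulate:

```python
import torch

def combine_losses(clip_loss, consistency_loss, weight=1.0):
    """Drop the consistency term when it is nan, i.e. when no shared
    visible vertices were found for any unfiltered view-pair."""
    if torch.isnan(consistency_loss):
        return clip_loss
    return clip_loss + weight * consistency_loss

# Dummy example:
print(combine_losses(torch.tensor(0.8), torch.tensor(float("nan"))))  # tensor(0.8000)
```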
Could you give me some suggestions as to why there are NaN losses in the fitting procedure?