Open RayDean opened 6 months ago
When the 250K training steps finished, the final total_loss was inf, and lpips_loss and sr_lpips_loss were also inf:
| Training end.. Epoch 0 ended. Steps: 250001. {'total_loss': inf, 'mse_loss': 0.0024544131240717033, 'weights_entropy_loss': 0.050688008500976045, 'num_non_facemask': 56165.82106877656, 'ambient_loss': 2.8842663650615378e-08, 'sr_mse_loss': 0.0008115496115366654, 'lambda_ambient': 469.427371226522, 'head_psnr': 27.943281164014728, 'density_grid_info_min_density': -1.0, 'density_grid_info_max_density': 364738707.3452703, 'density_grid_info_mean_density': 1790.830806371328, 'density_grid_info_occupancy_rate': 0.25496578732052366, 'density_grid_info_step_mean_count': 299778.5135135135, 'lpips_loss': inf, 'sr_lpips_loss': inf, 'sr_lip_lpips_loss': 1.1583641622033645}
Is this normal? How can I fix it? Thanks.
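To narrow this down, it may help to list exactly which logged entries are non-finite. The snippet below is only a minimal sketch: the values are copied from the "Training end" log above (a subset of the keys, with `inf` written as `float("inf")`), and the helper name `non_finite_keys` is hypothetical, not something from the training code.

```python
import math

# Subset of the "Training end" metrics from the log above.
# The log prints `inf`; in Python source that is float("inf").
metrics = {
    "total_loss": float("inf"),
    "mse_loss": 0.0024544131240717033,
    "sr_mse_loss": 0.0008115496115366654,
    "lpips_loss": float("inf"),
    "sr_lpips_loss": float("inf"),
    "sr_lip_lpips_loss": 1.1583641622033645,
    "density_grid_info_max_density": 364738707.3452703,
}


def non_finite_keys(m):
    """Return the sorted names of entries that are inf or NaN (hypothetical helper)."""
    return sorted(k for k, v in m.items() if not math.isfinite(v))


bad = non_finite_keys(metrics)
print(bad)
```

Only the two LPIPS terms and total_loss are non-finite here, while both MSE terms and sr_lip_lpips_loss stay finite. That suggests the inf in total_loss is inherited from lpips_loss and sr_lpips_loss, but it does not explain why those two terms blew up.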
Same issue.
When I trained the head NeRF and training reached 250K steps, the total_loss was very high, nearly 580, while the other losses seemed normal. Partial logs:
| Validation results@248000: {'total_loss': 582.6377294922, 'mse_loss': 0.0012603372, 'sr_mse_loss': 0.0013412535, 'lpips_loss': 1.0247015435, 'sr_lpips_loss': 1.1453602004, 'sr_lip_lpips_loss': 1.0416638839, 'lambda_ambient': 579.4234008789}
03/06 04:17:08 PM Epoch 00000@248000: saving model to checkpoints/motion2video_nerf/meimei_head/model_ckpt_steps_248000.ckpt
03/06 04:17:08 PM Delete ckpt: model_ckpt_steps_246000.ckpt
Is this high loss normal? If not, how can I lower the total_loss? Thanks.
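One thing that can be checked directly from the numbers in that log line is whether the reported total_loss is simply the sum of every logged entry, including lambda_ambient. That is an assumption about how the total is computed, not something confirmed in this thread, and the variable names below are made up for the check.

```python
import math

# Values copied from the "Validation results@248000" log line above.
logged = {
    "mse_loss": 0.0012603372,
    "sr_mse_loss": 0.0013412535,
    "lpips_loss": 1.0247015435,
    "sr_lpips_loss": 1.1453602004,
    "sr_lip_lpips_loss": 1.0416638839,
    "lambda_ambient": 579.4234008789,
}
reported_total = 582.6377294922

# Assumption: total_loss is a plain sum over all logged entries.
summed = sum(logged.values())

# The same sum with lambda_ambient left out, i.e. only the actual loss terms.
loss_terms_only = summed - logged["lambda_ambient"]

print(f"sum of all logged entries:  {summed:.4f}")
print(f"reported total_loss:        {reported_total:.4f}")
print(f"sum without lambda_ambient: {loss_terms_only:.4f}")
```

The sum of the logged entries agrees with the reported total_loss to about four decimal places, and without lambda_ambient the remaining terms add up to about 3.21. If lambda_ambient is a loss weight rather than a loss (its name suggests so, but that is also an assumption), then the value of roughly 580 mostly reflects that weight being added into the logged total, not the image losses themselves.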
For how many steps does the head NeRF training run, and how long does it take? Can we stop the process and later resume it from the same checkpoint?