Closed Zhiyuan-R closed 1 year ago
Are you using the ShapeNet dataset as well? Can you share the training log here?
Yes! I use ShapeNetCore.v2.PC15k (downloaded from PVD).
2023-01-25 00:18:15.622 | INFO | main:main:68 - not find any checkpoint: ./exp/0125/car/3dbf3ah_hvae_lion_B40/checkpoints, (exist=False), or snapshot ./exp/0125/car/3dbf3ah_hvae_lion_B40/checkpoints/snapshot, (exist=False)
2023-01-25 00:18:15.637 | INFO | trainers.base_trainer:train_epochs:173 - [rank=0] Start epoch: 0 End epoch: 800, batch-size=40 | Niter/epo=15 | log freq=15, viz freq 6000, val freq 200
2023-01-25 00:18:15.845 | INFO | trainers.base_trainer:prepare_vis_data:691 - tr_x: torch.Size([16, 2048, 3]), m_pcs: torch.Size([16, 1, 3]), s_pcs: torch.Size([16, 1, 1]), val_x: torch.Size([16, 2048, 3])
2023-01-25 00:18:15.865 | INFO | main:main:46 - param size = 22.402731M
2023-01-25 00:18:15.865 | INFO | main:main:68 - not find any checkpoint: ./exp/0125/car/3dbf3ah_hvae_lion_B40/checkpoints, (exist=False), or snapshot ./exp/0125/car/3dbf3ah_hvae_lion_B40/checkpoints/snapshot, (exist=False)
2023-01-25 00:18:15.866 | INFO | trainers.base_trainer:train_epochs:173 - [rank=1] Start epoch: 0 End epoch: 800, batch-size=40 | Niter/epo=15 | log freq=15, viz freq 6000, val freq 200
2023-01-25 00:18:15.904 | INFO | trainers.base_trainer:prepare_vis_data:691 - tr_x: torch.Size([16, 2048, 3]), m_pcs: torch.Size([16, 1, 3]), s_pcs: torch.Size([16, 1, 1]), val_x: torch.Size([16, 2048, 3])
2023-01-25 00:18:15.947 | INFO | main:main:46 - param size = 22.402731M
2023-01-25 00:18:15.948 | INFO | main:main:68 - not find any checkpoint: ./exp/0125/car/3dbf3ah_hvae_lion_B40/checkpoints, (exist=False), or snapshot ./exp/0125/car/3dbf3ah_hvae_lion_B40/checkpoints/snapshot, (exist=False)
2023-01-25 00:18:15.949 | INFO | trainers.base_trainer:train_epochs:173 - [rank=3] Start epoch: 0 End epoch: 800, batch-size=40 | Niter/epo=15 | log freq=15, viz freq 6000, val freq 200
2023-01-25 00:18:38.808 | INFO | trainers.common_fun:validate_inspect_noprior:104 - writer: none
2023-01-25 00:19:01.551 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E0 iter[ 14/ 15] | [Loss] 1053511558071768433312137216.00 | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 14 | url none | [time] 0.8m (~10h) |[best] 0 -100.000x1e-2
2023-01-25 00:19:26.789 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E1 iter[ 14/ 15] | [Loss] 52998332140144.68 | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 29 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:19:52.065 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E2 iter[ 14/ 15] | [Loss] 2302480512959926.50 | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 44 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:20:17.565 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E3 iter[ 14/ 15] | [Loss] 2568395833090570240.00 | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 59 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:20:43.245 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E4 iter[ 14/ 15] | [Loss] 17809658881111334949003391401984.00 | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 74 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:21:09.074 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E5 iter[ 14/ 15] | [Loss] 51566519.44 | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 89 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:21:34.569 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E6 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 104 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:22:00.025 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E7 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 119 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2
2023-01-25 00:22:25.365 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E8 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 134 | url none | [time] 0.4m (~5h)
|[best] 0 -100.000x1e-2 2023-01-25 00:22:50.734 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E9 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 149 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:23:16.079 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E10 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 164 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:23:41.553 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E11 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 179 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:24:07.110 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E12 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 194 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:24:32.557 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E13 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 209 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:24:58.175 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E14 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 224 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:25:23.746 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E15 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 239 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:25:49.360 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E16 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 254 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:26:14.849 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E17 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 269 | url 
none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:26:40.303 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E18 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 284 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:27:05.658 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E19 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 299 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:27:30.983 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E20 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 314 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:27:56.319 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E21 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 329 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:28:21.645 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E22 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 344 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:28:47.099 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E23 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 359 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:29:12.520 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E24 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 374 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:29:38.024 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E25 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 389 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:30:03.487 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E26 iter[ 14/ 15] | [Loss] nan | [exp] 
./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 404 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:30:28.738 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E27 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 419 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:30:53.995 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E28 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 434 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:31:19.211 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E29 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 449 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:31:44.334 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E30 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 464 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:32:09.531 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E31 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 479 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:32:34.830 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E32 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 494 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:33:00.156 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E33 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 509 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:33:25.532 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E34 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 524 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:33:51.046 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E35 iter[ 14/ 
15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 539 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:34:16.399 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E36 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 554 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:34:41.735 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E37 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 569 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:35:07.293 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E38 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 584 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:35:32.838 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E39 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 599 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:35:58.269 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E40 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 614 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:36:23.576 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E41 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 629 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:36:48.992 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E42 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 644 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:37:14.421 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E43 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 659 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:37:39.900 | INFO | 
trainers.base_trainer:train_epochs:256 - [R0] | E44 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 674 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:38:05.390 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E45 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 689 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:38:30.775 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E46 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 704 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:38:56.245 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E47 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 719 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:39:21.554 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E48 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 734 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:39:46.962 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E49 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 749 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:40:12.541 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E50 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 764 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:40:37.824 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E51 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 779 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:41:03.322 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E52 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 794 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 
2023-01-25 00:41:28.677 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E53 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 809 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:41:54.058 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E54 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 824 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:42:19.352 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E55 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 839 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:42:44.698 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E56 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 854 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:43:10.033 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E57 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 869 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:43:35.285 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E58 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 884 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:44:00.556 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E59 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 899 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:44:26.083 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E60 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 914 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:44:51.436 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E61 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 929 | url none | [time] 0.4m 
(~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:45:16.718 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E62 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 944 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:45:42.198 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E63 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 959 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:46:07.561 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E64 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 974 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:46:32.978 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E65 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 989 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:46:58.470 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E66 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1004 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:47:23.890 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E67 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1019 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:47:49.273 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E68 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1034 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:48:14.624 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E69 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1049 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:48:39.986 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E70 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 
1064 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:48:40.001 | INFO | trainers.base_trainer:save:106 - save model as : ./exp/0125/car/3dbf3ah_hvae_lion_B40/checkpoints/snapshot_bak 2023-01-25 00:49:06.715 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E71 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1079 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:49:31.999 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E72 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1094 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:49:57.406 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E73 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1109 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:50:22.712 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E74 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1124 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:50:48.098 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E75 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1139 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:51:13.579 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E76 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1154 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:51:39.065 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E77 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1169 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:52:04.374 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E78 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1184 | url none | [time] 0.4m (~5h) |[best] 0 
-100.000x1e-2 2023-01-25 00:52:29.969 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E79 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1199 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:52:55.488 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E80 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1214 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:53:20.759 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E81 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1229 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:53:46.118 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E82 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1244 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:54:11.518 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E83 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1259 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:54:36.914 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E84 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1274 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:55:02.136 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E85 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1289 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:55:27.793 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E86 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1304 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:55:53.190 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E87 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1319 | url 
none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:56:18.534 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E88 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1334 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:56:44.018 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E89 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1349 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:57:09.309 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E90 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1364 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2 2023-01-25 00:57:34.684 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E91 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1379 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2 2023-01-25 00:57:59.997 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E92 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1394 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2 2023-01-25 00:58:25.479 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E93 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1409 | url none | [time] 0.4m (~5h) |[best] 0 -100.000x1e-2 2023-01-25 00:58:50.932 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E94 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1424 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2 2023-01-25 00:59:16.326 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E95 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1439 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2 2023-01-25 00:59:41.795 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E96 iter[ 14/ 15] | [Loss] nan | [exp] 
./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1454 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2 2023-01-25 01:00:07.162 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E97 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1469 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2 2023-01-25 01:00:32.569 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E98 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1484 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2 2023-01-25 01:00:58.136 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E99 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1499 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2 2023-01-25 01:01:23.533 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E100 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1514 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2 2023-01-25 01:01:48.939 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E101 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1529 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2 2023-01-25 01:02:14.562 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E102 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1544 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2 2023-01-25 01:02:39.900 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E103 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1559 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2 2023-01-25 01:03:05.674 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E104 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1574 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2 2023-01-25 01:03:31.050 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | 
E105 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1589 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2 2023-01-25 01:03:56.486 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E106 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1604 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2 2023-01-25 01:04:21.979 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E107 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1619 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2 2023-01-25 01:04:47.400 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E108 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1634 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2 2023-01-25 01:05:12.816 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E109 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1649 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2 2023-01-25 01:05:38.353 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E110 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1664 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2 2023-01-25 01:06:03.822 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E111 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1679 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2 2023-01-25 01:06:29.280 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E112 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1694 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2 2023-01-25 01:06:54.803 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E113 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1709 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2 2023-01-25 01:07:20.158 | INFO | 
trainers.base_trainer:train_epochs:256 - [R0] | E114 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1724 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2 2023-01-25 01:07:45.551 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E115 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1739 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2 2023-01-25 01:08:11.027 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E116 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1754 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2 2023-01-25 01:08:36.365 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E117 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1769 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2 2023-01-25 01:09:01.709 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E118 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1784 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2 2023-01-25 01:09:27.067 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E119 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1799 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2 2023-01-25 01:09:52.533 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E120 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1814 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2 2023-01-25 01:10:18.148 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E121 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1829 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2 2023-01-25 01:10:43.401 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E122 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1844 | url none | [time] 0.4m (~4h) |[best] 0 
-100.000x1e-2 2023-01-25 01:11:08.755 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E123 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1859 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2 2023-01-25 01:11:34.165 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E124 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1874 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2 2023-01-25 01:11:59.572 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E125 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1889 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2 2023-01-25 01:12:24.968 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E126 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1904 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2 2023-01-25 01:12:50.169 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E127 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1919 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2 2023-01-25 01:13:15.662 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E128 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1934 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2 2023-01-25 01:13:41.159 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E129 iter[ 14/ 15] | [Loss] nan | [exp] ./exp/0125/car/3dbf3ah_hvae_lion_B40 | [step] 1949 | url none | [time] 0.4m (~4h) |[best] 0 -100.000x1e-2
And here is my config:
bash_name: ''
clipforge:
  clip_model: ViT-B/32
  enable: 0
  feat_dim: 512
cmt: lion
comet_key: ''
data:
  batch_size: 40
  batch_size_test: 10
  cates: car
  clip_forge_enable: 0
  clip_model: ViT-B/32
  cond_on_cat: 0
  cond_on_voxel: 0
  data_dir: data/ShapeNetCore.v2.PC15k
  dataset_scale: 1
  dataset_type: shapenet15k
  eval_test_split: 0
  input_dim: -1
  is_encode_whole_dataset_trainer: 0
  nclass: 55
  noise_std: 0.1
  noise_std_min: -1.0
  noise_type: normal
  normalize_global: true
  normalize_per_shape: false
  normalize_range: false
  normalize_shape_box: false
  normalize_std_per_axis: false
  num_workers: 4
  random_subsample: 1
  recenter_per_shape: false
  sample_with_replacement: 1
  te_max_sample_points: 2048
  tr_max_sample_points: 2048
  train_drop_last: 1
  type: datasets.pointflow_datasets
  voxel_size: 0.1
ddpm:
  add_point_feat: true
  attn:
Hi, I tried VAE training with batch size 40 on 4 GPUs, and I also get a similar NaN issue. However, the same training code works with batch size 32. The reason is not clear to me yet; it seems training somehow does not work once the batch size reaches 40. While I look into this, perhaps you can use a batch size of 32 for now? Sorry about that!
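While the cause is being investigated, a small guard can stop a run as soon as the loss stops being finite, instead of logging NaN for hundreds of epochs. The following is a minimal sketch in plain Python, not code from the repo: `check_loss`, `NonFiniteLossError`, and the `max_abs` threshold are hypothetical, and in a PyTorch training loop the scalar would typically come from `loss.item()`.

```python
import math


class NonFiniteLossError(RuntimeError):
    """Raised when the training loss is NaN/inf or has diverged (hypothetical helper)."""


def check_loss(loss_value: float, step: int, max_abs: float = 1e8) -> None:
    """Abort early if the loss is NaN/inf or larger in magnitude than `max_abs`.

    `max_abs` is an assumed threshold, not a value taken from the repo;
    tune it to the loss scale you normally see.
    """
    if not math.isfinite(loss_value):
        raise NonFiniteLossError(f"loss is {loss_value} at step {step}")
    if abs(loss_value) > max_abs:
        raise NonFiniteLossError(
            f"loss {loss_value:.3e} exceeds {max_abs:.1e} at step {step}"
        )
```

Calling `check_loss(loss.item(), step)` once per logging interval would be enough here. With the assumed default threshold, the first logged loss in the run above (about 1.05e27 at step 14) would already trigger the guard.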
Thanks for your hard work! I can't believe you ran it yourself! That is so nice of you! Have a good night!
Hi, I trained the VAE model following the README, but the training loss becomes NaN. I use 4 GPUs and a batch size of 40, and I kept everything else the same as in the repo.
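For anyone hitting the same problem, it can help to find the exact epoch at which the loss first turns NaN before opening an issue. The snippet below is a rough sketch that scans a saved training log for entries of the form `E<epoch> iter[...] | [Loss] <value>`, as seen in the log in this thread. The function name `first_non_finite_epoch` and the regular expression are my own assumptions about the format, not part of the repo.

```python
import math
import re
from typing import Optional, Tuple

# Matches "E<epoch> iter[...] | [Loss] <value>"; tolerant of extra
# whitespace or line breaks inside an entry.
LOSS_RE = re.compile(r"E(\d+)\s+iter\[[^\]]*\]\s*\|\s*\[Loss\]\s+(\S+)")


def first_non_finite_epoch(log_text: str) -> Optional[Tuple[int, float]]:
    """Return (epoch, loss) for the first NaN/inf loss, or None if all are finite."""
    for match in LOSS_RE.finditer(log_text):
        epoch = int(match.group(1))
        loss = float(match.group(2))  # float() accepts "nan" and "inf"
        if not math.isfinite(loss):
            return epoch, loss
    return None
```

Run on the log posted in this thread, it should report epoch 6, the first entry whose loss is printed as `nan`.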