Closed · RetoFan233 closed this issue 11 months ago
Hello, @zzh-tech. I would like to know why the loss value goes to NaN during training and why the model shows no significant improvement. Please advise. The training log is below:
2023/04/12, 15:35:29 - recording parameters ...
description: develop
seed: 39
threads: 8
num_gpus: 2
no_profile: False
profile_H: 1080
profile_W: 1920
resume: True
resume_file: /data/UDCVideo/baseline/ESTRNN/experiment/2023_04_05_22_31_29_ESTRNN_VideoUDC/model_best.pth.tar
data_root: /home/zhong/Dataset/
dataset: VideoUDC
save_dir: ./experiment/
frames: 8
ds_config: 2ms16ms
data_format: RGB
patch_size: [256, 256]
model: ESTRNN
n_features: 16
n_blocks: 15
future_frames: 2
past_frames: 2
activation: gelu
loss: 1*L1_Charbonnier_loss_color
metrics: PSNR
optimizer: Adam
lr: 0.0005
lr_scheduler: cosine
batch_size: 8
milestones: [200, 400]
decay_gamma: 0.5
start_epoch: 1
end_epoch: 500
trainer_mode: dp
test_only: False
test_frames: 20
test_save_dir: ./results/
test_checkpoint: /data/UDCVideo/baseline/ESTRNN/experiment/2023_04_05_22_31_29_ESTRNN_VideoUDC/model_best.pth.tar
video: False
normalize: True
centralize: True
time: 2023-04-12 15:35:29.064241
2023/04/12, 15:35:29 - building ESTRNN model ...
2023/04/12, 15:35:32 - model structure:
Model(
  (model): Model(
    (cell): RDBCell(
      (F_B0): Conv2d(3, 16, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
      (F_B1): RDB_DS( (rdb): RDB( (dense_layers): Sequential( (0): dense_layer( (conv): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) (1): dense_layer( (conv): Conv2d(32, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) (2): dense_layer( (conv): Conv2d(48, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) ) (conv1x1): Conv2d(64, 16, kernel_size=(1, 1), stride=(1, 1)) ) (down_sampling): Conv2d(16, 32, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2)) )
      (F_B2): RDB_DS( (rdb): RDB( (dense_layers): Sequential( (0): dense_layer( (conv): Conv2d(32, 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) (1): dense_layer( (conv): Conv2d(56, 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) (2): dense_layer( (conv): Conv2d(80, 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) ) (conv1x1): Conv2d(104, 32, kernel_size=(1, 1), stride=(1, 1)) ) (down_sampling): Conv2d(32, 64, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2)) )
      (F_R): RDNet( (RDBs): ModuleList(
        (0): RDB( (dense_layers): Sequential( (0): dense_layer( (conv): Conv2d(80, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) (1): dense_layer( (conv): Conv2d(112, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) (2): dense_layer( (conv): Conv2d(144, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) ) (conv1x1): Conv2d(176, 80, kernel_size=(1, 1), stride=(1, 1)) )
        (1): RDB( (dense_layers): Sequential( (0): dense_layer( (conv): Conv2d(80, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) (1): dense_layer( (conv): Conv2d(112, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) (2): dense_layer( (conv): Conv2d(144, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) ) (conv1x1): Conv2d(176, 80, kernel_size=(1, 1), stride=(1, 1)) )
        (2): RDB( (dense_layers): Sequential( (0): dense_layer( (conv): Conv2d(80, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) (1): dense_layer( (conv): Conv2d(112, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) (2): dense_layer( (conv): Conv2d(144, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) ) (conv1x1): Conv2d(176, 80, kernel_size=(1, 1), stride=(1, 1)) )
        (3): RDB( (dense_layers): Sequential( (0): dense_layer( (conv): Conv2d(80, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) (1): dense_layer( (conv): Conv2d(112, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) (2): dense_layer( (conv): Conv2d(144, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) ) (conv1x1): Conv2d(176, 80, kernel_size=(1, 1), stride=(1, 1)) )
        (4): RDB( (dense_layers): Sequential( (0): dense_layer( (conv): Conv2d(80, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) (1): dense_layer( (conv): Conv2d(112, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) (2): dense_layer( (conv): Conv2d(144, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) ) (conv1x1): Conv2d(176, 80, kernel_size=(1, 1), stride=(1, 1)) )
        (5): RDB( (dense_layers): Sequential( (0): dense_layer( (conv): Conv2d(80, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) (1): dense_layer( (conv): Conv2d(112, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) (2): dense_layer( (conv): Conv2d(144, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) ) (conv1x1): Conv2d(176, 80, kernel_size=(1, 1), stride=(1, 1)) )
        (6): RDB( (dense_layers): Sequential( (0): dense_layer( (conv): Conv2d(80, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) (1): dense_layer( (conv): Conv2d(112, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) (2): dense_layer( (conv): Conv2d(144, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) ) (conv1x1): Conv2d(176, 80, kernel_size=(1, 1), stride=(1, 1)) )
        (7): RDB( (dense_layers): Sequential( (0): dense_layer( (conv): Conv2d(80, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) (1): dense_layer( (conv): Conv2d(112, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) (2): dense_layer( (conv): Conv2d(144, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) ) (conv1x1): Conv2d(176, 80, kernel_size=(1, 1), stride=(1, 1)) )
        (8): RDB( (dense_layers): Sequential( (0): dense_layer( (conv): Conv2d(80, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) (1): dense_layer( (conv): Conv2d(112, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) (2): dense_layer( (conv): Conv2d(144, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) ) (conv1x1): Conv2d(176, 80, kernel_size=(1, 1), stride=(1, 1)) )
        (9): RDB( (dense_layers): Sequential( (0): dense_layer( (conv): Conv2d(80, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) (1): dense_layer( (conv): Conv2d(112, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) (2): dense_layer( (conv): Conv2d(144, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) ) (conv1x1): Conv2d(176, 80, kernel_size=(1, 1), stride=(1, 1)) )
        (10): RDB( (dense_layers): Sequential( (0): dense_layer( (conv): Conv2d(80, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) (1): dense_layer( (conv): Conv2d(112, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) (2): dense_layer( (conv): Conv2d(144, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) ) (conv1x1): Conv2d(176, 80, kernel_size=(1, 1), stride=(1, 1)) )
        (11): RDB( (dense_layers): Sequential( (0): dense_layer( (conv): Conv2d(80, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) (1): dense_layer( (conv): Conv2d(112, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) (2): dense_layer( (conv): Conv2d(144, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) ) (conv1x1): Conv2d(176, 80, kernel_size=(1, 1), stride=(1, 1)) )
        (12): RDB( (dense_layers): Sequential( (0): dense_layer( (conv): Conv2d(80, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) (1): dense_layer( (conv): Conv2d(112, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) (2): dense_layer( (conv): Conv2d(144, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) ) (conv1x1): Conv2d(176, 80, kernel_size=(1, 1), stride=(1, 1)) )
        (13): RDB( (dense_layers): Sequential( (0): dense_layer( (conv): Conv2d(80, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) (1): dense_layer( (conv): Conv2d(112, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) (2): dense_layer( (conv): Conv2d(144, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) ) (conv1x1): Conv2d(176, 80, kernel_size=(1, 1), stride=(1, 1)) )
        (14): RDB( (dense_layers): Sequential( (0): dense_layer( (conv): Conv2d(80, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) (1): dense_layer( (conv): Conv2d(112, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) (2): dense_layer( (conv): Conv2d(144, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) ) (conv1x1): Conv2d(176, 80, kernel_size=(1, 1), stride=(1, 1)) )
      ) (conv1x1): Conv2d(1200, 80, kernel_size=(1, 1), stride=(1, 1)) (conv3x3): Conv2d(80, 80, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) )
      (F_h): Sequential( (0): Conv2d(80, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): RDB( (dense_layers): Sequential( (0): dense_layer( (conv): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) (1): dense_layer( (conv): Conv2d(32, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) (2): dense_layer( (conv): Conv2d(48, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (act): GELU(approximate=none) ) ) (conv1x1): Conv2d(64, 16, kernel_size=(1, 1), stride=(1, 1)) ) (2): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) )
    )
    (recons): Reconstructor( (model): Sequential( (0): ConvTranspose2d(400, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1)) (1): ConvTranspose2d(32, 16, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1)) (2): Conv2d(16, 3, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2)) ) )
    (fusion): GSA( (F_f): Sequential( (0): Linear(in_features=160, out_features=320, bias=True) (1): GELU(approximate=none) (2): Linear(in_features=320, out_features=160, bias=True) (3): Sigmoid() ) (F_p): Sequential( (0): Conv2d(160, 320, kernel_size=(1, 1), stride=(1, 1)) (1): Conv2d(320, 160, kernel_size=(1, 1), stride=(1, 1)) ) (condense):
Conv2d(160, 80, kernel_size=(1, 1), stride=(1, 1)) (fusion): Conv2d(400, 400, kernel_size=(1, 1), stride=(1, 1)) ) ) )
2023/04/12, 15:35:36 - generating profile of ESTRNN model ...
[profile] computation cost: 458.42 GMACs, parameters: 2.47 M
2023/04/12, 15:35:36 - loading VideoUDC dataloader ...
2023/04/12, 15:35:57 - loading checkpoint /data/UDCVideo/baseline/ESTRNN/experiment/2023_04_05_22_31_29_ESTRNN_VideoUDC/model_best.pth.tar ...
2023/04/12, 15:35:57 - [Epoch 2 / lr 5.00e-04]
[train] epoch time: 30389.37s, average batch time: 9.02s
[train] 1*L1_Charbonnier_loss_color : 0.0497 (best 0.0497), PSNR : inf (best inf)
[train] L1_Charbonnier_loss_color : 0.049653;
2023/04/13, 00:02:27 - [Epoch 3 / lr 5.00e-04]
[train] epoch time: 31388.36s, average batch time: 9.31s
[train] 1*L1_Charbonnier_loss_color : 4138261907.0748 (best 0.0497), PSNR : inf (best inf)
[train] L1_Charbonnier_loss_color : 4138261907.074845;
2023/04/13, 08:45:35 - [Epoch 4 / lr 5.00e-04]
[train] epoch time: 31176.41s, average batch time: 9.25s
[train] 1*L1_Charbonnier_loss_color : 0.0515 (best 0.0497), PSNR : inf (best inf)
[train] L1_Charbonnier_loss_color : 0.051471;
2023/04/13, 17:25:12 - [Epoch 5 / lr 5.00e-04]
[train] epoch time: 30173.87s, average batch time: 8.95s
[train] 1*L1_Charbonnier_loss_color : 0.0486 (best 0.0486), PSNR : inf (best inf)
[train] L1_Charbonnier_loss_color : 0.048556;
2023/04/14, 01:48:06 - [Epoch 6 / lr 5.00e-04]
[train] epoch time: 30326.00s, average batch time: 9.00s
[train] 1*L1_Charbonnier_loss_color : 0.0457 (best 0.0457), PSNR : inf (best inf)
[train] L1_Charbonnier_loss_color : 0.045680;
2023/04/14, 10:13:33 - [Epoch 7 / lr 5.00e-04]
[train] epoch time: 30364.56s, average batch time: 9.01s
[train] 1*L1_Charbonnier_loss_color : 1016826.8275 (best 0.0457), PSNR : inf (best inf)
[train] L1_Charbonnier_loss_color : 1016826.827540;
2023/04/14, 18:39:38 - [Epoch 8 / lr 5.00e-04]
[train] epoch time: 30601.50s, average batch time: 9.08s
[train] 1*L1_Charbonnier_loss_color : 0.0460 (best 0.0457), PSNR : inf (best inf)
[train] L1_Charbonnier_loss_color : 0.045977;
2023/04/15, 03:09:40 - [Epoch 9 / lr 5.00e-04]
[train] epoch time: 30508.70s, average batch time: 9.05s
[train] 1*L1_Charbonnier_loss_color : 0.0443 (best 0.0443), PSNR : inf (best inf)
[train] L1_Charbonnier_loss_color : 0.044296;
2023/04/15, 11:38:09 - [Epoch 10 / lr 5.00e-04]
[train] epoch time: 30297.35s, average batch time: 8.99s
[train] 1*L1_Charbonnier_loss_color : inf (best 0.0443), PSNR : inf (best inf)
[train] L1_Charbonnier_loss_color : inf;
2023/04/15, 20:03:06 - [Epoch 11 / lr 5.00e-04]
[train] epoch time: 30177.56s, average batch time: 8.95s
[train] 1*L1_Charbonnier_loss_color : 0.0448 (best 0.0443), PSNR : inf (best inf)
[train] L1_Charbonnier_loss_color : 0.044764;
2023/04/16, 04:26:04 - [Epoch 12 / lr 4.99e-04]
[train] epoch time: 30493.02s, average batch time: 9.05s
[train] 1*L1_Charbonnier_loss_color : 0.0441 (best 0.0441), PSNR : inf (best inf)
[train] L1_Charbonnier_loss_color : 0.044116;
2023/04/16, 12:54:18 - [Epoch 13 / lr 4.99e-04]
[train] epoch time: 30154.28s, average batch time: 8.95s
[train] 1*L1_Charbonnier_loss_color : 2385378883146.6274 (best 0.0441), PSNR : inf (best inf)
[train] L1_Charbonnier_loss_color : 2385378883146.627441;
2023/04/16, 21:16:52 - [Epoch 14 / lr 4.99e-04]
[train] epoch time: 30205.32s, average batch time: 8.96s
[train] 1*L1_Charbonnier_loss_color : 0.0451 (best 0.0441), PSNR : inf (best inf)
[train] L1_Charbonnier_loss_color : 0.045062;
2023/04/17, 05:40:18 - [Epoch 15 / lr 4.99e-04]
[train] epoch time: 30142.28s, average batch time: 8.94s
[train] 1*L1_Charbonnier_loss_color : 0.0431 (best 0.0431), PSNR : inf (best inf)
[train] L1_Charbonnier_loss_color : 0.043079;
2023/04/17, 14:02:41 - [Epoch 16 / lr 4.99e-04]
[train] epoch time: 30201.14s, average batch time: 8.96s
[train] 1*L1_Charbonnier_loss_color : 6983098584846.5400 (best 0.0431), PSNR : inf (best inf)
[train] L1_Charbonnier_loss_color : 6983098584846.540039;
2023/04/17, 22:26:02 - [Epoch 17 / lr 4.99e-04]
[train] epoch time: 30098.64s, average batch time: 8.93s
[train] 1*L1_Charbonnier_loss_color : 0.0440 (best 0.0431), PSNR : inf (best inf)
[train] L1_Charbonnier_loss_color : 0.043975;
2023/04/18, 06:47:41 - [Epoch 18 / lr 4.99e-04]
[train] epoch time: 30196.35s, average batch time: 8.96s
[train] 1*L1_Charbonnier_loss_color : 2596996.1693 (best 0.0431), PSNR : inf (best inf)
[train] L1_Charbonnier_loss_color : 2596996.169278;
2023/04/18, 15:10:58 - [Epoch 19 / lr 4.98e-04]
[train] epoch time: 30428.21s, average batch time: 9.03s
[train] 1*L1_Charbonnier_loss_color : 0.0442 (best 0.0431), PSNR : inf (best inf)
[train] L1_Charbonnier_loss_color : 0.044210;
2023/04/18, 23:38:07 - [Epoch 20 / lr 4.98e-04]
[train] epoch time: 30350.31s, average batch time: 9.01s
[train] 1*L1_Charbonnier_loss_color : 111287.4230 (best 0.0431), PSNR : inf (best inf)
[train] L1_Charbonnier_loss_color : 111287.422983;
2023/04/19, 08:03:57 - [Epoch 21 / lr 4.98e-04]
[train] epoch time: 30116.00s, average batch time: 8.94s
[train] 1*L1_Charbonnier_loss_color : 0.0439 (best 0.0431), PSNR : inf (best inf)
[train] L1_Charbonnier_loss_color : 0.043916;
2023/04/19, 16:25:54 - [Epoch 22 / lr 4.98e-04]
[train] epoch time: 30330.90s, average batch time: 9.00s
[train] 1*L1_Charbonnier_loss_color : 0.0426 (best 0.0426), PSNR : inf (best inf)
[train] L1_Charbonnier_loss_color : 0.042591;
2023/04/20, 00:51:25 - [Epoch 23 / lr 4.98e-04]
[train] epoch time: 30530.48s, average batch time: 9.06s
[train] 1*L1_Charbonnier_loss_color : nan (best 0.0426), PSNR : nan (best inf)
[train] L1_Charbonnier_loss_color : nan;
2023/04/20, 09:20:16 - [Epoch 24 / lr 4.97e-04]
[train] epoch time: 30399.65s, average batch time: 9.02s
[train] 1*L1_Charbonnier_loss_color : nan (best 0.0426), PSNR : nan (best inf)
[train] L1_Charbonnier_loss_color : nan;
2023/04/20, 17:46:56 - [Epoch 25 / lr 4.97e-04]
[train] epoch time: 30338.83s, average batch time: 9.00s
[train] 1*L1_Charbonnier_loss_color : nan (best 0.0426), PSNR : nan (best inf)
[train] L1_Charbonnier_loss_color : nan;
2023/04/21, 02:12:35 - [Epoch 26 / lr 4.97e-04]
[train] epoch time: 29817.28s, average batch time: 8.85s
[train] 1*L1_Charbonnier_loss_color : nan (best 0.0426), PSNR : nan (best inf)
[train] L1_Charbonnier_loss_color : nan;
2023/04/21, 10:29:33 - [Epoch 27 / lr 4.97e-04]
[train] epoch time: 30047.09s, average batch time: 8.92s
[train] 1*L1_Charbonnier_loss_color : nan (best 0.0426), PSNR : nan (best inf)
[train] L1_Charbonnier_loss_color : nan;
2023/04/21, 18:50:20 - [Epoch 28 / lr 4.96e-04]
[train] epoch time: 30250.96s, average batch time: 8.98s
[train] 1*L1_Charbonnier_loss_color : nan (best 0.0426), PSNR : nan (best inf)
[train] L1_Charbonnier_loss_color : nan;
2023/04/22, 03:14:32 - [Epoch 29 / lr 4.96e-04]
[train] epoch time: 29903.88s, average batch time: 8.87s
[train] 1*L1_Charbonnier_loss_color : nan (best 0.0426), PSNR : nan (best inf)
[train] L1_Charbonnier_loss_color : nan;
2023/04/22, 11:32:56 - [Epoch 30 / lr 4.96e-04]
[train] epoch time: 30027.95s, average batch time: 8.91s
[train] 1*L1_Charbonnier_loss_color : nan (best 0.0426), PSNR : nan (best inf)
[train] L1_Charbonnier_loss_color : nan;
2023/04/22, 19:53:24 - [Epoch 31 / lr 4.96e-04]
[train] epoch time: 31119.49s, average batch time: 9.23s
[train] 1*L1_Charbonnier_loss_color : nan (best 0.0426), PSNR : nan (best inf)
[train] L1_Charbonnier_loss_color : nan;
2023/04/23, 04:32:04 - [Epoch 32 / lr 4.95e-04]
[train] epoch time: 31796.63s, average batch time: 9.44s
[train] 1*L1_Charbonnier_loss_color : nan (best 0.0426), PSNR : nan (best inf)
[train] L1_Charbonnier_loss_color : nan;
2023/04/23, 13:22:01 - [Epoch 33 / lr 4.95e-04]
[train] epoch time: 31785.75s, average batch time: 9.43s
[train] 1*L1_Charbonnier_loss_color : nan (best 0.0426), PSNR : nan (best inf)
[train] L1_Charbonnier_loss_color : nan;
2023/04/23, 22:11:47 - [Epoch 34 / lr 4.95e-04]
[train] epoch time: 31376.58s, average batch time: 9.31s
[train] 1*L1_Charbonnier_loss_color : nan (best 0.0426), PSNR : nan (best inf)
[train] L1_Charbonnier_loss_color : nan;
2023/04/24, 06:54:44 - [Epoch 35 / lr 4.94e-04]
[train] epoch time: 30429.09s, average batch time: 9.03s
[train] 1*L1_Charbonnier_loss_color : nan (best 0.0426), PSNR : nan (best inf)
[train] L1_Charbonnier_loss_color : nan;
2023/04/24, 15:21:53 - [Epoch 36 / lr 4.94e-04]
[train] epoch time: 30782.03s, average batch time: 9.13s
[train] 1*L1_Charbonnier_loss_color : nan (best 0.0426), PSNR : nan (best inf)
[train] L1_Charbonnier_loss_color : nan;
2023/04/24, 23:54:56 - [Epoch 37 / lr 4.94e-04]
[train] epoch time: 30213.03s, average batch time: 8.97s
[train] 1*L1_Charbonnier_loss_color : nan (best 0.0426), PSNR : nan (best inf)
[train] L1_Charbonnier_loss_color : nan;
2023/04/25, 08:18:30 - [Epoch 38 / lr 4.93e-04]
[train] epoch time: 30763.64s, average batch time: 9.13s
[train] 1*L1_Charbonnier_loss_color : nan (best 0.0426), PSNR : nan (best inf)
[train] L1_Charbonnier_loss_color : nan;
2023/04/25, 16:51:14 - [Epoch 39 / lr 4.93e-04]
[train] epoch time: 30456.50s, average batch time: 9.04s
[train] 1*L1_Charbonnier_loss_color : nan (best 0.0426), PSNR : nan (best inf)
[train] L1_Charbonnier_loss_color : nan;
2023/04/26, 01:18:51 - [Epoch 40 / lr 4.93e-04]
[train] epoch time: 30266.62s, average batch time: 8.98s
[train] 1*L1_Charbonnier_loss_color : nan (best 0.0426), PSNR : nan (best inf)
[train] L1_Charbonnier_loss_color : nan;
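A pattern worth noting in the log above: the loss intermittently explodes (epochs 3, 7, 13, 16, 18, 20), recovers, and finally stays nan from epoch 23 on, which points at occasional exploding gradients rather than bad data. One thing to check is the epsilon term in the Charbonnier loss: with a tiny or zero eps, the gradient of the square root is unbounded near zero residuals. A minimal sketch of a numerically stable variant (the function name and default eps are illustrative, not ESTRNN's actual implementation):

```python
import torch

def charbonnier_loss(pred: torch.Tensor, target: torch.Tensor,
                     eps: float = 1e-3) -> torch.Tensor:
    """Charbonnier loss: mean(sqrt(diff^2 + eps^2)).

    A non-zero eps keeps the gradient diff / sqrt(diff^2 + eps^2)
    bounded near diff == 0; with eps == 0 the gradient of |diff|
    is undefined at 0 and backprop can produce nan.
    """
    diff = pred - target
    return torch.sqrt(diff * diff + eps * eps).mean()
```

With identical inputs the loss degenerates to eps, so `charbonnier_loss(x, x)` is a quick sanity check that the epsilon floor is in effect.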
Please try to lower the learning rate.
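A lower learning rate is often combined with gradient-norm clipping and a guard that skips non-finite updates, which tends to suppress exactly this intermittent-blow-up failure mode. A sketch of such a training step in generic PyTorch (the stand-in model, `lr=1e-4`, and `max_norm=0.5` are illustrative values, not the repository's settings):

```python
import torch

model = torch.nn.Conv2d(3, 16, 3, padding=1)               # stand-in for ESTRNN
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # lowered from 5e-4

def train_step(inputs: torch.Tensor, targets: torch.Tensor) -> float:
    optimizer.zero_grad()
    loss = torch.nn.functional.l1_loss(model(inputs), targets)
    loss.backward()
    # Clip the global gradient norm so one bad batch cannot blow up the weights.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.5)
    # Skip the parameter update entirely when the loss is not finite.
    if torch.isfinite(loss):
        optimizer.step()
    return loss.item()
```

Clipping by global norm preserves the gradient direction while bounding its magnitude, so training behaves normally on well-conditioned batches and only the rare spiking batches are damped.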