open-mmlab / mmaction2

OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
https://mmaction2.readthedocs.io
Apache License 2.0

grad_norm becomes 0 immediately #1643

Open vsc9729 opened 2 years ago

vsc9729 commented 2 years ago

I've been trying to train using the slowonly_r50_u48_240e_ntu120_xsub_keypoint.py config, and grad_norm becomes 0 immediately. Is this normal behaviour? I've set videos_per_gpu to 2.

Log file:
2022-05-23 15:43:28,364 - mmaction - INFO - Environment info:

sys.platform: win32
Python: 3.8.13 (default, Mar 28 2022, 06:59:08) [MSC v.1916 64 bit (AMD64)]
CUDA available: True
GPU 0: Quadro RTX 4000
CUDA_HOME: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1
NVCC: Cuda compilation tools, release 10.1, V10.1.24
MSVC: Microsoft (R) C/C++ Optimizing Compiler Version 19.29.30133 for x64
GCC: n/a
PyTorch: 1.11.0
PyTorch compiling details: PyTorch built with:

TorchVision: 0.12.0
OpenCV: 4.5.5
MMCV: 1.5.0
MMCV Compiler: MSVC 192930140
MMCV CUDA Compiler: 11.3
MMAction2: 0.23.0+5e853b1

2022-05-23 15:43:28,367 - mmaction - INFO - Distributed training: False
2022-05-23 15:43:30,328 - mmaction - INFO - Config:
model = dict(
    type='Recognizer3D',
    backbone=dict(
        type='ResNet3dSlowOnly',
        depth=50,
        pretrained=None,
        in_channels=17,
        base_channels=32,
        num_stages=3,
        out_indices=(2, ),
        stage_blocks=(4, 6, 3),
        conv1_stride_s=1,
        pool1_stride_s=1,
        inflate=(0, 1, 1),
        spatial_strides=(2, 2, 2),
        temporal_strides=(1, 1, 2),
        dilations=(1, 1, 1)),
    cls_head=dict(
        type='I3DHead',
        in_channels=512,
        num_classes=120,
        spatial_type='avg',
        dropout_ratio=0.5),
    train_cfg=dict(),
    test_cfg=dict(average_clips='prob'))
dataset_type = 'PoseDataset'
ann_file_train = 'C:/Users/user/Desktop/vikrant/mmaction2/configs/skeleton/posec3d/result.pkl'
ann_file_val = 'C:/Users/user/Desktop/vikrant/mmaction2/tools/data/skeleton/val.pkl'
left_kp = [1, 3, 5, 7, 9, 11, 13, 15]
right_kp = [2, 4, 6, 8, 10, 12, 14, 16]
train_pipeline = [
    dict(type='UniformSampleFrames', clip_len=48),
    dict(type='PoseDecode'),
    dict(type='PoseCompact', hw_ratio=1.0, allow_imgpad=True),
    dict(type='Resize', scale=(-1, 64)),
    dict(type='RandomResizedCrop', area_range=(0.56, 1.0)),
    dict(type='Resize', scale=(56, 56), keep_ratio=False),
    dict(
        type='Flip',
        flip_ratio=0.5,
        left_kp=[1, 3, 5, 7, 9, 11, 13, 15],
        right_kp=[2, 4, 6, 8, 10, 12, 14, 16]),
    dict(
        type='GeneratePoseTarget',
        sigma=0.6,
        use_score=True,
        with_kp=True,
        with_limb=False),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]
val_pipeline = [
    dict(type='UniformSampleFrames', clip_len=48, num_clips=1, test_mode=True),
    dict(type='PoseDecode'),
    dict(type='PoseCompact', hw_ratio=1.0, allow_imgpad=True),
    dict(type='Resize', scale=(-1, 64)),
    dict(type='CenterCrop', crop_size=64),
    dict(
        type='GeneratePoseTarget',
        sigma=0.6,
        use_score=True,
        with_kp=True,
        with_limb=False),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
test_pipeline = [
    dict(type='UniformSampleFrames', clip_len=48, num_clips=10, test_mode=True),
    dict(type='PoseDecode'),
    dict(type='PoseCompact', hw_ratio=1.0, allow_imgpad=True),
    dict(type='Resize', scale=(-1, 64)),
    dict(type='CenterCrop', crop_size=64),
    dict(
        type='GeneratePoseTarget',
        sigma=0.6,
        use_score=True,
        with_kp=True,
        with_limb=False,
        double=True,
        left_kp=[1, 3, 5, 7, 9, 11, 13, 15],
        right_kp=[2, 4, 6, 8, 10, 12, 14, 16]),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
data = dict(
    videos_per_gpu=8,
    workers_per_gpu=2,
    test_dataloader=dict(videos_per_gpu=1),
    train=dict(
        type='PoseDataset',
        ann_file='C:/Users/user/Desktop/vikrant/mmaction2/configs/skeleton/posec3d/result.pkl',
        data_prefix='',
        class_prob=dict({
            0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1,
            10: 1, 11: 1, 12: 1, 13: 1, 14: 1, 15: 1, 16: 1, 17: 1, 18: 1, 19: 1,
            20: 1, 21: 1, 22: 1, 23: 1, 24: 1, 25: 1, 26: 1, 27: 1, 28: 1, 29: 1,
            30: 1, 31: 1, 32: 1, 33: 1, 34: 1, 35: 1, 36: 1, 37: 1, 38: 1, 39: 1,
            40: 1, 41: 1, 42: 1, 43: 1, 44: 1, 45: 1, 46: 1, 47: 1, 48: 1, 49: 1,
            50: 1, 51: 1, 52: 1, 53: 1, 54: 1, 55: 1, 56: 1, 57: 1, 58: 1, 59: 1,
            60: 2, 61: 2, 62: 2, 63: 2, 64: 2, 65: 2, 66: 2, 67: 2, 68: 2, 69: 2,
            70: 2, 71: 2, 72: 2, 73: 2, 74: 2, 75: 2, 76: 2, 77: 2, 78: 2, 79: 2,
            80: 2, 81: 2, 82: 2, 83: 2, 84: 2, 85: 2, 86: 2, 87: 2, 88: 2, 89: 2,
            90: 2, 91: 2, 92: 2, 93: 2, 94: 2, 95: 2, 96: 2, 97: 2, 98: 2, 99: 2,
            100: 2, 101: 2, 102: 2, 103: 2, 104: 2, 105: 2, 106: 2, 107: 2,
            108: 2, 109: 2, 110: 2, 111: 2, 112: 2, 113: 2, 114: 2, 115: 2,
            116: 2, 117: 2, 118: 2, 119: 2
        }),
        pipeline=[
            dict(type='UniformSampleFrames', clip_len=48),
            dict(type='PoseDecode'),
            dict(type='PoseCompact', hw_ratio=1.0, allow_imgpad=True),
            dict(type='Resize', scale=(-1, 64)),
            dict(type='RandomResizedCrop', area_range=(0.56, 1.0)),
            dict(type='Resize', scale=(56, 56), keep_ratio=False),
            dict(
                type='Flip',
                flip_ratio=0.5,
                left_kp=[1, 3, 5, 7, 9, 11, 13, 15],
                right_kp=[2, 4, 6, 8, 10, 12, 14, 16]),
            dict(
                type='GeneratePoseTarget',
                sigma=0.6,
                use_score=True,
                with_kp=True,
                with_limb=False),
            dict(type='FormatShape', input_format='NCTHW'),
            dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
            dict(type='ToTensor', keys=['imgs', 'label'])
        ]),
    val=dict(
        type='PoseDataset',
        ann_file='C:/Users/user/Desktop/vikrant/mmaction2/tools/data/skeleton/val.pkl',
        data_prefix='',
        pipeline=[
            dict(type='UniformSampleFrames', clip_len=48, num_clips=1, test_mode=True),
            dict(type='PoseDecode'),
            dict(type='PoseCompact', hw_ratio=1.0, allow_imgpad=True),
            dict(type='Resize', scale=(-1, 64)),
            dict(type='CenterCrop', crop_size=64),
            dict(
                type='GeneratePoseTarget',
                sigma=0.6,
                use_score=True,
                with_kp=True,
                with_limb=False),
            dict(type='FormatShape', input_format='NCTHW'),
            dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
            dict(type='ToTensor', keys=['imgs'])
        ]),
    test=dict(
        type='PoseDataset',
        ann_file='C:/Users/user/Desktop/vikrant/mmaction2/tools/data/skeleton/val.pkl',
        data_prefix='',
        pipeline=[
            dict(type='UniformSampleFrames', clip_len=48, num_clips=10, test_mode=True),
            dict(type='PoseDecode'),
            dict(type='PoseCompact', hw_ratio=1.0, allow_imgpad=True),
            dict(type='Resize', scale=(-1, 64)),
            dict(type='CenterCrop', crop_size=64),
            dict(
                type='GeneratePoseTarget',
                sigma=0.6,
                use_score=True,
                with_kp=True,
                with_limb=False,
                double=True,
                left_kp=[1, 3, 5, 7, 9, 11, 13, 15],
                right_kp=[2, 4, 6, 8, 10, 12, 14, 16]),
            dict(type='FormatShape', input_format='NCTHW'),
            dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
            dict(type='ToTensor', keys=['imgs'])
        ]))
optimizer = dict(type='SGD', lr=0.2, momentum=0.9, weight_decay=0.0003)
optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))
lr_config = dict(policy='CosineAnnealing', by_epoch=False, min_lr=0)
total_epochs = 240
checkpoint_config = dict(interval=10)
workflow = [('train', 10)]
evaluation = dict(
    interval=10,
    metrics=['top_k_accuracy', 'mean_class_accuracy'],
    topk=(1, 5))
log_config = dict(interval=20, hooks=[dict(type='TextLoggerHook')])
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = 'work_dirs/slowonly_r50_u48_240e_ntu120_xsub_keypoint'
load_from = None
resume_from = None
find_unused_parameters = False
gpu_ids = range(0, 1)
omnisource = False
module_hooks = []

2022-05-23 15:43:30,337 - mmaction - INFO - Set random seed to 1498660289, deterministic: False
2022-05-23 15:43:30,677 - mmaction - INFO - 350 videos remain after valid thresholding
2022-05-23 15:43:32,696 - mmaction - INFO - 4 videos remain after valid thresholding
2022-05-23 15:43:32,697 - mmaction - INFO - Start running, host: user@DESKTOP-8UM5TVP, work_dir: C:\Users\user\Desktop\vikrant\mmaction2\work_dirs\slowonly_r50_u48_240e_ntu120_xsub_keypoint
2022-05-23 15:43:32,698 - mmaction - INFO - Hooks will be executed in the following order:
before_run: (VERY_HIGH ) CosineAnnealingLrUpdaterHook
(NORMAL ) CheckpointHook
(LOW ) EvalHook
(VERY_LOW ) TextLoggerHook


before_train_epoch: (VERY_HIGH ) CosineAnnealingLrUpdaterHook
(LOW ) IterTimerHook
(LOW ) EvalHook
(VERY_LOW ) TextLoggerHook


before_train_iter: (VERY_HIGH ) CosineAnnealingLrUpdaterHook
(LOW ) IterTimerHook
(LOW ) EvalHook


after_train_iter: (ABOVE_NORMAL) OptimizerHook
(NORMAL ) CheckpointHook
(LOW ) IterTimerHook
(LOW ) EvalHook
(VERY_LOW ) TextLoggerHook


after_train_epoch: (NORMAL ) CheckpointHook
(LOW ) EvalHook
(VERY_LOW ) TextLoggerHook


before_val_epoch: (LOW ) IterTimerHook
(VERY_LOW ) TextLoggerHook


before_val_iter: (LOW ) IterTimerHook


after_val_iter: (LOW ) IterTimerHook


after_val_epoch: (VERY_LOW ) TextLoggerHook


after_run: (VERY_LOW ) TextLoggerHook


2022-05-23 15:43:32,700 - mmaction - INFO - workflow: [('train', 10)], max: 240 epochs
2022-05-23 15:43:32,700 - mmaction - INFO - Checkpoints will be saved to C:\Users\user\Desktop\vikrant\mmaction2\work_dirs\slowonly_r50_u48_240e_ntu120_xsub_keypoint by HardDiskBackend.
2022-05-23 15:43:59,346 - mmaction - INFO - Epoch [1][20/44] lr: 2.000e-01, eta: 3:54:01, time: 1.332, data_time: 0.674, memory: 4786, top1_acc: 0.9500, top5_acc: 0.9500, loss_cls: 0.2525, loss: 0.2525, grad_norm: 0.3853
2022-05-23 15:44:14,202 - mmaction - INFO - Epoch [1][40/44] lr: 2.000e-01, eta: 3:01:54, time: 0.743, data_time: 0.247, memory: 4786, top1_acc: 1.0000, top5_acc: 1.0000, loss_cls: 0.0000, loss: 0.0000, grad_norm: 0.0000
2022-05-23 15:44:42,399 - mmaction - INFO - Epoch [2][20/44] lr: 2.000e-01, eta: 3:01:52, time: 1.252, data_time: 0.748, memory: 4786, top1_acc: 1.0000, top5_acc: 1.0000, loss_cls: 0.0000, loss: 0.0000, grad_norm: 0.0000
2022-05-23 15:44:56,541 - mmaction - INFO - Epoch [2][40/44] lr: 2.000e-01, eta: 2:47:42, time: 0.707, data_time: 0.203, memory: 4786, top1_acc: 1.0000, top5_acc: 1.0000, loss_cls: 0.0000, loss: 0.0000, grad_norm: 0.0000
2022-05-23 15:45:26,182 - mmaction - INFO - Epoch [3][20/44] lr: 1.999e-01, eta: 2:52:10, time: 1.303, data_time: 0.797, memory: 4786, top1_acc: 1.0000, top5_acc: 1.0000, loss_cls: 0.0000, loss: 0.0000, grad_norm: 0.0000
2022-05-23 15:45:40,106 - mmaction - INFO - Epoch [3][40/44] lr: 1.999e-01, eta: 2:43:54, time: 0.696, data_time: 0.185, memory: 4786, top1_acc: 1.0000, top5_acc: 1.0000, loss_cls: 0.0000, loss: 0.0000, grad_norm: 0.0000
2022-05-23 15:46:08,818 - mmaction - INFO - Epoch [4][20/44] lr: 1.999e-01, eta: 2:47:17, time: 1.296, data_time: 0.785, memory: 4786, top1_acc: 1.0000, top5_acc: 1.0000, loss_cls: 0.0000, loss: 0.0000, grad_norm: 0.0000
2022-05-23 15:46:22,913 - mmaction - INFO - Epoch [4][40/44] lr: 1.999e-01, eta: 2:41:44, time: 0.705, data_time: 0.191, memory: 4786, top1_acc: 1.0000, top5_acc: 1.0000, loss_cls: 0.0000, loss: 0.0000, grad_norm: 0.0000
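For what it's worth, my understanding is that with grad_clip set in optimizer_config (norm_type=2), the logged grad_norm is the total L2 norm over all parameter gradients, so a value of exactly 0.0000 means every gradient is zero, which for a cross-entropy loss only happens when the loss itself has already collapsed to zero, matching the loss_cls: 0.0000 above. A rough sketch of that computation (illustrative, not mmcv's exact code):

```python
import torch

def total_grad_norm(model: torch.nn.Module, norm_type: float = 2.0) -> torch.Tensor:
    """Total gradient norm over all parameters, like the logged `grad_norm`."""
    # Norm of each parameter's gradient (skip params without gradients).
    norms = [p.grad.detach().norm(norm_type)
             for p in model.parameters() if p.grad is not None]
    if not norms:
        return torch.tensor(0.0)
    # Combine per-parameter norms into one global norm (L2 for norm_type=2).
    return torch.norm(torch.stack(norms), norm_type)
```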

Dai-Wenxun commented 2 years ago

Hi @vsc9729, it looks like you are training on your own dataset, so the problem is most likely with your dataset.
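A quick way to check is the label distribution in your annotation pickle. A minimal sketch, assuming the usual PoseDataset layout (a list of per-sample dicts, each carrying a 'label' key; adjust the path to your result.pkl):

```python
import pickle
from collections import Counter

# Path to the training annotation file from your config (adjust as needed).
ANN_FILE = 'C:/Users/user/Desktop/vikrant/mmaction2/configs/skeleton/posec3d/result.pkl'

with open(ANN_FILE, 'rb') as f:
    annos = pickle.load(f)

# Count how many samples each class label has.
labels = Counter(sample['label'] for sample in annos)
print(f'{len(annos)} samples, {len(labels)} distinct labels')
print('Most common:', labels.most_common(10))
```

If one label dominates (or every sample carries the same label), the model can hit 100% accuracy with near-zero loss within a few iterations, which would explain the grad_norm: 0.0000 in your log.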