Backdrop9019 opened this issue 1 year ago.
Hi @Backdrop9019, 'num_clips' is #temporal, and dict(type='ThreeCrop', crop_size=224) means three spatial crops (a single spatial crop would use 'CenterCrop' instead). So this is 32x3x1. I believe max_testing_views is used to control memory cost at test time: it caps how many views are passed through the model at once. You may refer to https://mmaction2.readthedocs.io/en/0.x/faq.html
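A minimal sketch of the memory-saving idea, assuming max_testing_views simply caps how many views go through the model in one forward pass and the per-view outputs are concatenated afterwards. This is plain Python for illustration, not MMAction2's implementation; forward_in_chunks and fake_forward are hypothetical names.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def forward_in_chunks(
    forward: Callable[[Sequence[T]], List[R]],
    views: Sequence[T],
    max_testing_views: Optional[int] = None,
) -> List[R]:
    """Run `forward` over all test views, at most `max_testing_views` at a time."""
    if max_testing_views is not None and max_testing_views <= 0:
        raise ValueError("max_testing_views must be positive")
    if max_testing_views is None or max_testing_views >= len(views):
        # No cap (or cap not reached): a single forward pass over every view.
        return list(forward(views))
    outputs: List[R] = []
    for start in range(0, len(views), max_testing_views):
        # Only this slice of views is processed at once, bounding peak memory.
        outputs.extend(forward(views[start:start + max_testing_views]))
    return outputs


calls = []


def fake_forward(batch):
    # Hypothetical stand-in for the model: records batch size, scores each view.
    calls.append(len(batch))
    return [v * 10 for v in batch]


print(forward_in_chunks(fake_forward, [1, 2, 3], max_testing_views=2))  # [10, 20, 30]
print(calls)  # [2, 1]
```

As long as each view is scored independently, the result matches a single pass over all views; only the number of views held in memory at once changes.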
Hello, thank you for the insightful research. In the paper, the number of test-time views is described as: Views = #frames × #temporal × #spatial. From what I understand, #temporal and #spatial are the numbers of temporal and spatial samples taken at test time. I'm not very familiar with mmaction, so I'm not sure which part of the config file to look at. How many views are used for the base-diving48 case? Here is the test pipeline:
```python
test_pipeline = [
    dict(type='DecordInit'),
    dict(
        type='SampleFrames',
        clip_len=32,
        frame_interval=16,
        num_clips=1,
        frame_uniform=True,
        test_mode=True),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(-1, 224)),
    dict(type='ThreeCrop', crop_size=224),
    dict(type='Flip', flip_ratio=0),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
```
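A small sketch that reads the three factors off a pipeline written as a list of dicts, following the explanation in this thread: clip_len gives #frames, num_clips gives #temporal, and the crop transform gives #spatial (ThreeCrop gives 3, CenterCrop gives 1). count_test_views is a hypothetical helper, not part of MMAction2, and its crop table only covers the two transforms mentioned here. Other steps (Resize, Flip, Normalize, ...) transform each clip rather than creating extra ones, so they are ignored.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical lookup: spatial crops produced per clip by each crop transform.
# Only the two transforms mentioned in this thread are listed.
SPATIAL_CROPS = {'CenterCrop': 1, 'ThreeCrop': 3}


def count_test_views(
    pipeline: List[Dict[str, Any]]
) -> Tuple[Optional[int], Optional[int], int]:
    """Return (#frames, #temporal, #spatial) for a test pipeline given as dicts."""
    frames: Optional[int] = None
    temporal: Optional[int] = None
    spatial = 1  # no crop transform -> one spatial view
    for step in pipeline:
        if step['type'] == 'SampleFrames':
            frames = step['clip_len']     # frames per clip
            temporal = step['num_clips']  # temporal clips per video
        elif step['type'] in SPATIAL_CROPS:
            spatial = SPATIAL_CROPS[step['type']]
        # Resize, Flip, Normalize, etc. modify each clip and add no views.
    return frames, temporal, spatial


pipeline = [
    dict(type='SampleFrames', clip_len=32, frame_interval=16, num_clips=1,
         frame_uniform=True, test_mode=True),
    dict(type='Resize', scale=(-1, 224)),
    dict(type='ThreeCrop', crop_size=224),
    dict(type='Flip', flip_ratio=0),
]
frames, temporal, spatial = count_test_views(pipeline)
print(frames, temporal, spatial)  # 32 1 3
print(temporal * spatial)         # 3 (32-frame inputs per video)
```

Under this reading, the config yields one temporal clip and three spatial crops, i.e. three 32-frame inputs per video rather than one.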
Is it 32x1x1? What does max_testing_views mean?
I tried looking into the mmaction documentation but couldn't grasp it, so I'm asking here.
Thank you.