Inquiring about the number of views during test time.

Hello, thank you for the insightful research. In the paper, the views during test time are described as follows:

Views = #frames × #temporal × #spatial

From what I understand, #temporal and #spatial represent the number of temporal and spatial samplings during test time. I'm not very familiar with mmaction, so I'm not sure which part of the config file to refer to. How many views are there for the base-diving48 case?

    test_pipeline = [
        dict(type='DecordInit'),
        dict(
            type='SampleFrames',
            clip_len=32,
            frame_interval=16,
            num_clips=1,
            frame_uniform=True,
            test_mode=True),
        dict(type='DecordDecode'),
        dict(type='Resize', scale=(-1, 224)),
        dict(type='ThreeCrop', crop_size=224),
        dict(type='Flip', flip_ratio=0),
        dict(type='Normalize', **img_norm_cfg),
        dict(type='FormatShape', input_format='NCTHW'),
        dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
        dict(type='ToTensor', keys=['imgs'])
    ]

Is it 32x1x1? What does max_testing_views mean?
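
To make my reading of the config concrete, here is a rough sketch of how I am counting. It rests on assumptions I have not been able to confirm: that #frames corresponds to `clip_len`, #temporal to `num_clips`, and #spatial to the number of crops produced by the crop step (3 for `ThreeCrop`). The helper `count_views` and the `CROPS_PER_OP` table are hypothetical names of my own, not part of mmaction, and only the pipeline steps relevant to the count are reproduced.

```python
# Hypothetical helper (not an mmaction API): count test-time views
# from a pipeline config, under the assumptions stated above.

# Assumed number of spatial crops produced by each crop transform.
CROPS_PER_OP = {"CenterCrop": 1, "ThreeCrop": 3, "TenCrop": 10}


def count_views(pipeline):
    """Return (#frames, #temporal, #spatial) for a list of pipeline dicts."""
    frames = None
    temporal = None
    spatial = 1  # no multi-crop step means a single spatial view
    for step in pipeline:
        if step["type"] == "SampleFrames":
            frames = step["clip_len"]            # assumed: #frames
            temporal = step.get("num_clips", 1)  # assumed: #temporal
        elif step["type"] in CROPS_PER_OP:
            spatial *= CROPS_PER_OP[step["type"]]  # assumed: #spatial
    if frames is None:
        raise ValueError("pipeline has no SampleFrames step")
    return frames, temporal, spatial


# Only the steps that affect the count, copied from the config above.
test_pipeline = [
    dict(type="SampleFrames", clip_len=32, frame_interval=16,
         num_clips=1, frame_uniform=True, test_mode=True),
    dict(type="Resize", scale=(-1, 224)),
    dict(type="ThreeCrop", crop_size=224),
]

frames, temporal, spatial = count_views(test_pipeline)
print(f"{frames}x{temporal}x{spatial}")
```

If this reading is right, the config would correspond to 32x1x3 rather than 32x1x1, so I would appreciate confirmation of which one matches the paper.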

I tried looking into the mmaction documentation but couldn't grasp it, so I'm asking here.

Thank you.

taoyang1122 / adapt-image-models
