Onnx version for Spatio temporal action detection using AVA

nyanmn commented 2 years ago

Hello, I have an application that uses spatio-temporal action detection on AVA. The model consists of a SlowFast backbone and an AVA head. I would like to convert it to ONNX and then to TensorRT, but this algorithm is not currently listed as ONNX-convertible. Could you tell me which steps, information, or references I should look into to convert the "Spatio temporal action detection using AVA" algorithm to ONNX and TensorRT?

tpoisonooo commented 2 years ago

@lvhan028

lvhan028 commented 2 years ago

@irexyc is working on mmaction2 (branch: 1.x) deployment. He can provide guidance on SlowFast deployment. By the way, which branch of mmaction2 are you using, master or dev?

nyanmn commented 2 years ago

Sure, thanks for the reply. I'm also converting to ONNX first and then to TensorRT. I'll contact him if I run into difficulties. How can I contact him?

nyanmn commented 2 years ago

I am using this.

irexyc commented 1 year ago

@nyanmn Hi, could you provide the config you use?

nyanmn commented 1 year ago

I am working with a spatio-temporal action detection model trained on AVA. The config file is as follows.

model = dict(
    type='FastRCNN',
    backbone=dict(
        type='ResNet3dSlowFast',
        pretrained=None,
        resample_rate=4,
        speed_ratio=4,
        channel_ratio=8,
        slow_pathway=dict(
            type='resnet3d',
            depth=50,
            pretrained=None,
            lateral=True,
            fusion_kernel=7,
            conv1_kernel=(1, 7, 7),
            dilations=(1, 1, 1, 1),
            conv1_stride_t=1,
            pool1_stride_t=1,
            inflate=(0, 0, 1, 1),
            spatial_strides=(1, 2, 2, 1)),
        fast_pathway=dict(
            type='resnet3d',
            depth=50,
            pretrained=None,
            lateral=False,
            base_channels=8,
            conv1_kernel=(5, 7, 7),
            conv1_stride_t=1,
            pool1_stride_t=1,
            spatial_strides=(1, 2, 2, 1))),
    roi_head=dict(
        type='AVARoIHead',
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor3D',
            roi_layer_type='RoIAlign',
            output_size=8,
            with_temporal_pool=True),
        bbox_head=dict(
            type='BBoxHeadAVA',
            dropout_ratio=0.5,
            in_channels=2304,
            num_classes=81,
            multilabel=True)),
    train_cfg=dict(
        rcnn=dict(
            assigner=dict(
                type='MaxIoUAssignerAVA',
                pos_iou_thr=0.9,
                neg_iou_thr=0.9,
                min_pos_iou=0.9),
            sampler=dict(
                type='RandomSampler',
                num=32,
                pos_fraction=1,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=1.0,
            debug=False)),
    test_cfg=dict(rcnn=dict(action_thr=0.002)))
dataset_type = 'AVADataset'
data_root = 'data/ava/rawframes'
anno_root = 'data/ava/combined_annotations'
org_anno_root = 'data/ava/annotations'
ann_file_train = 'data/ava/combined_annotations/ava_customdataset_proposals_train.csv'
ann_file_val = 'data/ava/combined_annotations/ava_customdataset_proposals_val.csv'
exclude_file_train = 'data/ava/annotations/ava_train_excluded_timestamps_v2.2.csv'
exclude_file_val = 'data/ava/annotations/ava_val_excluded_timestamps_v2.2.csv'
label_file = 'data/ava/combined_annotations/ava_action_list_v2.2_for_activitynet_2019.pbtxt'
proposal_file_train = 'data/ava/combined_annotations/ava_customdataset_proposals_train.pkl'
proposal_file_val = 'data/ava/combined_annotations/ava_customdataset_proposals_val.pkl'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)
train_pipeline = [
    dict(type='SampleAVAFrames', clip_len=32, frame_interval=2),
    dict(type='RawFrameDecode'),
    dict(type='RandomRescale', scale_range=(256, 320)),
    dict(type='RandomCrop', size=256),
    dict(type='Flip', flip_ratio=0.5),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_bgr=False),
    dict(type='FormatShape', input_format='NCTHW', collapse=True),
    dict(type='Rename', mapping=dict(imgs='img')),
    dict(type='ToTensor', keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']),
    dict(
        type='ToDataContainer',
        fields=[
            dict(key=['proposals', 'gt_bboxes', 'gt_labels'], stack=False)
        ]),
    dict(
        type='Collect',
        keys=['img', 'proposals', 'gt_bboxes', 'gt_labels'],
        meta_keys=['scores', 'entity_ids'])
]
val_pipeline = [
    dict(
        type='SampleAVAFrames', clip_len=32, frame_interval=2, test_mode=True),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_bgr=False),
    dict(type='FormatShape', input_format='NCTHW', collapse=True),
    dict(type='Rename', mapping=dict(imgs='img')),
    dict(type='ToTensor', keys=['img', 'proposals']),
    dict(type='ToDataContainer', fields=[dict(key='proposals', stack=False)]),
    dict(
        type='Collect',
        keys=['img', 'proposals'],
        meta_keys=['scores', 'img_shape'],
        nested=True)
]
data = dict(
    videos_per_gpu=6,
    workers_per_gpu=2,
    val_dataloader=dict(videos_per_gpu=1),
    test_dataloader=dict(videos_per_gpu=1),
    train=dict(
        type='AVADataset',
        ann_file=
        'data/ava/combined_annotations/ava_customdataset_proposals_train.csv',
        exclude_file=
        'data/ava/annotations/ava_train_excluded_timestamps_v2.2.csv',
        pipeline=[
            dict(type='SampleAVAFrames', clip_len=32, frame_interval=2),
            dict(type='RawFrameDecode'),
            dict(type='RandomRescale', scale_range=(256, 320)),
            dict(type='RandomCrop', size=256),
            dict(type='Flip', flip_ratio=0.5),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_bgr=False),
            dict(type='FormatShape', input_format='NCTHW', collapse=True),
            dict(type='Rename', mapping=dict(imgs='img')),
            dict(
                type='ToTensor',
                keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']),
            dict(
                type='ToDataContainer',
                fields=[
                    dict(
                        key=['proposals', 'gt_bboxes', 'gt_labels'],
                        stack=False)
                ]),
            dict(
                type='Collect',
                keys=['img', 'proposals', 'gt_bboxes', 'gt_labels'],
                meta_keys=['scores', 'entity_ids'])
        ],
        label_file=
        'data/ava/combined_annotations/ava_action_list_v2.2_for_activitynet_2019.pbtxt',
        proposal_file=
        'data/ava/combined_annotations/ava_customdataset_proposals_train.pkl',
        person_det_score_thr=0.9,
        data_prefix='data/ava/rawframes'),
    val=dict(
        type='AVADataset',
        ann_file=
        'data/ava/combined_annotations/ava_customdataset_proposals_val.csv',
        exclude_file=
        'data/ava/annotations/ava_val_excluded_timestamps_v2.2.csv',
        pipeline=[
            dict(
                type='SampleAVAFrames',
                clip_len=32,
                frame_interval=2,
                test_mode=True),
            dict(type='RawFrameDecode'),
            dict(type='Resize', scale=(-1, 256)),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_bgr=False),
            dict(type='FormatShape', input_format='NCTHW', collapse=True),
            dict(type='Rename', mapping=dict(imgs='img')),
            dict(type='ToTensor', keys=['img', 'proposals']),
            dict(
                type='ToDataContainer',
                fields=[dict(key='proposals', stack=False)]),
            dict(
                type='Collect',
                keys=['img', 'proposals'],
                meta_keys=['scores', 'img_shape'],
                nested=True)
        ],
        label_file=
        'data/ava/combined_annotations/ava_action_list_v2.2_for_activitynet_2019.pbtxt',
        proposal_file=
        'data/ava/combined_annotations/ava_customdataset_proposals_val.pkl',
        person_det_score_thr=0.9,
        data_prefix='data/ava/rawframes'),
    test=dict(
        type='AVADataset',
        ann_file=
        'data/ava/combined_annotations/ava_customdataset_proposals_val.csv',
        exclude_file=
        'data/ava/annotations/ava_val_excluded_timestamps_v2.2.csv',
        pipeline=[
            dict(
                type='SampleAVAFrames',
                clip_len=32,
                frame_interval=2,
                test_mode=True),
            dict(type='RawFrameDecode'),
            dict(type='Resize', scale=(-1, 256)),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_bgr=False),
            dict(type='FormatShape', input_format='NCTHW', collapse=True),
            dict(type='Rename', mapping=dict(imgs='img')),
            dict(type='ToTensor', keys=['img', 'proposals']),
            dict(
                type='ToDataContainer',
                fields=[dict(key='proposals', stack=False)]),
            dict(
                type='Collect',
                keys=['img', 'proposals'],
                meta_keys=['scores', 'img_shape'],
                nested=True)
        ],
        label_file=
        'data/ava/combined_annotations/ava_action_list_v2.2_for_activitynet_2019.pbtxt',
        proposal_file=
        'data/ava/combined_annotations/ava_customdataset_proposals_val.pkl',
        person_det_score_thr=0.9,
        data_prefix='data/ava/rawframes'))
optimizer = dict(type='SGD', lr=0.075, momentum=0.9, weight_decay=1e-05)
optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))
lr_config = dict(
    policy='CosineAnnealing',
    by_epoch=False,
    min_lr=0,
    warmup='linear',
    warmup_by_epoch=True,
    warmup_iters=2,
    warmup_ratio=0.1)
total_epochs = 10
checkpoint_config = dict(interval=1)
workflow = [('train', 1)]
evaluation = dict(interval=1)
log_config = dict(interval=20, hooks=[dict(type='TextLoggerHook')])
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/slowfast_kinetics_pretrained_r50_8x8x1_cosine_10e_ava22_rgb'
load_from = 'https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_8x8x1_256e_kinetics400_rgb/slowfast_r50_8x8x1_256e_kinetics400_rgb_20200716-73547d2b.pth'
resume_from = None
find_unused_parameters = False
gpu_ids = range(0, 4)
omnisource = False
module_hooks = []
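
For anyone preparing a dummy input for export, the backbone settings in this config can be sanity-checked by hand. The sketch below is only an illustration: it assumes the usual SlowFast sampling rule (the slow pathway keeps every `resample_rate`-th frame, the fast pathway keeps every `resample_rate // speed_ratio`-th frame) and assumes a ResNet-50 style slow pathway that ends with 2048 channels. The helper name `slowfast_shapes` is hypothetical and not part of mmaction2 or mmdeploy.

```python
# Values copied from the config above.
clip_len = 32
resample_rate = 4
speed_ratio = 4
channel_ratio = 8

# Assumption: a ResNet-50 style slow pathway outputs 2048 channels.
slow_out_channels = 2048


def slowfast_shapes(clip_len, resample_rate, speed_ratio, channel_ratio,
                    slow_out_channels):
    """Return (slow_frames, fast_frames, head_in_channels) for a clip.

    Hypothetical helper for illustration only.
    """
    # Slow pathway: temporal subsampling by resample_rate.
    slow_frames = clip_len // resample_rate
    # Fast pathway: temporal subsampling by resample_rate // speed_ratio.
    fast_frames = clip_len // (resample_rate // speed_ratio)
    # Fast pathway is channel_ratio times thinner than the slow pathway.
    fast_out_channels = slow_out_channels // channel_ratio
    # The head sees both pathways concatenated along the channel axis.
    return slow_frames, fast_frames, slow_out_channels + fast_out_channels


slow_frames, fast_frames, head_in_channels = slowfast_shapes(
    clip_len, resample_rate, speed_ratio, channel_ratio, slow_out_channels)
print(slow_frames, fast_frames, head_in_channels)
```

Under these assumptions the concatenated channel count comes out to 2304, which agrees with `in_channels=2304` in `bbox_head`, and the clip fed to the backbone has 32 frames in the `NCTHW` layout named in the pipeline.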

blue-q commented 1 year ago

Have you solved the problem? I also want to convert the "Spatio temporal action detection using AVA" algorithm to ONNX and TensorRT.

nyanmn commented 1 year ago

No, I haven't been able to. I haven't received a reply either.

open-mmlab / mmdeploy

Onnx version for Spatio temporal action detection using AVA #1026