gov-ai / jax-detectron2

JAX port for Detectron2

PyTorch MRCNN #1

Open INF800 opened 2 years ago

INF800 commented 2 years ago

Permalinks are added for the backbone modules only so far; the other modules will follow soon. Tick a module here to let others know you are working on it.

INF800 commented 2 years ago
Basic model structure (module printout)
GeneralizedRCNN(
  (backbone): FPN(
    (fpn_lateral2): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (fpn_lateral3): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (fpn_lateral4): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (fpn_lateral5): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output5): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (top_block): LastLevelMaxPool()
    (bottom_up): ResNet(
      (stem): BasicStem(
        (conv1): Conv2d(
          3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
          (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
        )
      )
      (res2): Sequential(
        (0): BottleneckBlock(
          (shortcut): Conv2d(
            64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv1): Conv2d(
            64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
          )
          (conv2): Conv2d(
            64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
          )
          (conv3): Conv2d(
            64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
        )
        (1): BottleneckBlock(
          (conv1): Conv2d(
            256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
          )
          (conv2): Conv2d(
            64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
          )
          (conv3): Conv2d(
            64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
        )
        (2): BottleneckBlock(
          (conv1): Conv2d(
            256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
          )
          (conv2): Conv2d(
            64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
          )
          (conv3): Conv2d(
            64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
        )
      )
      (res3): Sequential(
        (0): BottleneckBlock(
          (shortcut): Conv2d(
            256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
          (conv1): Conv2d(
            256, 128, kernel_size=(1, 1), stride=(2, 2), bias=False
            (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
          )
          (conv2): Conv2d(
            128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
          )
          (conv3): Conv2d(
            128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
        )
        (1): BottleneckBlock(
          (conv1): Conv2d(
            512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
          )
          (conv2): Conv2d(
            128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
          )
          (conv3): Conv2d(
            128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
        )
        (2): BottleneckBlock(
          (conv1): Conv2d(
            512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
          )
          (conv2): Conv2d(
            128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
          )
          (conv3): Conv2d(
            128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
        )
        (3): BottleneckBlock(
          (conv1): Conv2d(
            512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
          )
          (conv2): Conv2d(
            128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
          )
          (conv3): Conv2d(
            128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
        )
      )
      (res4): Sequential(
        (0): BottleneckBlock(
          (shortcut): Conv2d(
            512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False
            (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
          )
          (conv1): Conv2d(
            512, 256, kernel_size=(1, 1), stride=(2, 2), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv2): Conv2d(
            256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv3): Conv2d(
            256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
          )
        )
        (1): BottleneckBlock(
          (conv1): Conv2d(
            1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv2): Conv2d(
            256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv3): Conv2d(
            256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
          )
        )
        (2): BottleneckBlock(
          (conv1): Conv2d(
            1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv2): Conv2d(
            256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv3): Conv2d(
            256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
          )
        )
        (3): BottleneckBlock(
          (conv1): Conv2d(
            1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv2): Conv2d(
            256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv3): Conv2d(
            256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
          )
        )
        (4): BottleneckBlock(
          (conv1): Conv2d(
            1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv2): Conv2d(
            256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv3): Conv2d(
            256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
          )
        )
        (5): BottleneckBlock(
          (conv1): Conv2d(
            1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv2): Conv2d(
            256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv3): Conv2d(
            256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
          )
        )
      )
      (res5): Sequential(
        (0): BottleneckBlock(
          (shortcut): Conv2d(
            1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False
            (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
          )
          (conv1): Conv2d(
            1024, 512, kernel_size=(1, 1), stride=(2, 2), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
          (conv2): Conv2d(
            512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
          (conv3): Conv2d(
            512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
          )
        )
        (1): BottleneckBlock(
          (conv1): Conv2d(
            2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
          (conv2): Conv2d(
            512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
          (conv3): Conv2d(
            512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
          )
        )
        (2): BottleneckBlock(
          (conv1): Conv2d(
            2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
          (conv2): Conv2d(
            512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
          (conv3): Conv2d(
            512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
          )
        )
      )
    )
  )
  (proposal_generator): RPN(
    (rpn_head): StandardRPNHead(
      (conv): Conv2d(
        256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)
        (activation): ReLU()
      )
      (objectness_logits): Conv2d(256, 3, kernel_size=(1, 1), stride=(1, 1))
      (anchor_deltas): Conv2d(256, 12, kernel_size=(1, 1), stride=(1, 1))
    )
    (anchor_generator): DefaultAnchorGenerator(
      (cell_anchors): BufferList()
    )
  )
  (roi_heads): StandardROIHeads(
    (box_pooler): ROIPooler(
      (level_poolers): ModuleList(
        (0): ROIAlign(output_size=(7, 7), spatial_scale=0.25, sampling_ratio=0, aligned=True)
        (1): ROIAlign(output_size=(7, 7), spatial_scale=0.125, sampling_ratio=0, aligned=True)
        (2): ROIAlign(output_size=(7, 7), spatial_scale=0.0625, sampling_ratio=0, aligned=True)
        (3): ROIAlign(output_size=(7, 7), spatial_scale=0.03125, sampling_ratio=0, aligned=True)
      )
    )
    (box_head): FastRCNNConvFCHead(
      (flatten): Flatten(start_dim=1, end_dim=-1)
      (fc1): Linear(in_features=12544, out_features=1024, bias=True)
      (fc_relu1): ReLU()
      (fc2): Linear(in_features=1024, out_features=1024, bias=True)
      (fc_relu2): ReLU()
    )
    (box_predictor): FastRCNNOutputLayers(
      (cls_score): Linear(in_features=1024, out_features=101, bias=True)
      (bbox_pred): Linear(in_features=1024, out_features=400, bias=True)
    )
    (mask_pooler): ROIPooler(
      (level_poolers): ModuleList(
        (0): ROIAlign(output_size=(14, 14), spatial_scale=0.25, sampling_ratio=0, aligned=True)
        (1): ROIAlign(output_size=(14, 14), spatial_scale=0.125, sampling_ratio=0, aligned=True)
        (2): ROIAlign(output_size=(14, 14), spatial_scale=0.0625, sampling_ratio=0, aligned=True)
        (3): ROIAlign(output_size=(14, 14), spatial_scale=0.03125, sampling_ratio=0, aligned=True)
      )
    )
    (mask_head): MaskRCNNConvUpsampleHead(
      (mask_fcn1): Conv2d(
        256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)
        (activation): ReLU()
      )
      (mask_fcn2): Conv2d(
        256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)
        (activation): ReLU()
      )
      (mask_fcn3): Conv2d(
        256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)
        (activation): ReLU()
      )
      (mask_fcn4): Conv2d(
        256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)
        (activation): ReLU()
      )
      (deconv): ConvTranspose2d(256, 256, kernel_size=(2, 2), stride=(2, 2))
      (deconv_relu): ReLU()
      (predictor): Conv2d(256, 100, kernel_size=(1, 1), stride=(1, 1))
    )
  )
)
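Several sizes in the printout above can be cross-checked with plain arithmetic. The snippet below is only a sketch: the helper names are made up for illustration, the strides 4, 8, 16, 32 for p2..p5 are an assumption inferred from the `spatial_scale` values, and the class count is inferred by assuming one background logit and four box deltas per class.

```python
# Cross-check sizes read off the printed model. Helper names are illustrative.

FPN_CHANNELS = 256            # out_channels of every fpn_output* conv
BOX_POOL = 7                  # box_pooler output_size=(7, 7)
MASK_POOL = 14                # mask_pooler output_size=(14, 14)
FPN_STRIDES = [4, 8, 16, 32]  # assumed strides of p2..p5


def flattened_box_features(channels: int, pool: int) -> int:
    """Length of one pooled box feature after Flatten(start_dim=1)."""
    return channels * pool * pool


def roi_align_scales(strides):
    """ROIAlign spatial_scale is the reciprocal of the feature stride."""
    return [1.0 / s for s in strides]


def mask_output_size(pool: int, deconv_stride: int = 2) -> int:
    """ConvTranspose2d(kernel_size=2, stride=2) doubles the spatial size."""
    return pool * deconv_stride


box_in = flattened_box_features(FPN_CHANNELS, BOX_POOL)
scales = roi_align_scales(FPN_STRIDES)
mask_size = mask_output_size(MASK_POOL)

# cls_score has 101 outputs and bbox_pred has 400. Assuming one extra
# background logit and 4 box deltas per class, both imply 100 classes.
num_classes_from_cls = 101 - 1
num_classes_from_bbox = 400 // 4

print(box_in)     # 12544, matches fc1 in_features
print(scales)     # [0.25, 0.125, 0.0625, 0.03125]
print(mask_size)  # 28
```

The same 100 also shows up as the output channel count of the mask `predictor` conv, so the three heads agree with each other.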
INF800 commented 2 years ago
Basic model config dict
{'ANCHOR_GENERATOR': {'ANGLES': [[-90, 0, 90]],
                      'ASPECT_RATIOS': [[0.5, 1.0, 2.0]],
                      'NAME': 'DefaultAnchorGenerator',
                      'OFFSET': 0.0,
                      'SIZES': [[32], [64], [128], [256], [512]]},
 'BACKBONE': CfgNode({'NAME': 'build_resnet_fpn_backbone', 'FREEZE_AT': 0}),
 'DEVICE': 'cuda',
 'FPN': {'FUSE_TYPE': 'sum',
         'IN_FEATURES': ['res2', 'res3', 'res4', 'res5'],
         'NORM': 'GN',
         'OUT_CHANNELS': 256},
 'KEYPOINT_ON': False,
 'LOAD_PROPOSALS': False,
 'MASK_ON': True,
 'META_ARCHITECTURE': 'GeneralizedRCNN',
 'PANOPTIC_FPN': {'COMBINE': {'ENABLED': True,
                              'INSTANCES_CONFIDENCE_THRESH': 0.5,
                              'OVERLAP_THRESH': 0.5,
                              'STUFF_AREA_LIMIT': 4096},
                  'INSTANCE_LOSS_WEIGHT': 1.0},
 'PIXEL_MEAN': [103.53, 116.28, 123.675],
 'PIXEL_STD': [1.0, 1.0, 1.0],
 'PROPOSAL_GENERATOR': CfgNode({'NAME': 'RPN', 'MIN_SIZE': 0}),
 'RESNETS': {'DEFORM_MODULATED': False,
             'DEFORM_NUM_GROUPS': 1,
             'DEFORM_ON_PER_STAGE': [False, False, False, False],
             'DEPTH': 50,
             'NORM': 'GN',
             'NUM_GROUPS': 1,
             'OUT_FEATURES': ['res2', 'res3', 'res4', 'res5'],
             'RES2_OUT_CHANNELS': 256,
             'RES5_DILATION': 1,
             'STEM_OUT_CHANNELS': 64,
             'STRIDE_IN_1X1': False,
             'WIDTH_PER_GROUP': 64},
 'RETINANET': {'BBOX_REG_LOSS_TYPE': 'smooth_l1',
               'BBOX_REG_WEIGHTS': (1.0, 1.0, 1.0, 1.0),
               'FOCAL_LOSS_ALPHA': 0.25,
               'FOCAL_LOSS_GAMMA': 2.0,
               'IN_FEATURES': ['p3', 'p4', 'p5', 'p6', 'p7'],
               'IOU_LABELS': [0, -1, 1],
               'IOU_THRESHOLDS': [0.4, 0.5],
               'NMS_THRESH_TEST': 0.5,
               'NORM': '',
               'NUM_CLASSES': 80,
               'NUM_CONVS': 4,
               'PRIOR_PROB': 0.01,
               'SCORE_THRESH_TEST': 0.05,
               'SMOOTH_L1_LOSS_BETA': 0.1,
               'TOPK_CANDIDATES_TEST': 1000},
 'ROI_BOX_CASCADE_HEAD': {'BBOX_REG_WEIGHTS': ((10.0, 10.0, 5.0, 5.0),
                                               (20.0, 20.0, 10.0, 10.0),
                                               (30.0, 30.0, 15.0, 15.0)),
                          'IOUS': (0.5, 0.6, 0.7)},
 'ROI_BOX_HEAD': {'BBOX_REG_LOSS_TYPE': 'smooth_l1',
                  'BBOX_REG_LOSS_WEIGHT': 1.0,
                  'BBOX_REG_WEIGHTS': (10.0, 10.0, 5.0, 5.0),
                  'CLS_AGNOSTIC_BBOX_REG': False,
                  'CONV_DIM': 256,
                  'FC_DIM': 1024,
                  'NAME': 'FastRCNNConvFCHead',
                  'NORM': 'GN',
                  'NUM_CONV': 4,
                  'NUM_FC': 1,
                  'POOLER_RESOLUTION': 7,
                  'POOLER_SAMPLING_RATIO': 0,
                  'POOLER_TYPE': 'ROIAlignV2',
                  'SMOOTH_L1_BETA': 0.0,
                  'TRAIN_ON_PRED_BOXES': False},
 'ROI_HEADS': {'BATCH_SIZE_PER_IMAGE': 512,
               'IN_FEATURES': ['p2', 'p3', 'p4', 'p5'],
               'IOU_LABELS': [0, 1],
               'IOU_THRESHOLDS': [0.5],
               'NAME': 'StandardROIHeads',
               'NMS_THRESH_TEST': 0.5,
               'NUM_CLASSES': 80,
               'POSITIVE_FRACTION': 0.25,
               'PROPOSAL_APPEND_GT': True,
               'SCORE_THRESH_TEST': 0.05},
 'ROI_KEYPOINT_HEAD': {'CONV_DIMS': (512, 512, 512, 512, 512, 512, 512, 512),
                       'LOSS_WEIGHT': 1.0,
                       'MIN_KEYPOINTS_PER_IMAGE': 1,
                       'NAME': 'KRCNNConvDeconvUpsampleHead',
                       'NORMALIZE_LOSS_BY_VISIBLE_KEYPOINTS': True,
                       'NUM_KEYPOINTS': 17,
                       'POOLER_RESOLUTION': 14,
                       'POOLER_SAMPLING_RATIO': 0,
                       'POOLER_TYPE': 'ROIAlignV2'},
 'ROI_MASK_HEAD': {'CLS_AGNOSTIC_MASK': False,
                   'CONV_DIM': 256,
                   'NAME': 'MaskRCNNConvUpsampleHead',
                   'NORM': 'GN',
                   'NUM_CONV': 4,
                   'POOLER_RESOLUTION': 14,
                   'POOLER_SAMPLING_RATIO': 0,
                   'POOLER_TYPE': 'ROIAlignV2'},
 'RPN': {'BATCH_SIZE_PER_IMAGE': 256,
         'BBOX_REG_LOSS_TYPE': 'smooth_l1',
         'BBOX_REG_LOSS_WEIGHT': 1.0,
         'BBOX_REG_WEIGHTS': (1.0, 1.0, 1.0, 1.0),
         'BOUNDARY_THRESH': -1,
         'CONV_DIMS': [-1],
         'HEAD_NAME': 'StandardRPNHead',
         'IN_FEATURES': ['p2', 'p3', 'p4', 'p5', 'p6'],
         'IOU_LABELS': [0, -1, 1],
         'IOU_THRESHOLDS': [0.3, 0.7],
         'LOSS_WEIGHT': 1.0,
         'NMS_THRESH': 0.7,
         'POSITIVE_FRACTION': 0.5,
         'POST_NMS_TOPK_TEST': 1000,
         'POST_NMS_TOPK_TRAIN': 1000,
         'PRE_NMS_TOPK_TEST': 1000,
         'PRE_NMS_TOPK_TRAIN': 2000,
         'SMOOTH_L1_BETA': 0.0},
 'SEM_SEG_HEAD': {'COMMON_STRIDE': 4,
                  'CONVS_DIM': 128,
                  'IGNORE_VALUE': 255,
                  'IN_FEATURES': ['p2', 'p3', 'p4', 'p5'],
                  'LOSS_WEIGHT': 1.0,
                  'NAME': 'SemSegFPNHead',
                  'NORM': 'GN',
                  'NUM_CLASSES': 54},
 'WEIGHTS': ''}
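To make the `ANCHOR_GENERATOR` entry concrete, here is a minimal sketch of how per-level cell anchors could be derived from `SIZES` and `ASPECT_RATIOS`. It assumes each anchor keeps an area of `size**2` with `height / width == aspect_ratio` and is centred on the origin; the function name `cell_anchors` is illustrative and not taken from the repository.

```python
import math

# Values copied from the ANCHOR_GENERATOR entry of the config above.
SIZES = [[32], [64], [128], [256], [512]]
ASPECT_RATIOS = [[0.5, 1.0, 2.0]]


def cell_anchors(sizes, aspect_ratios):
    """Anchors (x0, y0, x1, y1) centred on the origin for one feature level.

    Each anchor keeps area size**2 and has height / width == aspect_ratio.
    """
    anchors = []
    for size in sizes:
        area = float(size) ** 2
        for ratio in aspect_ratios:
            w = math.sqrt(area / ratio)
            h = ratio * w
            anchors.append((-w / 2.0, -h / 2.0, w / 2.0, h / 2.0))
    return anchors


# The single aspect-ratio list is shared by all five levels.
per_level = [cell_anchors(s, ASPECT_RATIOS[0]) for s in SIZES]
anchors_per_location = len(per_level[0])

print(anchors_per_location)  # 3
print(per_level[0][1])       # (-16.0, -16.0, 16.0, 16.0)
```

Three anchors per location is consistent with the printed RPN head, where `objectness_logits` has 3 output channels and `anchor_deltas` has 12 (3 anchors times 4 deltas).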
nirmalya8 commented 2 years ago

I will be working on ResNet initially. @INF800

INF800 commented 2 years ago

I will work out how each module manipulates its tensors during forward propagation.
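As a starting point for that, the spatial shapes through the backbone can be traced with integer arithmetic alone. This is a rough sketch under assumptions: the stem is taken to apply a 3x3 stride-2 max pool after `conv1`, and `LastLevelMaxPool` is taken to be a kernel-1 stride-2 max pool on p5. Neither appears in the printout, so both should be checked against the Detectron2 source. The 800x1216 input size and the function names are arbitrary choices for the example.

```python
def conv_out(n: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a conv or pool along one spatial axis."""
    return (n + 2 * padding - kernel) // stride + 1


def backbone_shapes(height: int, width: int):
    """Return {name: (channels, height, width)} for res2..res5 and p2..p6."""
    # Stem: 7x7 stride-2 conv, then an assumed 3x3 stride-2 max pool
    # (not visible in the printout, so treat it as an assumption).
    h, w = conv_out(height, 7, 2, 3), conv_out(width, 7, 2, 3)
    h, w = conv_out(h, 3, 2, 1), conv_out(w, 3, 2, 1)

    shapes = {}
    # (stage name, output channels, stride of the first block's 1x1 conv)
    stages = [("res2", 256, 1), ("res3", 512, 2),
              ("res4", 1024, 2), ("res5", 2048, 2)]
    for name, channels, stride in stages:
        # Only the first 1x1 conv of a stage changes the spatial size;
        # the 3x3 convs use padding 1 and stride 1.
        h, w = conv_out(h, 1, stride, 0), conv_out(w, 1, stride, 0)
        shapes[name] = (channels, h, w)

    # FPN keeps each stage's spatial size and maps it to 256 channels.
    for i in range(2, 6):
        _, fh, fw = shapes[f"res{i}"]
        shapes[f"p{i}"] = (256, fh, fw)

    # LastLevelMaxPool: assumed kernel-1, stride-2 max pool on p5.
    _, fh, fw = shapes["p5"]
    shapes["p6"] = (256, conv_out(fh, 1, 2, 0), conv_out(fw, 1, 2, 0))
    return shapes


shapes = backbone_shapes(800, 1216)
for name, shape in shapes.items():
    print(name, shape)
```

For this input, res2 comes out at 200x304 and res5 at 25x38, i.e. strides 4 and 32, which lines up with the `spatial_scale` values 0.25 and 0.03125 of the ROIAlign poolers.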

Dsantra92 commented 2 years ago

I will tackle the box pooler (#2).
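For reference, the box pooler in the printout has four `ROIAlign` levels, so each box first has to be routed to one of p2..p5. A minimal sketch of that routing is below, assuming the FPN-style heuristic with a canonical box size of 224 mapped to level 4. Those two defaults and the function name are assumptions to be verified against the Detectron2 source, not something stated in this thread.

```python
import math


def assign_box_to_level(box, min_level=2, max_level=5,
                        canonical_box_size=224, canonical_level=4):
    """Pick the FPN level (2..5 for p2..p5) for one (x0, y0, x1, y1) box."""
    x0, y0, x1, y1 = box
    # Box "size" is the square root of its area.
    box_size = math.sqrt(max(x1 - x0, 0.0) * max(y1 - y0, 0.0))
    # The small epsilon keeps log2 finite for degenerate boxes.
    level = math.floor(
        canonical_level + math.log2(box_size / canonical_box_size + 1e-8)
    )
    # Clamp to the levels that actually have a pooler.
    return min(max(level, min_level), max_level)


print(assign_box_to_level((0.0, 0.0, 32.0, 32.0)))    # 2
print(assign_box_to_level((0.0, 0.0, 112.0, 112.0)))  # 3
print(assign_box_to_level((0.0, 0.0, 224.0, 224.0)))  # 4
print(assign_box_to_level((0.0, 0.0, 800.0, 800.0)))  # 5
```

Subtracting `min_level` from the result gives the index into `level_poolers` (0 to 3) as listed in the printout.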

INF800 commented 2 years ago

https://github.com/gov-ai/jax-detectron2/issues/3 Todo: Notebook