mmiakashs commented 4 years ago

🚀 Feature request

Thanks a lot for releasing the LXMERT model. The LXMERT code samples provide visual feature extraction code (using the generalized Faster R-CNN in modeling_frcnn) for the inference step only; the training path is not included. As a result, using the same code for fine-tuning raises NotImplementedError, because visual feature extraction during training is not implemented. Would it be possible to share the visual feature extraction code for training?

TashinAhmed commented 4 years ago

Yes, I also ran into this error. It would be great if this feature were published. TIA.

LysandreJik commented 4 years ago

Tagging LXMERT's implementation author @eltoto1219

eltoto1219 commented 4 years ago

Haha, yes, we only added the FRCNN for evaluation to accommodate LXMERT in the demo. I'll add the training code sometime this week and post back here once it is done. In the future it may be usable as a publicly available model following the HF API, but for the time being I'll just push the changes to where it is now.

TashinAhmed commented 4 years ago

Thanks for the prompt feedback. Looking forward to it. @eltoto1219

mmiakashs commented 4 years ago

@eltoto1219 thanks, that will be quite a help.

xhyandwyy commented 4 years ago

@eltoto1219 Looking forward to it.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

LetiP commented 3 years ago

Hello, any updates on this? 😃

eltoto1219 commented 3 years ago

Hi @LetiP,

My apologies for the delay! I have a couple of conference deadlines in mid-January and some other projects after that, so my free time to implement training code for the FRCNN is unfortunately very limited. If I do manage to add this functionality, it may not be ready until sometime in May. However, the FRCNN code used here was largely adapted from Facebook's detectron2 library, so I can point you to the source training code in case you need this functionality sooner!

here is the file for the region proposal network: https://github.com/facebookresearch/detectron2/blob/e0e166d864a2021a15a2bc2c9234d04938066265/detectron2/modeling/proposal_generator/rpn.py#L402

here is the file for the box matcher: https://github.com/facebookresearch/detectron2/blob/master/detectron2/modeling/matcher.py

some utils for the rpn: https://github.com/facebookresearch/detectron2/blob/master/detectron2/modeling/proposal_generator/proposal_utils.py

code for the frcnn output predictions: https://github.com/facebookresearch/detectron2/blob/e0e166d864a2021a15a2bc2c9234d04938066265/detectron2/modeling/roi_heads/fast_rcnn.py#L433

not completely sure if changes are needed in this file for training: https://github.com/facebookresearch/detectron2/blob/master/detectron2/modeling/roi_heads/box_head.py

roi head logic: https://github.com/facebookresearch/detectron2/blob/e0e166d864a2021a15a2bc2c9234d04938066265/detectron2/modeling/roi_heads/roi_heads.py#L307

If you run into anything that seems impossible to get working, reply to this thread and I may be able to provide some quick pointers!
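
For readers following the box matcher link above, here is a minimal pure-Python sketch of the idea behind threshold-based matching: each proposal is paired with its highest-IoU ground-truth box and labeled by which IoU interval it falls into. The function names (`iou`, `match_proposals`) are hypothetical and this is not detectron2's implementation; the default thresholds (0.3, 0.7) and labels (0, -1, 1) are taken from the RPN section of the config posted later in this thread, and detectron2's optional low-quality-match handling is left out.

```python
from typing import List, Sequence, Tuple

Box = Sequence[float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_proposals(
    proposals: List[Box],
    gt_boxes: List[Box],
    thresholds: Tuple[float, ...] = (0.3, 0.7),  # assumed: RPN IOU_THRESHOLDS
    labels: Tuple[int, ...] = (0, -1, 1),        # assumed: RPN IOU_LABELS
) -> List[Tuple[int, int]]:
    """Return (best_gt_index, label) per proposal.

    label 0 = background, -1 = ignored during training, 1 = foreground.
    best_gt_index is -1 when there are no ground-truth boxes.
    """
    results = []
    for proposal in proposals:
        if not gt_boxes:
            results.append((-1, labels[0]))
            continue
        ious = [iou(proposal, gt) for gt in gt_boxes]
        best = max(range(len(ious)), key=ious.__getitem__)
        # Walk the thresholds upward; the last one reached decides the label.
        label = labels[0]
        for threshold, candidate in zip(thresholds, labels[1:]):
            if ious[best] >= threshold:
                label = candidate
        results.append((best, label))
    return results


gt = [(0, 0, 10, 10)]
props = [(0, 0, 10, 10), (0, 0, 10, 5), (20, 20, 30, 30)]
matches = match_proposals(props, gt)
```

During training, only proposals labeled 1 or 0 would contribute to the classification loss, and only those labeled 1 to box regression; that is the piece the inference-only code path never needs.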

ppwwyyxx commented 3 years ago

Rather than trying to "add training functionality" to the custom copy of an old subset of detectron2 in this repo, I can't see why you cannot just use detectron2 directly. That would not only provide the training functionality out of the box, but also probably reduce the 3000 lines of duplicated unmaintained code here into like 50 lines.

AhmedMasryKU commented 3 years ago

I want to use an FRCNN model that is trained on a custom dataset. I followed the tutorials in the original detectron2 repo (the Colab notebooks in https://github.com/facebookresearch/detectron2). However, I noticed that the config file structure for your pretrained model is different from mine. For example, this is the model part of your config file:

    model:
      load_proposals: false
      device: cpu
      max_pool: true
      chkpoint: ""
      pixel_mean: [102.9801, 115.9465, 122.7717]
      pixel_std: [1.0, 1.0, 1.0]

And this is mine:

    MODEL:
      ANCHOR_GENERATOR:
        ANGLES:
        - - -90
          - 0
          - 90
        ASPECT_RATIOS:
        - - 0.5
          - 1.0
          - 2.0
        NAME: DefaultAnchorGenerator
        OFFSET: 0.0
        SIZES:
        - - 32
        - - 64
        - - 128
        - - 256
        - - 512
      BACKBONE:
        FREEZE_AT: 2
        NAME: build_resnet_fpn_backbone
      DEVICE: cuda
      FPN:
        FUSE_TYPE: sum
        IN_FEATURES:
        - res2
        - res3
        - res4
        - res5
        NORM: ''
        OUT_CHANNELS: 256
      KEYPOINT_ON: false
      LOAD_PROPOSALS: false
      MASK_ON: true
      META_ARCHITECTURE: GeneralizedRCNN
      PANOPTIC_FPN:
        COMBINE:
          ENABLED: true
          INSTANCES_CONFIDENCE_THRESH: 0.5
          OVERLAP_THRESH: 0.5
          STUFF_AREA_LIMIT: 4096
        INSTANCE_LOSS_WEIGHT: 1.0
      PIXEL_MEAN:
      - 103.53
      - 116.28
      - 123.675
      PIXEL_STD:
      - 1.0
      - 1.0
      - 1.0
      PROPOSAL_GENERATOR:
        MIN_SIZE: 0
        NAME: RPN
      RESNETS:
        DEFORM_MODULATED: false
        DEFORM_NUM_GROUPS: 1
        DEFORM_ON_PER_STAGE:
        - false
        - false
        - false
        - false
        DEPTH: 50
        NORM: FrozenBN
        NUM_GROUPS: 1
        OUT_FEATURES:
        - res2
        - res3
        - res4
        - res5
        RES2_OUT_CHANNELS: 256
        RES5_DILATION: 1
        STEM_OUT_CHANNELS: 64
        STRIDE_IN_1X1: true
        WIDTH_PER_GROUP: 64
      RETINANET:
        BBOX_REG_LOSS_TYPE: smooth_l1
        BBOX_REG_WEIGHTS: &id001
        - 1.0
        - 1.0
        - 1.0
        - 1.0
        FOCAL_LOSS_ALPHA: 0.25
        FOCAL_LOSS_GAMMA: 2.0
        IN_FEATURES:
        - p3
        - p4
        - p5
        - p6
        - p7
        IOU_LABELS:
        - 0
        - -1
        - 1
        IOU_THRESHOLDS:
        - 0.4
        - 0.5
        NMS_THRESH_TEST: 0.5
        NORM: ''
        NUM_CLASSES: 80
        NUM_CONVS: 4
        PRIOR_PROB: 0.01
        SCORE_THRESH_TEST: 0.05
        SMOOTH_L1_LOSS_BETA: 0.1
        TOPK_CANDIDATES_TEST: 1000
      ROI_BOX_CASCADE_HEAD:
        BBOX_REG_WEIGHTS:
        - - 10.0
          - 10.0
          - 5.0
          - 5.0
        - - 20.0
          - 20.0
          - 10.0
          - 10.0
        - - 30.0
          - 30.0
          - 15.0
          - 15.0
        IOUS:
        - 0.5
        - 0.6
        - 0.7
      ROI_BOX_HEAD:
        BBOX_REG_LOSS_TYPE: smooth_l1
        BBOX_REG_LOSS_WEIGHT: 1.0
        BBOX_REG_WEIGHTS:
        - 10.0
        - 10.0
        - 5.0
        - 5.0
        CLS_AGNOSTIC_BBOX_REG: false
        CONV_DIM: 256
        FC_DIM: 1024
        NAME: FastRCNNConvFCHead
        NORM: ''
        NUM_CONV: 0
        NUM_FC: 2
        POOLER_RESOLUTION: 7
        POOLER_SAMPLING_RATIO: 0
        POOLER_TYPE: ROIAlignV2
        SMOOTH_L1_BETA: 0.0
        TRAIN_ON_PRED_BOXES: false
      ROI_HEADS:
        BATCH_SIZE_PER_IMAGE: 128
        IN_FEATURES:
        - p2
        - p3
        - p4
        - p5
        IOU_LABELS:
        - 0
        - 1
        IOU_THRESHOLDS:
        - 0.5
        NAME: StandardROIHeads
        NMS_THRESH_TEST: 0.5
        NUM_CLASSES: 1
        POSITIVE_FRACTION: 0.25
        PROPOSAL_APPEND_GT: true
        SCORE_THRESH_TEST: 0.7
      ROI_KEYPOINT_HEAD:
        CONV_DIMS:
        - 512
        - 512
        - 512
        - 512
        - 512
        - 512
        - 512
        - 512
        LOSS_WEIGHT: 1.0
        MIN_KEYPOINTS_PER_IMAGE: 1
        NAME: KRCNNConvDeconvUpsampleHead
        NORMALIZE_LOSS_BY_VISIBLE_KEYPOINTS: true
        NUM_KEYPOINTS: 17
        POOLER_RESOLUTION: 14
        POOLER_SAMPLING_RATIO: 0
        POOLER_TYPE: ROIAlignV2
      ROI_MASK_HEAD:
        CLS_AGNOSTIC_MASK: false
        CONV_DIM: 256
        NAME: MaskRCNNConvUpsampleHead
        NORM: ''
        NUM_CONV: 4
        POOLER_RESOLUTION: 14
        POOLER_SAMPLING_RATIO: 0
        POOLER_TYPE: ROIAlignV2
      RPN:
        BATCH_SIZE_PER_IMAGE: 256
        BBOX_REG_LOSS_TYPE: smooth_l1
        BBOX_REG_LOSS_WEIGHT: 1.0
        BBOX_REG_WEIGHTS: *id001
        BOUNDARY_THRESH: -1
        HEAD_NAME: StandardRPNHead
        IN_FEATURES:
        - p2
        - p3
        - p4
        - p5
        - p6
        IOU_LABELS:
        - 0
        - -1
        - 1
        IOU_THRESHOLDS:
        - 0.3
        - 0.7
        LOSS_WEIGHT: 1.0
        NMS_THRESH: 0.7
        POSITIVE_FRACTION: 0.5
        POST_NMS_TOPK_TEST: 1000
        POST_NMS_TOPK_TRAIN: 1000
        PRE_NMS_TOPK_TEST: 1000
        PRE_NMS_TOPK_TRAIN: 2000
        SMOOTH_L1_BETA: 0.0
      SEM_SEG_HEAD:
        COMMON_STRIDE: 4
        CONVS_DIM: 128
        IGNORE_VALUE: 255
        IN_FEATURES:
        - p2
        - p3
        - p4
        - p5
        LOSS_WEIGHT: 1.0
        NAME: SemSegFPNHead
        NORM: GN
        NUM_CLASSES: 54
      WEIGHTS: ./output/model_final.pth

Could you please point to any resources on how to use our own trained FRCNN models?
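
One way to see exactly where the two config layouts diverge is to flatten both into dotted, lowercased key paths and compare them. The sketch below does only that comparison; it does not convert one format into the other. The helper names (`flatten`, `diff_configs`) are hypothetical, the example dicts contain only a handful of values copied from the two configs quoted above, and treating keys as equivalent after lowercasing is an assumption rather than something either library guarantees.

```python
from typing import Any, Dict


def flatten(cfg: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten a nested config into {'model.device': 'cpu', ...} with lowercased paths."""
    flat: Dict[str, Any] = {}
    for key, value in cfg.items():
        path = f"{prefix}.{key}".lower() if prefix else str(key).lower()
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_configs(a: Dict[str, Any], b: Dict[str, Any]) -> Dict[str, Any]:
    """Report keys present on one side only and keys whose values differ."""
    fa, fb = flatten(a), flatten(b)
    shared = sorted(set(fa) & set(fb))
    return {
        "only_in_a": sorted(set(fa) - set(fb)),
        "only_in_b": sorted(set(fb) - set(fa)),
        "different": {k: (fa[k], fb[k]) for k in shared if fa[k] != fb[k]},
    }


# Excerpt of the pretrained model's config quoted in this thread.
hf_cfg = {
    "model": {
        "load_proposals": False,
        "device": "cpu",
        "max_pool": True,
        "chkpoint": "",
        "pixel_mean": [102.9801, 115.9465, 122.7717],
        "pixel_std": [1.0, 1.0, 1.0],
    }
}

# Excerpt of the detectron2-style config quoted in this thread.
d2_cfg = {
    "MODEL": {
        "LOAD_PROPOSALS": False,
        "DEVICE": "cuda",
        "PIXEL_MEAN": [103.53, 116.28, 123.675],
        "PIXEL_STD": [1.0, 1.0, 1.0],
        "WEIGHTS": "./output/model_final.pth",
    }
}

result = diff_configs(hf_cfg, d2_cfg)
```

On these excerpts the differences are the device, the pixel means, and the keys that exist on only one side (`max_pool` and `chkpoint` versus `WEIGHTS`), which gives a concrete list of fields to reconcile.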

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

mmiakashs commented 3 years ago

Any update on this 🙂 ?

huggingface / transformers

LXMERT visual feature extraction during training/fine-tuning phase #7261
