microsoft / GLIP

Grounded Language-Image Pre-training
MIT License
2.07k stars 186 forks

COCO Fine-Tuning #169

Open fengjianhui158 opened 2 months ago

fengjianhui158 commented 2 months ago

I built the training environment from the image obtained with docker pull pengchuanzhang/pytorch:ubuntu20.04_torch1.9-cuda11.3-nccl2.9.9, and then ran COCO fine-tuning.

The configuration in _coco.yaml is as follows:

    MODEL:
      META_ARCHITECTURE: "GeneralizedRCNN"
      WEIGHT: "swin_tiny_patch4_window7_224.pth"
      BACKBONE:
        FREEZE_CONV_BODY_AT: -1

    # use for grounding model

    TEST:
      DURING_TRAINING: False
      IMS_PER_BATCH: 4
      EVAL_TASK: "detection"

    DATASETS:
      TRAIN: ("coco_2017_train", "coco_2017_train")
      TEST: ("coco_2017_test", )
      DISABLE_SHUFFLE: true
    SOLVER:
      BASE_LR: 0.0001
      LANG_LR: 0.00001
      STEPS: (0.67, 0.89)
      MAX_EPOCH: 24
      IMS_PER_BATCH: 4
      USE_AMP: True
      FIND_UNUSED_PARAMETERS: False

Then I ran:

    python tools/train_net.py --config-file "configs/pretrain/_coco.yaml" --skip-test

but it failed with the following error:

    Traceback (most recent call last):
      File "tools/train_net.py", line 255, in <module>
        main()
      File "tools/train_net.py", line 248, in main
        model = train(cfg=cfg,
      File "tools/train_net.py", line 129, in train
        do_train(
      File "/workspace/glip/maskrcnn_benchmark/engine/trainer.py", line 123, in do_train
        loss_dict = model(images, targets)
      File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/workspace/glip/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 107, in forward
        x, result, detector_losses = self.roi_heads(features, proposals, targets)
      File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/workspace/glip/maskrcnn_benchmark/modeling/roi_heads/__init__.py", line 28, in forward
        x, detections, loss_box = self.box(features, proposals, targets)
      File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/opt/conda/lib/python3.8/site-packages/torch/cuda/amp/autocast_mode.py", line 233, in decorate_fwd
        return fwd(*_cast(args, cast_inputs), **_cast(kwargs, cast_inputs))
      File "/workspace/glip/maskrcnn_benchmark/modeling/roi_heads/box_head/box_head.py", line 48, in forward
        x = self.feature_extractor(features, proposals)
      File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/workspace/glip/maskrcnn_benchmark/modeling/roi_heads/box_head/roi_box_feature_extractors.py", line 96, in forward
        x = self.head(x)
      File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/workspace/glip/maskrcnn_benchmark/modeling/backbone/resnet.py", line 228, in forward
        x = getattr(self, stage)(x)
      File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/container.py", line 139, in forward
        input = module(input)
      File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/workspace/glip/maskrcnn_benchmark/modeling/backbone/resnet.py", line 391, in forward
        out = self.conv2(out)
      File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in __getattr__
        raise AttributeError("'{}' object has no attribute '{}'".format(
    AttributeError: 'BottleneckWithFixedBatchNorm' object has no attribute 'conv2'

I would appreciate any pointers on how to solve this. Thanks!

fengjianhui158 commented 2 months ago

The problem comes from a change I made to the code in resnet.py:

        if dcn_config is not None:  # this line is newly added
            with_dcn = dcn_config.get("stage_with_dcn", False)

        if with_dcn:
            deformable_groups = dcn_config.get("deformable_groups", 1)
            with_modulated_dcn = dcn_config.get("with_modulated_dcn", False)
            self.conv2 = DFConv2d(
                bottleneck_channels,
                bottleneck_channels,
                with_modulated_dcn=with_modulated_dcn,
                kernel_size=3,
                stride=stride_3x3,
                groups=num_groups,
                dilation=dilation,
                deformable_groups=deformable_groups,
                bias=False
            )
        else:
            self.conv2 = Conv2d(
                bottleneck_channels,
                bottleneck_channels,
                kernel_size=3,
                stride=stride_3x3,
                padding=dilation,
                bias=False,
                groups=num_groups,
                dilation=dilation
            )
            nn.init.kaiming_uniform_(self.conv2.weight, a=1)
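
To illustrate how a guard like this can lead to the missing `conv2` attribute, here is a minimal sketch in plain Python (no PyTorch). It assumes the new `if dcn_config is not None:` ends up enclosing the whole `conv2` setup, which is a guess, since the indentation in the post is not clear. `ToyBlock` and `conv2_missing` are hypothetical names, not GLIP code.

```python
class ToyBlock:
    """Hypothetical stand-in for a bottleneck block; not GLIP code."""

    def __init__(self, dcn_config=None):
        # Assumed shape of the edit: the guard encloses the whole conv2 setup,
        # so nothing is assigned when dcn_config is None.
        if dcn_config is not None:
            with_dcn = dcn_config.get("stage_with_dcn", False)
            self.conv2 = "deformable conv" if with_dcn else "regular conv"

    def forward(self, x):
        # Raises AttributeError if __init__ skipped the guarded branch.
        return (self.conv2, x)


def conv2_missing(block):
    """Return True when calling forward hits the missing attribute."""
    try:
        block.forward(0)
    except AttributeError:
        return True
    return False


print(conv2_missing(ToyBlock()))    # no config passed
print(conv2_missing(ToyBlock({})))  # empty config passed
```

Under that assumption, construction succeeds and the failure only shows up later, at the first forward pass, which would match the first traceback.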

However, without this check, the following error is raised:

    Traceback (most recent call last):
      File "tools/train_net.py", line 255, in <module>
        main()
      File "tools/train_net.py", line 248, in main
        model = train(cfg=cfg,
      File "tools/train_net.py", line 35, in train
        model = build_detection_model(cfg)
      File "/workspace/glip/maskrcnn_benchmark/modeling/detector/__init__.py", line 11, in build_detection_model
        return meta_arch(cfg)
      File "/workspace/glip/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 32, in __init__
        self.roi_heads = build_roi_heads(cfg)
      File "/workspace/glip/maskrcnn_benchmark/modeling/roi_heads/__init__.py", line 72, in build_roi_heads
        roi_heads.append(("box", build_roi_box_head(cfg)))
      File "/workspace/glip/maskrcnn_benchmark/modeling/roi_heads/box_head/box_head.py", line 75, in build_roi_box_head
        return ROIBoxHead(cfg)
      File "/workspace/glip/maskrcnn_benchmark/modeling/roi_heads/box_head/box_head.py", line 18, in __init__
        self.feature_extractor = make_roi_box_feature_extractor(cfg)
      File "/workspace/glip/maskrcnn_benchmark/modeling/roi_heads/box_head/roi_box_feature_extractors.py", line 201, in make_roi_box_feature_extractor
        return func(cfg)
      File "/workspace/glip/maskrcnn_benchmark/modeling/roi_heads/box_head/roi_box_feature_extractors.py", line 80, in __init__
        head = resnet.ResNetHead(
      File "/workspace/glip/maskrcnn_benchmark/modeling/backbone/resnet.py", line 209, in __init__
        module = _make_stage(
      File "/workspace/glip/maskrcnn_benchmark/modeling/backbone/resnet.py", line 261, in _make_stage
        layer_module(
      File "/workspace/glip/maskrcnn_benchmark/modeling/backbone/resnet.py", line 466, in __init__
        super(BottleneckWithFixedBatchNorm, self).__init__(
      File "/workspace/glip/maskrcnn_benchmark/modeling/backbone/resnet.py", line 343, in __init__
        with_dcn = dcn_config.get("stage_with_dcn", False)
    AttributeError: 'NoneType' object has no attribute 'get'

Following the traceback, I looked at the code in roi_box_feature_extractors.py:

    head = resnet.ResNetHead(
        block_module=config.MODEL.RESNETS.TRANS_FUNC,
        stages=(stage,),
        num_groups=config.MODEL.RESNETS.NUM_GROUPS,
        width_per_group=config.MODEL.RESNETS.WIDTH_PER_GROUP,
        stride_in_1x1=config.MODEL.RESNETS.STRIDE_IN_1X1,
        stride_init=None,
        res2_out_channels=config.MODEL.RESNETS.RES2_OUT_CHANNELS,
        dilation=config.MODEL.RESNETS.RES5_DILATION
    )

Indeed, dcn_config is not passed here, so it is None inside the block, and that causes the error. How should I fix this?
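
One possible way to handle a `dcn_config` of None, sketched in plain Python without PyTorch: treat the missing config as "no deformable convolution" before reading it, and keep the `conv2` assignment outside any guard so both branches still set it. `resolve_with_dcn` and `SafeToyBlock` are hypothetical names used only for illustration; they are not part of GLIP, and this is not necessarily how the maintainers would fix it.

```python
def resolve_with_dcn(dcn_config=None):
    """Return the stage_with_dcn flag, treating a missing config as 'no DCN'."""
    dcn_config = dcn_config or {}  # None (or an empty dict) becomes {}
    return dcn_config.get("stage_with_dcn", False)


class SafeToyBlock:
    """Hypothetical stand-in for a bottleneck block; not GLIP code."""

    def __init__(self, dcn_config=None):
        with_dcn = resolve_with_dcn(dcn_config)
        # conv2 is assigned on both branches, so it always exists afterwards.
        if with_dcn:
            self.conv2 = "deformable conv"
        else:
            self.conv2 = "regular conv"


print(SafeToyBlock().conv2)                          # config missing
print(SafeToyBlock({"stage_with_dcn": True}).conv2)  # DCN requested
```

In resnet.py the same idea would mean normalizing `dcn_config` just before the `dcn_config.get("stage_with_dcn", False)` line and leaving the existing `if with_dcn: ... else: ...` block unguarded. Whether the ROI box head should instead be given a real `dcn_config` is a separate question that this sketch does not settle.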