liuhengyue / fcsgg

A PyTorch implementation for the paper: Fully Convolutional Scene Graph Generation, CVPR 2021
MIT License

RuntimeError: The size of tensor a (7) must match the size of tensor b (8) at non-singleton dimension 1 #9

Closed: jbdel closed this issue 10 months ago

jbdel commented 1 year ago

Hello,

Thank you for putting together the code.

1) I'm trying to train a model using `python tools/train_net.py --num-gpus 1 --config-file configs/FCSGG_HRNet_W32_2xDownRAF_512x512_MS.yaml`

and get:

RuntimeError: The size of tensor a (7) must match the size of tensor b (8) at non-singleton dimension 1

Here is the trace:

File "/home/jb/Documents/fcsgg/fcsgg/modeling/meta_arch/onestage_detector.py", line 296, in forward
    self.preprocess_gt(gt_scene_graphs, images.tensor.shape[-2:], image_ids)
  File "/home/jb/Documents/fcsgg/fcsgg/modeling/meta_arch/onestage_detector.py", line 261, in preprocess_gt
    gt_scene_graphs[i] = self.gt_gen(x, image_size, image_id, training=self.training)
  File "/home/jb/Documents/fcsgg/fcsgg/data/detection_utils.py", line 635, in __call__
    gt_dict = self.generate_gt_scale(sg, image_size,
  File "/home/jb/Documents/fcsgg/fcsgg/data/detection_utils.py", line 581, in generate_gt_scale
    ct_ht_maps = self.generate_score_map(self.num_classes,
  File "/home/jb/Documents/fcsgg/fcsgg/data/detection_utils.py", line 432, in generate_score_map
    masked_fmap = torch.max(masked_fmap, gaussian_mask * k)
RuntimeError: The size of tensor a (7) must match the size of tensor b (8) at non-singleton dimension 1
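For context, the failing line takes an elementwise `torch.max` of a slice of the class heatmap and a gaussian mask, which requires the two tensors to have identical shapes. A minimal sketch (using NumPy with hypothetical shapes, not the repo's actual code) of how the slice and the mask can end up one element apart, e.g. after clipping at an image border:

```python
import numpy as np

# Hypothetical shapes mirroring the error message: the heatmap slice is 7 wide
# while the gaussian mask is 8 wide, so the elementwise max cannot broadcast.
fmap_slice = np.zeros((7, 7))
gaussian_mask = np.ones((8, 8))

try:
    np.maximum(fmap_slice, gaussian_mask)  # elementwise op: shapes must match
except ValueError as e:
    print("shape mismatch:", e)

# Clipping the mask with the same bounds as the slice keeps shapes aligned:
h, w = fmap_slice.shape
out = np.maximum(fmap_slice, gaussian_mask[:h, :w])
print(out.shape)  # (7, 7)
```

This suggests the gaussian diameter and the slice bounds are computed with inconsistent rounding somewhere in `generate_score_map`, which is consistent with the version-dependence reported below.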

2) Let's say I want to train on images completely different from VG, e.g. medical images, where we don't have bounding boxes. Which config should I use? I understand that all models except FCSGG-Base.yaml use backbones pretrained on COCO, such as RCNN and FPN.

I tried using FCSGG-Base.yaml, but encounter the error:

  File "/home/jb/Documents/fcsgg/fcsgg/data/detection_utils.py", line 635, in __call__
    gt_dict = self.generate_gt_scale(sg, image_size,
  File "/home/jb/Documents/fcsgg/fcsgg/data/detection_utils.py", line 539, in generate_gt_scale
    range_side = torch.tensor(size_range)
RuntimeError: Could not infer dtype of NoneType

Indeed, for this config MODEL.HEADS.OUTPUT_STRIDES is [4], so when execution reaches this code:

        if len(self.output_strides) == 3:
            self.size_range = [(1e-12, 1 / 8), (1 / 8, 1 / 4), (1 / 4, 1.)]
        elif len(self.output_strides) == 4:
            self.size_range = [(1e-12, 1 / 16),
                               (1 / 16, 1 / 8),
                               (1 / 8, 1 / 4),
                               (1 / 4, 1.)]
        elif len(self.output_strides) == 5:
            self.size_range = [(1e-12, 1 / 16),
                               (1 / 16, 1 / 8),
                               (1 / 8, 1 / 4),
                               (1 / 4, 1 / 2),
                               (1 / 2, 1.)]
        else:
            self.size_range = [None]

then `len(self.output_strides)` is 1, `self.size_range` falls through to the else branch and becomes `[None]`, and `torch.tensor(None)` raises the error above.
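The branch logic above can be sketched in plain Python (a simplified stand-in for the repo's code; `make_size_range` is a hypothetical name):

```python
def make_size_range(output_strides):
    """Map the number of output strides to per-scale box-size ranges,
    mirroring the if/elif chain above."""
    n = len(output_strides)
    if n == 3:
        return [(1e-12, 1 / 8), (1 / 8, 1 / 4), (1 / 4, 1.)]
    elif n == 4:
        return [(1e-12, 1 / 16), (1 / 16, 1 / 8), (1 / 8, 1 / 4), (1 / 4, 1.)]
    elif n == 5:
        return [(1e-12, 1 / 16), (1 / 16, 1 / 8), (1 / 8, 1 / 4),
                (1 / 4, 1 / 2), (1 / 2, 1.)]
    # Single-stride configs like FCSGG-Base.yaml (OUTPUT_STRIDES = [4])
    # fall through to here, so size_range holds a bare None that later
    # reaches torch.tensor(None).
    return [None]

print(make_size_range([4]))  # [None]
```

So a single-stride config never gets a valid `size_range`, independent of the dataset used.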

What should I use to train a model that relies only on convolutions, with no backbone pretrained on natural images?

Thank you very much for your help

jbdel commented 1 year ago

Hello,

The first issue is resolved by using PyTorch >= 1.4 and <= 1.6. Author, could you please still point to a config that would work well if we do not have bounding boxes?

liuhengyue commented 1 year ago

I do not think the code can be used without bounding boxes or any kind of dense supervision targets. You may need to modify the detection heads for your purpose.