microsoft / scene_graph_benchmark

image scene graph generation benchmark

Several issues with VinVL feature extraction #27

Closed he159ok closed 3 years ago

he159ok commented 3 years ago

After installing your environment step by step following option 1, I ran the command below,

# extract vision features with VinVL object-attribute detection model
# pretrained models at https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/vinvl_model_zoo/vinvl_vg_x152c4.pth
# the associated labelmap at https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/vinvl_model_zoo/VG-SGG-dicts-vgoi6-clipped.json
python tools/test_sg_net.py --config-file sgg_configs/vgattr/vinvl_x152c4.yaml TEST.IMS_PER_BATCH 2 MODEL.WEIGHT models/vinvl/vinvl_vg_x152c4.pth MODEL.ROI_HEADS.NMS_FILTER 1 MODEL.ROI_HEADS.SCORE_THRESH 0.2 DATA_DIR "../maskrcnn-benchmark-1/datasets1" TEST.IGNORE_BOX_REGRESSION True MODEL.ATTRIBUTE_ON True

There are several issues:

  1. I cannot find the code that loads the pre-trained model parameters into "AttrRCNN". Although the command passes a pre-trained model path, I could not find a concrete torch.load() call while debugging, so I wonder whether I need to add torch.load() myself when running the above command (see the loading sketch after this list).

  2. "self.training" in "AttrRCNN" is inherited from "torch.nn.modules.module.py" and defaults to True. But when running the command above to extract VinVL features, it seems it should be False, and I had to override the init functions of AttrRCNN, its "self.rpn", and its "self.roi_heads" as below (see the eval-mode sketch after this list),

    proposals, proposal_losses = self.rpn(images, features, targets, is_training=self.training)
    x, predictions, detector_losses = self.roi_heads(features, proposals, targets, is_training=self.training)
  3. Instead of PyTorch 1.4, I am using PyTorch 1.7, which keeps raising runtime errors for several in-place operations, such as the code below in "bounding_box.py",

        def clip_to_image(self, remove_empty=True):
            TO_REMOVE = 1
            self.bbox[:, 0].clamp_(min=0, max=self.size[0] - TO_REMOVE)
            self.bbox[:, 1].clamp_(min=0, max=self.size[1] - TO_REMOVE)
            self.bbox[:, 2].clamp_(min=0, max=self.size[0] - TO_REMOVE)
            self.bbox[:, 3].clamp_(min=0, max=self.size[1] - TO_REMOVE)

    the error is as below,

    File "/home/jfhe/Documents/MountHe/jfhe/mm_dialogue/MM_Dialogue/scene_graph_benchmark/maskrcnn_benchmark/modeling/rpn/rpn.py", line 188, in _forward_test
    boxes = self.box_selector_test(anchors, objectness, rpn_box_regression)
    File "/home/jfhe/anaconda3/envs/JD2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
    File "/home/jfhe/Documents/MountHe/jfhe/mm_dialogue/MM_Dialogue/scene_graph_benchmark/maskrcnn_benchmark/modeling/rpn/inference.py", line 140, in forward
    sampled_boxes.append(self.forward_for_single_feature_map(a, o, b))
    File "/home/jfhe/Documents/MountHe/jfhe/mm_dialogue/MM_Dialogue/scene_graph_benchmark/maskrcnn_benchmark/modeling/rpn/inference.py", line 114, in forward_for_single_feature_map
    boxlist = boxlist.clip_to_image(remove_empty=False)
    File "/home/jfhe/Documents/MountHe/jfhe/mm_dialogue/MM_Dialogue/scene_graph_benchmark/maskrcnn_benchmark/structures/bounding_box.py", line 217, in clip_to_image
    self.bbox[:, 1].clamp_(min=0, max=self.size[1] - TO_REMOVE)
    RuntimeError: Output 0 of UnbindBackward is a view and its base or another view of its base has been modified inplace. This view is the output of a function that returns multiple views. Such functions do not allow the output views to be modified inplace. You should replace the inplace operation by an out-of-place one.

I work around them by wrapping the calls in "with torch.no_grad()", but that feels odd: if I later need to fine-tune the model, these errors will reappear once "with torch.no_grad()" is removed. An out-of-place rewrite (sketched below) avoids the issue.
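A minimal sketch of an out-of-place variant (assuming the same self.bbox / self.size attributes as the repo's BoxList): it rebuilds the bbox tensor instead of calling clamp_ on column views, so it should not need "with torch.no_grad()" even when fine-tuning:

    import torch

    def clip_to_image(self, remove_empty=True):
        # out-of-place sketch: clamp each coordinate column into a fresh tensor
        # and rebuild self.bbox, instead of clamp_-ing views of the same tensor
        TO_REMOVE = 1
        x_min = self.bbox[:, 0].clamp(min=0, max=self.size[0] - TO_REMOVE)
        y_min = self.bbox[:, 1].clamp(min=0, max=self.size[1] - TO_REMOVE)
        x_max = self.bbox[:, 2].clamp(min=0, max=self.size[0] - TO_REMOVE)
        y_max = self.bbox[:, 3].clamp(min=0, max=self.size[1] - TO_REMOVE)
        self.bbox = torch.stack((x_min, y_min, x_max, y_max), dim=-1)
        # the remove_empty filtering can stay as in the original method
        return self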

  4. Also, I agree with the question in https://github.com/microsoft/scene_graph_benchmark/issues/25. Could you please provide a simpler way to extract VinVL features directly? It would be a big help to the community, and we will definitely cite your work.
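For issue 1, in case the weights really do need to be loaded by hand, here is a minimal sketch. It assumes the checkpoint follows the usual maskrcnn-benchmark convention of storing the parameters under a "model" key and that model is an already-built AttrRCNN; the repo may well already do this internally through its own checkpointer:

    import torch

    def load_vinvl_weights(model, weight_path="models/vinvl/vinvl_vg_x152c4.pth"):
        # load the checkpoint on CPU and pull out the state dict
        checkpoint = torch.load(weight_path, map_location="cpu")
        state_dict = checkpoint.get("model", checkpoint)  # fall back to a raw state dict
        # strict=False so unmatched keys are reported instead of raising
        missing, unexpected = model.load_state_dict(state_dict, strict=False)
        print("missing keys:", missing)
        print("unexpected keys:", unexpected)
        return model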
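For issue 2, instead of overriding the forward signatures of self.rpn and self.roi_heads, it seems simpler to put the whole detector into inference mode. A minimal sketch, assuming model is the built AttrRCNN and images is a batch prepared by the repo's data loader:

    import torch

    def extract_features(model, images):
        # model.eval() flips self.training to False on every submodule,
        # including self.rpn and self.roi_heads
        model.eval()
        # no autograd graph is needed for pure feature extraction
        with torch.no_grad():
            return model(images)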
he159ok commented 3 years ago

While I have not managed to run test_sg_net.py successfully, I have run demo_image.py successfully, which gives some examples of how to use VinVL.

pytorch==1.4 and an nvcc -V output >= 10.1 are both necessary; my nvcc output is 11.2. With pytorch==1.7 it failed with an invalid CUDA error like https://github.com/microsoft/scene_graph_benchmark/issues/13.

Besides, once you have rebuilt a new environment, you need to run the following to clean and rebuild the setup:

rm -r build
python setup.py build develop
hanxiaotian commented 3 years ago

We just made an upgrade to PyTorch 1.7. You can try 1.7 now. Thanks