yeliudev / ConsNet

🚴‍♂️ ConsNet: Learning Consistency Graph for Zero-Shot Human-Object Interaction Detection (MM 2020)
https://arxiv.org/abs/2008.06254
GNU General Public License v3.0

the build_dataset function in build_dataset.py file #6

Closed: zhangzhuoran997997 closed this issue 3 years ago

zhangzhuoran997997 commented 3 years ago

Thanks for your work! I have some questions:

yeliudev commented 3 years ago

@zhangzhuoran997997 Thanks for your interest in our work!

  1. In the function build_dataset, h_blob and o_blob store the bounding boxes, object detection scores, and other information about human (h) and object (o) instances. In HICO-DET, aside from human-object interactions, there are also human-human interactions; in this case, the second human should be treated as the 'object'. That is why we concatenate objects with humans to obtain pair proposals for these interactions.
  2. max_h_as_o is the maximum number of humans to be considered as objects in each image. If max_h_as_o > 0, only the top max_h_as_o humans with the highest object detection scores are concatenated with the objects. This argument is set to -1 (no limit) for the training set and to 3 for the test set to reduce the number of pair proposals.
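As a rough illustration of the two points above, a pair-proposal builder could look as follows. Note that build_pairs and its signature are hypothetical stand-ins for this discussion, not the actual code in build_dataset.py:

```python
# Hypothetical sketch of pair-proposal construction, assuming each
# detection is a (bbox, score) tuple. `build_pairs` and `max_h_as_o`
# follow the discussion above but differ from the repository code.
def build_pairs(humans, objects, max_h_as_o=-1):
    """Pair every human with every object; optionally treat the
    top-scoring humans as extra 'objects' so that human-human
    interactions are also covered."""
    if max_h_as_o != 0:
        # Rank humans by object detection score (descending) and keep
        # the top max_h_as_o of them (-1 means keep all).
        ranked = sorted(humans, key=lambda det: det[1], reverse=True)
        extra = ranked if max_h_as_o < 0 else ranked[:max_h_as_o]
        objects = objects + extra  # humans acting as objects
    # A human is never paired with itself.
    return [(h, o) for h in humans for o in objects if h is not o]
```

With two detected humans and one object, max_h_as_o=-1 yields four proposals (each human paired with the object and with the other human), while max_h_as_o=0 yields only the two human-object pairs.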
zhangzhuoran997997 commented 3 years ago

Really appreciate your answer!

I'm a beginner, so my questions may be rather basic.

I have some other questions:

yeliudev commented 3 years ago
  1. dt_blobs stores the information about humans and objects detected by the object detector, while gt_blobs stores the features of the ground-truth bboxes from HICO-DET. The other information (bboxes, classes) about these instances is loaded from the annotation file (anno_bbox.mat).
  2. The object detector we used inherits from mmdetection's TwoStageDetector. We made several modifications so that hidden features can be extracted from it. For details of the detector, you may refer to this.
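The modification described in point 2 can be pictured with a deliberately tiny, framework-free sketch: the detector returns its intermediate features alongside the final predictions instead of discarding them. All class and method names below are illustrative stand-ins, not the actual mmdetection or ConsNet code:

```python
# Illustrative stand-in for a two-stage detector that also exposes its
# hidden features; the real code subclasses mmdetection's
# TwoStageDetector, and these names/internals are hypothetical.
class FeatureExtractingDetector:
    def extract_feat(self, image):
        # Stand-in for the backbone + neck: produce "hidden features".
        return [pixel * 0.5 for pixel in image]

    def predict(self, feats):
        # Stand-in for the RoI head: keep only confident responses.
        return [f for f in feats if f > 0.2]

    def forward_with_feats(self, image):
        # The key change: hand the hidden features back to the caller
        # instead of keeping them internal to the detector.
        feats = self.extract_feat(image)
        return self.predict(feats), feats
```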

Please feel free to ask me again if you have any other questions :)

zhangzhuoran997997 commented 3 years ago

Thank you very much for the timely answers. Now, I have a comprehensive understanding of the code and each module, but there are still some questions:

yeliudev commented 3 years ago
  1. self._use_cache is False in training mode and becomes True in test mode (after you explicitly invoke model.eval()). This is because the inputs of SemanticBlock (the GATs) are fixed (word embeddings from pretrained ELMo), so its output can be cached for more efficient inference.
  2. All the parameters (including the Mapper block, the Fusion block, and the GATs) except those belonging to the object detector are updated during training.
  3. convert_annotation is used to convert the annotations of HICO-DET to COCO format. We indeed used these converted annotations to finetune the detector with mmdet.
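The caching pattern in point 1 can be sketched in miniature. SemanticBlock here is a stand-in with made-up internals, not the actual ConsNet module:

```python
# Minimal sketch of test-time caching for a module whose inputs
# (fixed word embeddings) never change; names are illustrative.
class SemanticBlock:
    def __init__(self):
        self._use_cache = False  # enabled once eval() is called
        self._cache = None
        self.calls = 0           # counts real forward passes

    def eval(self):
        self._use_cache = True
        return self

    def _compute(self, embeddings):
        self.calls += 1
        # Stand-in for the GAT forward pass over word embeddings.
        return [e * 2 for e in embeddings]

    def forward(self, embeddings):
        if self._use_cache:
            # Compute once, then reuse the cached output.
            if self._cache is None:
                self._cache = self._compute(embeddings)
            return self._cache
        return self._compute(embeddings)
```

In training mode every call recomputes the output; after eval(), only the first call does any work.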
zhangzhuoran997997 commented 3 years ago

Really appreciate your answer!

I have a question: do I need to change much of the code if I want to train with multiple GPUs?

yeliudev commented 3 years ago

Sorry for my late reply. Currently, NNDataParallel only supports single-GPU training. If you would like to train on multiple GPUs, you may use NNDistributedDataParallel, which is similar to PyTorch's DistributedDataParallel. Some minor changes to the code would also be needed. However, there is no need to use multiple GPUs for this model: if you want to increase the batch size, you can simply change the value of batch_size in the config files.
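For example, assuming an mmcv-style Python config (the surrounding key layout here is hypothetical; only batch_size itself is the option mentioned above):

```python
# Hypothetical config excerpt; the actual key layout in this
# repository may differ, but batch_size is the value to change.
data = dict(
    batch_size=4,   # increase this instead of adding GPUs
    num_workers=4,  # dataloader workers (illustrative)
)
```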

yeliudev commented 3 years ago

I'm closing this issue. Please feel free to re-open it if you have any further questions.