jwyang / graph-rcnn.pytorch

[ECCV 2018] Official code for "Graph R-CNN for Scene Graph Generation"

Report metrics `motif` and `IMP` #56

Closed jnhwkim closed 5 years ago

jnhwkim commented 5 years ago

@jwyang

  1. I saw that the evaluation report includes scores labeled `motif` and `IMP`. What do they mean? The `motif` scores were slightly higher than the `IMP` ones.
  2. Which scores are reported in README.md, `motif` or `IMP`?
jwyang commented 5 years ago

Hi, @jnhwkim these two metrics are the ones used in iterative message passing (IMP) and the neural motif network, respectively. There are differences between them, but they are usually very slight.
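For reference, both variants ultimately report the same top-level statistic, recall@K over ground-truth triplets; they differ only in matching details. A minimal sketch (hypothetical helper, simplified to exact triplet matching with no box-IoU localisation check):

```python
def triplet_recall_at_k(pred_triplets, gt_triplets, k):
    """Fraction of ground-truth (subject, predicate, object) triplets
    recovered among the top-k scored predictions for one image.
    Simplified: exact matching only, no IoU check on the boxes."""
    top_k = set(pred_triplets[:k])  # predictions assumed sorted by score
    hits = sum(1 for t in gt_triplets if t in top_k)
    return hits / max(len(gt_triplets), 1)
```

The per-image values are then averaged over the test set to give numbers like `sgdet-recall@50` in the logs below.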

jnhwkim commented 5 years ago

@jwyang Thanks for the confirmation. Here is my step-training outcome for sg_imp with the updated hyperparameters:

SOLVER:
  BASE_LR: 5e-3
  MAX_ITER: 15000
  STEPS: (8000,12000)
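The SOLVER block above describes a standard multi-step schedule: the base LR of 5e-3 is decayed at iterations 8000 and 12000 (by a factor of 0.1 in maskrcnn-benchmark-style defaults, which is an assumption here). A quick sketch of the resulting schedule:

```python
def step_lr(base_lr, steps, gamma, iteration):
    """Multi-step decay: multiply base_lr by gamma once per
    milestone already passed. gamma=0.1 is an assumed default."""
    n_passed = sum(1 for s in steps if iteration >= s)
    return base_lr * (gamma ** n_passed)
```

So the run trains at 5e-3 until iteration 8000, 5e-4 until 12000, and 5e-5 until MAX_ITER.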
2019-08-28 00:51:54,767 scene_graph_generation.inference INFO: Total run time: 0:27:00.385677 (0.36762890644083585 s / img per device, on 6 devices)
2019-08-28 00:51:54,768 scene_graph_generation.inference INFO: Model inference time: 0:00:00 (0.0 s / img per device, on 6 devices)
creating index...
index created!
Loading and preparing results...
DONE (t=7.74s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=214.70s).
Accumulating evaluation results...
DONE (t=58.65s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.123
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.242
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.113
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.016
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.054
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.141
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.198
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.272
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.273
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.016
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.164
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.299
2019-08-28 00:59:59,476 scene_graph_generation.inference INFO: performing scene graph evaluation.
2019-08-28 01:05:38,783 scene_graph_generation.inference INFO: ===================sgdet(motif)=========================
2019-08-28 01:05:38,787 scene_graph_generation.inference INFO: sgdet-recall@20: 0.176840
2019-08-28 01:05:38,788 scene_graph_generation.inference INFO: sgdet-recall@50: 0.230771
2019-08-28 01:05:38,789 scene_graph_generation.inference INFO: sgdet-recall@100: 0.265008
2019-08-28 01:05:38,789 scene_graph_generation.inference INFO: =====================sgdet(IMP)=========================
2019-08-28 01:05:38,790 scene_graph_generation.inference INFO: sgdet-recall@20: 0.1697465185413381
2019-08-28 01:05:38,791 scene_graph_generation.inference INFO: sgdet-recall@50: 0.22333252969528503
2019-08-28 01:05:38,792 scene_graph_generation.inference INFO: sgdet-recall@100: 0.25760919387128584

Presumably, did you use the `IMP` scores for the IMP model in README.md?

jwyang commented 5 years ago

As I remember, I used the `motif` score for all the numbers reported in README.md. I reported the numbers at 40k iterations and found the model was overfitting a bit.

jnhwkim commented 5 years ago

Then I got slightly better results with the updated hyperparameters.

jgyy4775 commented 5 years ago

@jnhwkim which checkpoints did you use? And which command?

jnhwkim commented 5 years ago

@jgyy4775 run with `python -m torch.distributed.launch --nproc_per_node=8 main.py --config-file configs/sgg_res101_step.yaml --algorithm sg_imp` and the following YAML:

DATASET:
  NAME: "vg"
  MODE: "benchmark"
  TRAIN_BATCH_SIZE: 8
  TEST_BATCH_SIZE: 1
MODEL:
  WEIGHT_IMG: "catalog://ImageNetPretrained/MSRA/R-101"
  WEIGHT_DET: "checkpoints/vg_benchmark_object/R-101-C4/faster_rcnn/BatchSize_6/Base_LR_0.005/checkpoint_0099999.pth"
  RELATION_ON: True
  ALGORITHM: "sg_baseline"
  USE_FREQ_PRIOR: False
  BACKBONE:
    CONV_BODY: "R-101-C4"
    FREEZE_PARAMETER: True
  RPN:
    FREEZE_PARAMETER: True
  ROI_HEADS:
    BATCH_SIZE_PER_IMAGE: 256
    DETECTIONS_PER_IMG: 64
  ROI_BOX_HEAD:
    NUM_CLASSES: 151
    FREEZE_PARAMETER: True
  ROI_RELATION_HEAD:
    BATCH_SIZE_PER_IMAGE: 256
    NUM_CLASSES: 51
    IMP_FEATURE_UPDATE_STEP: 2
    MSDN_FEATURE_UPDATE_STEP: 2
    GRCNN_FEATURE_UPDATE_STEP: 2
SOLVER:
  BASE_LR: 5e-3
  MAX_ITER: 15000
  STEPS: (8000,12000)
  CHECKPOINT_PERIOD: 1000

WEIGHT_DET points to the pretrained detector.
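The `*_FEATURE_UPDATE_STEP` entries set how many message-passing rounds each algorithm runs over the object/relation features. As a rough, framework-free sketch (a toy illustration, not the repo's actual implementation), one IMP-style round lets each node absorb its neighbours' features:

```python
def message_passing(node_feats, edges, n_steps):
    """Toy IMP-style update: for n_steps rounds, each node feature is
    replaced by the mean of itself and its neighbours' features.
    node_feats: list of floats; edges: list of (i, j) index pairs."""
    feats = list(node_feats)
    for _ in range(n_steps):
        neighbours = {i: [] for i in range(len(feats))}
        for i, j in edges:
            neighbours[i].append(feats[j])
            neighbours[j].append(feats[i])
        feats = [
            (feats[i] + sum(neighbours[i])) / (1 + len(neighbours[i]))
            for i in range(len(feats))
        ]
    return feats
```

With `IMP_FEATURE_UPDATE_STEP: 2` as in the config above, such an update would run for two rounds before classification.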

jgyy4775 commented 5 years ago

thank you!

jungjun9150 commented 5 years ago

@jnhwkim To train graph-rcnn, is this the right command: `python -m torch.distributed.launch --nproc_per_node={NGPU} main.py --config-file configs/sgg_res101_step.yaml --algorithm sg_grcnn`

and yaml:

DATASET:
  NAME: "vg"
  MODE: "benchmark"
  TRAIN_BATCH_SIZE: 8
  TEST_BATCH_SIZE: 1
MODEL:
  WEIGHT_IMG: "catalog://ImageNetPretrained/MSRA/R-101"
  WEIGHT_DET: "checkpoints/vg_benchmark_object/R-101-C4/faster_rcnn/BatchSize_6/Base_LR_0.005/checkpoint_0099999.pth"
  RELATION_ON: True
  ALGORITHM: "sg_grcnn"
  USE_FREQ_PRIOR: False
  BACKBONE:
    CONV_BODY: "R-101-C4"
    FREEZE_PARAMETER: True
  RPN:
    FREEZE_PARAMETER: True
  ROI_HEADS:
    BATCH_SIZE_PER_IMAGE: 256
    DETECTIONS_PER_IMG: 64
  ROI_BOX_HEAD:
    NUM_CLASSES: 151
    FREEZE_PARAMETER: True
  ROI_RELATION_HEAD:
    BATCH_SIZE_PER_IMAGE: 256
    NUM_CLASSES: 51
    IMP_FEATURE_UPDATE_STEP: 2
    MSDN_FEATURE_UPDATE_STEP: 2
    GRCNN_FEATURE_UPDATE_STEP: 2
SOLVER:
  BASE_LR: 5e-3
  MAX_ITER: 15000
  STEPS: (8000,12000)
  CHECKPOINT_PERIOD: 1000

Is the above modification correct?

jnhwkim commented 5 years ago

@jungjun9150 Sorry, but I haven't tested graph-rcnn myself yet. @jwyang may want to answer this question.

jgyy4775 commented 5 years ago

@jnhwkim When you were training, did you see this error?

"RuntimeError: shape '[-1, 604]' is invalid for input of size 4"

jnhwkim commented 5 years ago

@jgyy4775 Sorry, I don't have a clue.