Open qin2294096 opened 3 years ago
Hi, thank you for your interest in our work.
Yes, your understanding is correct. The raw VFNet is the FCOS+ATSS without the centerness branch.
As shown in our paper, FCOS+ATSS with FL achieves 39.2 AP (Table 1), while raw VFNet + VFL achieves 40.1 AP (Table 3).
Oh, that means we don't need to use centerness in FCOS when applying VFL. So it is a very good loss function!
Thanks for your fast reply.
My pleasure.
Hi, thanks for your good work! Could you please provide demo code for the experiments in Table 1, e.g., the 74.7 AP obtained using gt_cls_iou?
Hi, the implementation is tricky and making it public needs a lot of work. I can tell you roughly how I achieved it and share some pieces of the code.
The main idea is to use the training phase to get access to the ground-truth values.
- Download the pre-trained model of FCOS and use it as the starting point of training.
- Set the learning rate of all parameters of the model to 0.0 during training, except for a few unimportant parameters whose lr can be set to a very small value, say 0.0000001, so that training proceeds with some parameters still being optimized. (A simpler alternative is to set the overall lr = 0.0.) In effect, training with lr = 0.0 changes neither the parameters of the model nor its performance. This step can be done in the train.py of MMDetection.
- Use the val2017 split as the training set in the config file.
- Do some work in the loss computation function of fcos_head. Here is my example code, which I implemented in MMDetection v1.1.0; it is provided only for your understanding.
# replace cls_scores with real IoUs to see the performance
a_ious = bbox_overlaps(
    pos_decoded_bbox_preds.detach(),
    pos_decoded_target_preds.detach(),
    is_aligned=True).clamp(min=1e-6)
# convert IoUs to logits, because sigmoid() is applied again when decoding
a_ious_logits = torch.log(a_ious / (1.0 - a_ious))
a_label_targets = flatten_labels[pos_inds]
flatten_cls_scores_new = flatten_cls_scores.detach().clone()
# labels are 1-based here (0 is background), hence the "- 1"
flatten_cls_scores_new[pos_inds, a_label_targets - 1] = a_ious_logits

# flatten_bbox_preds_new = flatten_bbox_preds.detach().clone()
# flatten_bbox_preds_new[pos_inds] = pos_bbox_targets

# flatten_centerness_new = flatten_centerness.detach().clone()
# # global replacement
# flatten_centerness_new[:] = torch.log(flatten_centerness_new.new_tensor(0.000000001))
# flatten_centerness_new[pos_inds] = torch.log(pos_centerness_targets /
#                                              (1.0 - pos_centerness_targets + 0.000001))

# the number of points per img, per lvl
num_points = [points.size(0) for points in all_level_points]
cls_score_list = list(flatten_cls_scores_new.detach().split(num_points, 0))
bbox_pred_list = list(flatten_bbox_preds.detach().split(num_points, 0))
centerness_list = list(flatten_centerness.detach().split(num_points, 0))

img_shape = img_metas[0]['img_shape']
scale_factor = img_metas[0]['scale_factor']
cfg.score_thr = 0.05
cfg.nms = dict(type='nms', iou_thr=0.5)
cfg.max_per_img = 100

result_list = []
det_results = self.get_bboxes_single_analysis(
    cls_score_list, bbox_pred_list, centerness_list,
    all_level_points, img_shape, scale_factor, cfg)
result_list.append(det_results)
bbox_results = [
    bbox2result(det_bboxes, det_labels, self.num_classes)
    for det_bboxes, det_labels in result_list
]

# one result file per image, named after the image file (extension stripped)
img_name = img_metas[0]['filename'].split('/')[-1][:-4]
# note: open() does not expand '~'; use an absolute path in practice
save_name = '~/mmdet/work_dirs/fcos_analysis_gt_cls_iou/' + img_name + '.pt'
with open(save_name, 'wb') as f:
    pickle.dump(bbox_results, f)
def get_bboxes_single_analysis(self,
                               cls_scores,
                               bbox_preds,
                               centernesses,
                               mlvl_points,
                               img_shape,
                               scale_factor,
                               cfg,
                               rescale=True):
    assert len(cls_scores) == len(bbox_preds) == len(mlvl_points)
    mlvl_bboxes = []
    mlvl_scores = []
    mlvl_centerness = []
    for cls_score, bbox_pred, centerness, points in zip(
            cls_scores, bbox_preds, centernesses, mlvl_points):
        assert cls_score.size(0) == bbox_pred.size(0)
        scores = cls_score.detach().sigmoid()
        centerness = centerness.detach().sigmoid()
        bbox_pred = bbox_pred.detach()
        nms_pre = cfg.get('nms_pre', -1)
        if nms_pre > 0 and scores.shape[0] > nms_pre:
            max_scores, _ = (scores * centerness[:, None]).max(dim=1)
            # max_scores, _ = scores.max(dim=1)
            _, topk_inds = max_scores.topk(nms_pre)
            points = points[topk_inds, :]
            bbox_pred = bbox_pred[topk_inds, :]
            scores = scores[topk_inds, :]
            centerness = centerness[topk_inds]
        bboxes = distance2bbox(points, bbox_pred, max_shape=img_shape)
        mlvl_bboxes.append(bboxes)
        mlvl_scores.append(scores)
        mlvl_centerness.append(centerness)
    mlvl_bboxes = torch.cat(mlvl_bboxes)
    if rescale:
        mlvl_bboxes /= mlvl_bboxes.new_tensor(scale_factor)
    mlvl_scores = torch.cat(mlvl_scores)
    padding = mlvl_scores.new_zeros(mlvl_scores.shape[0], 1)
    mlvl_scores = torch.cat([padding, mlvl_scores], dim=1)
    mlvl_centerness = torch.cat(mlvl_centerness)
    det_bboxes, det_labels, det_probs = multiclass_nms(
        mlvl_bboxes,
        mlvl_scores,
        cfg.score_thr,
        cfg.nms,
        cfg.max_per_img,
        score_factors=mlvl_centerness)
    return det_bboxes, det_labels
- Merge those per-image results and convert them into the format that the evaluation code requires.
- Finally, evaluate it!
Hope this helps and good luck.
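For readers following the recipe above, the configuration side (zero learning rate, validation split used as the training set) might look roughly like the sketch below. This is not the author's config: it assumes MMDetection v1.x-style config keys, assumes the validation split meant is COCO val2017 (the annotation file that the evaluation script later in this thread loads), and uses a made-up `data_root`. The batch size of 1 is also an assumption, chosen because the snippet above only reads `img_metas[0]`.

```python
# Hypothetical MMDetection v1.x-style config overrides for the analysis run.
data_root = 'data/coco/'  # assumed dataset location

# Learning rate 0.0: the training loop runs, but no parameter is updated.
optimizer = dict(type='SGD', lr=0.0, momentum=0.9, weight_decay=0.0001)

# Feed the validation images through the *training* code path so that the
# ground-truth boxes and labels are available inside the head's loss().
data = dict(
    imgs_per_gpu=1,  # assumption: one image per batch, matching img_metas[0]
    train=dict(
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
    ),
)

# One pass over the split is enough to dump one result file per image.
total_epochs = 1
```

In a real config these entries would be merged into the existing FCOS config rather than replacing it.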
Hi @hyz-xmaster, for the comparisons in Table 5, does VFL mean only the loss in Equation 2, or the loss in Equation 2 + Star DConv + BBox refinement?
Hi @feiyuhuahuo, in Table 5, VFL means only the loss. In the last row, VFNet + VFL = VFL loss + Star DConv + BBox refinement, and similarly VFNet + FL = FL loss + Star DConv + BBox refinement.
@hyz-xmaster Hi,
I'm confused about the comparison between Table 1 and Table 3.
Are the first row, raw VFL (39.0 AP), in Table 3 and the first column, FCOS + ATSS w/o ctr (38.5 AP), in Table 1 the same?
Hi @youngwanLEE, raw VFL (39.0 AP) in Table 3 has the same structure as FCOS+ATSS without the centerness branch, and it is retrained. FCOS + ATSS w/o ctr (38.5 AP) in Table 1 means that FCOS + ATSS is trained with the centerness branch, but the centerness scores are not used in inference. So the difference is whether the centerness branch is used in training.
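A toy illustration of the inference-side half of that distinction, assuming the usual FCOS convention of multiplying the classification score by the predicted centerness before ranking (which is what `score_factors=mlvl_centerness` does in the analysis code earlier in this thread). The function name and the numbers are made up; the sketch says nothing about the training-side difference, which is the actual gap between the two table entries.

```python
def ranking_score(cls_score: float, centerness: float, use_centerness: bool) -> float:
    """Score used to rank a detection before NMS."""
    return cls_score * centerness if use_centerness else cls_score


# Two hypothetical detections: name -> (cls_score, centerness)
dets = {"a": (0.80, 0.30), "b": (0.60, 0.90)}

# Ranking with centerness applied ("with ctr") and ignored ("w/o ctr").
with_ctr = sorted(dets, key=lambda k: ranking_score(*dets[k], True), reverse=True)
without_ctr = sorted(dets, key=lambda k: ranking_score(*dets[k], False), reverse=True)

print(with_ctr)
print(without_ctr)
```

The same predictions can therefore be ranked differently depending on whether centerness is applied at inference.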
@hyz-xmaster, hi, I wonder why you use the log function for iou_logits and centerness when replacing the values with ground-truth targets in the example code.
Hi @yingyu13, log(x / (1 - x)) is the inverse function of sigmoid.
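To make this concrete, here is a small standard-library sketch (not the author's code) showing that log(x / (1 - x)) undoes the sigmoid. This is why a ground-truth IoU is converted to a logit before being written into the classification scores: the decoding code applies sigmoid() again afterwards. The helper names and the clamp on both ends are assumptions for illustration; the original snippet only clamps the lower end with clamp(min=1e-6).

```python
import math


def sigmoid(z: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p: float, eps: float = 1e-6) -> float:
    """Inverse of sigmoid; p is clamped to [eps, 1 - eps] to keep the log finite."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


# Round trip: storing logit(iou) and applying sigmoid afterwards
# recovers the IoU itself (up to floating-point error).
for iou in (0.1, 0.5, 0.9):
    recovered = sigmoid(logit(iou))
    assert abs(recovered - iou) < 1e-9
```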
Thanks! I got it.
How do I merge those per-image results and convert them into the format that the evaluation code requires? How do I align each per-image result with the corresponding gt bboxes? I have successfully reimplemented stages 1-4 but got stuck at stage 5. Could you provide the corresponding code for stage 5?
Hi, this is the code for my experiment which was done with mmdetection v1.1. You may write your own based on it. Hope this helps.
import numpy as np
from pycocotools.coco import COCO
import os
import pickle
import torch
import json
import matplotlib.pyplot as plt

coco_gt_file = './mmdet/data/coco/annotations/instances_val2017.json'
coco = COCO(coco_gt_file)
res_dir = './mmdet/work_dirs/fcos_analysis/fcos_atss_analysis_gt_iou_cls_score/'

img_ids = sorted(coco.getImgIds())
results = []
for img_id in img_ids:
    res_name = os.path.join(res_dir, coco.imgs[img_id]['file_name'][:-4] + '.pt')
    if not os.path.exists(res_name):
        results.append([np.empty((0, 5)) for _ in range(80)])
        continue
    with open(res_name, 'rb') as fp:
        res = pickle.load(fp)
    results.append(res[0])

from mmdet.datasets import build_dataset
import mmcv

config_file = './mmdet/configs/fcos/fcos_analysis_r50_caffe_fpn_gn_1x_4gpu.py'
cfg = mmcv.Config.fromfile(config_file)
cfg.data.test.test_mode = True
dataset = build_dataset(cfg.data.test)
dataset.evaluate(results)
Thanks!
Why is the result arranged by img_id (I mean img_ids = sorted(coco.getImgIds()))? Is it because the annotation info from dataset = build_dataset(cfg.data.test) is arranged by img_id?
To make the order of the results consistent with the order of the ground truth used by the evaluation code, I think you also need to add one similar line, img_ids = sorted(coco.getImgIds()), to the COCO api used by the evaluation code.
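The alignment rule being discussed is that the i-th entry of the merged results must belong to the i-th image the evaluation code iterates over. The sketch below illustrates that idea with the standard library only; it is a simplified stand-in, not the author's script. The function name, image ids, file names, and payloads are invented, and plain lists replace the NumPy arrays and the COCO object used in the real code (which also indexes `res[0]` because each saved file holds a one-element list).

```python
import os
import pickle
import tempfile


def merge_results(res_dir, id_to_filename, num_classes):
    """Collect per-image pickles in ascending image-id order.

    Images without a saved file get an empty per-class placeholder, so the
    merged list always has exactly one entry per image id.
    """
    merged = []
    for img_id in sorted(id_to_filename):
        stem = os.path.splitext(id_to_filename[img_id])[0]
        path = os.path.join(res_dir, stem + '.pt')
        if os.path.exists(path):
            with open(path, 'rb') as fp:
                merged.append(pickle.load(fp))
        else:
            merged.append([[] for _ in range(num_classes)])
    return merged


# Toy data: three images, results saved for only two of them, in arbitrary order.
id_to_filename = {42: 'b.jpg', 7: 'a.jpg', 99: 'c.jpg'}
with tempfile.TemporaryDirectory() as res_dir:
    for name, payload in (('b', [['det_b'], []]), ('a', [['det_a'], []])):
        with open(os.path.join(res_dir, name + '.pt'), 'wb') as f:
            pickle.dump(payload, f)
    merged = merge_results(res_dir, id_to_filename, num_classes=2)
```

Whatever order is chosen, the point is that both sides (results and ground truth) must use the same one.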
All previous problems have been solved. However, I don't get the expected results. Actually, I am experimenting on RetinaNet instead of FCOS. I find that, even if I don't make any changes to the outputs, the evaluation results are much worse than the test results of the pretrained model. I'm wondering whether the transforms are the cause (I mean, the transforms in training_pipeline are different from those in testing_pipeline). Did you change the transforms of training_pipeline to be the same as those of testing_pipeline? I just use the training_pipeline.

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
There is no MultiScaleFlipAug in training_pipeline. My results for the resnet_50_fpn_1x RetinaNet are below; I just use the outputs in the loss() function to evaluate and didn't make any changes:
Average Precision (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.141
Average Precision (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.242
Average Precision (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.137
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.088
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.138
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.200
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.188
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.298
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.317
Average Recall    (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.185
Average Recall    (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.306
Average Recall    (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.451
Raw AP(0.5:0.95)=0.359
Did you change the transforms of training_pipeline to be the same as those of testing_pipeline? I just use the training_pipeline.
No, I didn't change these configurations. Given your results, I guess there might be something wrong in some step of your experiment.
Maybe. But I only made some small changes, shown below, in anchor_head.py:

def loss(self,
         cls_scores,
         bbox_preds,
         gt_bboxes,
         gt_labels,
         img_metas,
         gt_bboxes_ignore=None,
         **kwargs):
    self.save_bboxes_batch(cls_scores, bbox_preds, img_metas)
    ...

# RetinaNet upper-bound experiment
def save_bboxes_batch(self, cls_scores, bbox_preds, img_metas):
    results_list = self.get_bboxes(cls_scores, bbox_preds, img_metas, rescale=True)
    from mmdet.core import bbox2result
    # per-class results
    bbox_results = [
        bbox2result(det_bboxes, det_labels, self.num_classes)
        for det_bboxes, det_labels in results_list
    ]
    img_name = img_metas[0]['filename'].split('/')[-1][:-4]
    save_name = '/gpfs/home/sist/tqzouustc/code/mmdetection/retinanet_analysis_wo_change_noflip/' + img_name + '.pt'
    import os
    if not os.path.exists(os.path.dirname(save_name)):
        os.mkdir(os.path.dirname(save_name))
    import pickle
    with open(save_name, 'wb') as f:
        pickle.dump(bbox_results, f)
I didn't change cls_scores or bbox_preds, and self.get_bboxes is the original function in anchor_head.py; I didn't change anything in it. I also didn't change other parts of the loss() function. So what is wrong in my pipeline? I have been stuck here for several days. It makes me feel really upset...
I can't figure out exactly which step is wrong in your code, but I have a wild guess that you should not use the self.get_bboxes function. Instead, you should use the self._get_bboxes one and make some changes to it, since in my code I use the self.get_bbox_single (its name has been changed to self._get_bboxes in the current version of mmdet), not the self.get_bboxes. I can't remember why I didn't use the self.get_bboxes, though.
@Icecream-blue-sky, sorry, it suddenly came to my mind that you need to change this line:
dict(type='RandomFlip', flip_ratio=0.5)
to
dict(type='RandomFlip', flip_ratio=0.0)
so that the image is not flipped.
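A plausible reading of why the flip matters: with flip_ratio=0.5, about half of the images are mirrored before they reach loss(), so the boxes saved from there are in mirrored coordinates, while the evaluation compares them against the unflipped annotations. The toy sketch below shows the effect on a single box. The helper names, box, and image width are made up, and it is plain Python rather than MMDetection code.

```python
def hflip_box(box, img_width):
    """Mirror an (x1, y1, x2, y2) box horizontally inside an image of the given width."""
    x1, y1, x2, y2 = box
    return (img_width - x2, y1, img_width - x1, y2)


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


gt = (10.0, 20.0, 60.0, 80.0)        # annotation in the original image
pred_flipped = hflip_box(gt, 200.0)  # a perfect prediction made on the mirrored image

# Compared directly, the perfect prediction does not overlap the annotation at all.
iou_raw = iou(gt, pred_flipped)
# Mirroring the prediction back restores the match.
iou_unflipped = iou(gt, hflip_box(pred_flipped, 200.0))
```

Setting flip_ratio=0.0 avoids the problem at the source, so no un-flipping step is needed.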
Thanks!
You are right! It seems that the bug has been fixed! Thanks for your kind help!
Great to hear that. My pleasure.
Hi, thanks for your work and repo. I'm very interested in the VFL, which combines classification scores and location scores in the targets. I have some questions about VFL.
- In Table 3 of the paper, the first row represents the results of the raw VFNet trained with the focal loss. What is raw VFNet? Is it FCOS+ATSS with the centerness branch removed?
- [...] VFL to FCOS+ATSS with the centerness branch removed and applying FL to FCOS+ATSS (with the centerness branch)?
Thank you very much!