Error when there is only one object class; training fails to start

wuzuowuyou commented 4 years ago

refinedet_multibox_loss.py, line 78:

    for idx in range(num):
        truths = targets[idx][:, :-1].data
        labels = targets[idx][:, -1].data
        if num_classes == 2:
            labels = labels >= 0

Why is the num_classes == 2 case special-cased here? It causes conf = labels[best_truth_idx] + 1 # Shape: [num_priors], inside the later call refine_match(self.threshold, truths, defaults, self.variance, labels, loc_t, conf_t, idx, arm_loc_data[idx].data), to go out of range: the value becomes 2, which is out of bounds.
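The arithmetic behind this can be traced without PyTorch. The sketch below is plain Python with hypothetical helper names (`binarize_labels`, `shift_for_background`). It assumes a single-class dataset whose raw label is 0, and it mirrors only the two quoted lines, not the full matching logic.

```python
def binarize_labels(labels, num_classes):
    """Mirror of `labels = labels >= 0`, applied only when num_classes == 2."""
    if num_classes == 2:
        return [int(label >= 0) for label in labels]
    return list(labels)


def shift_for_background(labels):
    """Mirror of `conf = labels[best_truth_idx] + 1` (index 0 is background)."""
    return [label + 1 for label in labels]


num_classes = 2            # background + one object class
raw_labels = [0, 0, 0]     # assumption: the single class is stored as label 0

binary = binarize_labels(raw_labels, num_classes)   # every label becomes 1
conf = shift_for_background(binary)                 # every target becomes 2
valid = list(range(num_classes))                    # only columns 0 and 1 exist

print(binary, conf, valid)
```

Every target ends up at 2, while a two-column score tensor only has columns 0 and 1.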

wuzuowuyou commented 4 years ago

Correction: the out-of-bounds error actually occurs here: loss_c = log_sum_exp(batch_conf) - batch_conf.gather(1, conf_t.view(-1, 1))
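To see why that line fails, here is a minimal pure-Python stand-in for the per-row loss. `log_sum_exp`, `gather_one` and `row_loss` are simplified re-implementations written for illustration, not the repository's tensor functions. The point is only that a target of 2 cannot index a row that holds two class scores.

```python
import math


def log_sum_exp(row):
    """Numerically stable log(sum(exp(x))) for one row of class scores."""
    peak = max(row)
    return peak + math.log(sum(math.exp(x - peak) for x in row))


def gather_one(row, target):
    """Pick the score of the target class, like gather(1, ...) on a single row."""
    if not 0 <= target < len(row):
        raise IndexError(f"target {target} is out of range for {len(row)} classes")
    return row[target]


def row_loss(row, target):
    """Cross-entropy for one prior: log_sum_exp(scores) - score[target]."""
    return log_sum_exp(row) - gather_one(row, target)


scores = [0.2, 1.5]           # two columns: background and the single object class

ok = row_loss(scores, 1)      # valid target
try:
    row_loss(scores, 2)       # target produced by the extra +1
    failed = False
except IndexError:
    failed = True
```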

cenchaojun commented 4 years ago

Correction: the out-of-bounds error actually occurs here: loss_c = log_sum_exp(batch_conf) - batch_conf.gather(1, conf_t.view(-1, 1))

So should this be deleted?

lzk901372 commented 3 years ago

English version

I think I have tracked down where this problem comes from. It is probably caused jointly by two files: (i) layers/modules/refinedet_multibox_loss.py and (ii) layers/box_utils.py. In my view, these two files are the reason training fails when the number of classes is set to 2 (background included). The most direct cause is this code in box_utils.py:

if arm_loc is None:
    #....
else:
    conf = labels[best_truth_idx] + 1 #Here's the problem
    #....

That line adds 1 to every class label, which pushes the index used by the later gather operation out of bounds. My suggested fix is as follows:

In layers/modules/refinedet_multibox_loss.py, change this code:

if self.use_ARM:
    refine_match(self.threshold, truths, defaults, self.variance, labels,
            loc_t, conf_t, idx, self.use_ARM)
else:
    refine_match(self.threshold, truths, defaults, self.variance, labels,
            loc_t, conf_t, idx)

to:

if self.use_ARM:
    refine_match(self.threshold, truths, defaults, self.variance, labels,
                 loc_t, conf_t, idx, self.use_ARM, num_classes, arm_loc_data[idx].data)
else:
    refine_match(self.threshold, truths, defaults, self.variance, labels,
                 loc_t, conf_t, idx, self.use_ARM, num_classes)

In layers/box_utils.py, change this code:

def refine_match(threshold, truths, priors, variances,
                 labels, loc_t, conf_t, idx, arm_loc=None):
    #.... (many lines omitted)
    else:
        conf = labels[best_truth_idx] + 1
    #.... (the rest is unchanged)

to:

def refine_match(threshold, truths, priors, variances,
                 labels, loc_t, conf_t, idx, use_arm, num_classes,
                 arm_loc=None):
    #.... (many lines omitted)
    else:
        conf = labels[best_truth_idx] if use_arm and num_classes == 2 \
            else labels[best_truth_idx] + 1
    #.... (the rest is unchanged)

With these changes, training runs as expected, provided that your other single-class modifications (such as VOC_CLASSES) are already correct. My initial assessment is that the original code, which is partly taken from the SSD model, does not properly support detection or classification of a single object class, and that is what leads to the index-out-of-bounds error.

If you see this differently, please share your thoughts so that everyone can understand the code better. Thank you.
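The patched branch can be sketched in isolation as follows. This is plain Python rather than tensor code, `conf_targets` is a hypothetical name, and the single-class labels are assumed to have already been binarized to 1 by the num_classes == 2 branch of the loss.

```python
def conf_targets(labels, best_truth_idx, use_arm, num_classes):
    """Class target per prior, following the patched else-branch of refine_match."""
    matched = [labels[i] for i in best_truth_idx]
    if use_arm and num_classes == 2:
        # Labels are already 1 after `labels >= 0`, so no further shift is applied.
        return matched
    # Otherwise shift by one so that index 0 stays reserved for background.
    return [label + 1 for label in matched]


best_truth_idx = [0, 1, 0]    # ground-truth box matched to each prior

single = conf_targets([1, 1], best_truth_idx, use_arm=True, num_classes=2)
multi = conf_targets([0, 2], best_truth_idx, use_arm=True, num_classes=4)
```

With num_classes == 2 every target stays at 1, inside the valid range 0 to 1, while the multi-class path keeps the original shift.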

luuuyi / RefineDet.PyTorch

Error when there is only one object class; training fails to start #45