clw5180 commented 5 years ago

The details are as follows: loss: nan (nan) loss_classifier: 0.2950 (1.0475) loss_box_reg: 0.0002 (0.0087) loss_objectness: nan (nan) loss_rpn_box_reg: nan (nan)

I would be very grateful if anyone could point me in the right direction! @mjq11302010044
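To see at a glance which terms have diverged, a log line like the one above can be split into its individual losses. This is only a minimal sketch: the helper name `nan_terms` and the regular expression are illustrative assumptions about the log layout shown here, not part of RRPN_pytorch.

```python
import math
import re

# Matches "name: current (average)" pairs as they appear in the log line above.
LOSS_PATTERN = re.compile(r"(\w+):\s+([^\s()]+)\s+\(([^\s()]+)\)")

def nan_terms(log_line):
    """Return the names of the loss terms whose current value is NaN."""
    names = []
    for name, current, _average in LOSS_PATTERN.findall(log_line):
        if math.isnan(float(current)):
            names.append(name)
    return names

log_line = (
    "loss: nan (nan) loss_classifier: 0.2950 (1.0475) "
    "loss_box_reg: 0.0002 (0.0087) loss_objectness: nan (nan) "
    "loss_rpn_box_reg: nan (nan)"
)
bad = nan_terms(log_line)
print(bad)
```

In this log only the two RPN terms (and therefore the total) are NaN, while the classifier and box-regression losses are still finite.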

Below is the bbox format I am using:

boxes.append([x_ctr, y_ctr, width, height, angle, words])
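These are the x center coordinate, y center coordinate, width, height, angle, and class. I am not sure whether this is the expected coordinate format.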

The im_info format:

        im_info = {
            'gt_classes': gt_classes,
            'max_classes': max_classes,
            'image': im_path,
            'boxes': gt_boxes,
            'flipped': False,
            'gt_overlaps': overlaps,
            'seg_areas': seg_areas,
            'height': im.shape[0],
            'width': im.shape[1],
            'max_overlaps': max_overlaps,
            'rotated': True
        }

And the DOTA classes:

    cls_list = \
        {
            'background': 0,
            'roundabout': 1,
            'tennis-court': 2,
            'swimming-pool': 3,
            'storage-tank': 4,
            'soccer-ball-field': 5,
            'small-vehicle': 6,
            'ship': 7,
            'plane': 8,
            'large-vehicle': 9,
            'helicopter': 10,
            'harbor': 11,
            'ground-track-field': 12,
            'bridge': 13,
            'basketball-court': 14,
            'baseball-diamond': 15,
            'helipad': 16,
            'airport': 17,
            'container-crane': 18
        }

DATASET = {
    'IC13':get_ICDAR2013,
    'IC15':get_ICDAR2015_RRC_PICK_TRAIN,
    'IC17mlt':get_ICDAR2017_mlt,
    'LSVT':get_ICDAR_LSVT_full,
    'ArT':get_ICDAR_ArT,
    'ReCTs':get_ICDAR_ReCTs_full,
    'DOTA':get_DOTA,   # clw modify
}

_DEBUG = False
class RotationDataset(torch.utils.data.Dataset):
    CLASSES = (
        "__background__ ",      #"background",
        "roundabout",
        "tennis-court",
        "swimming-pool",
        "storage-tank",
        "soccer-ball-field",
        "small-vehicle",
        "ship",
        "plane",
        "large-vehicle",
        "helicopter",
        "harbor",
        "ground-track-field",
        "bridge",
        "basketball-court",
        "baseball-diamond",
        "helipad",
        "airport",
        "container-crane"
    )

mjq11302010044 commented 5 years ago

@clw5180 Go check whether the class number in your .yml file is consistent. :)
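A consistency check along these lines can be sketched as follows. It assumes the class count from the .yml file has been read by hand into a hypothetical `num_classes_in_yml` value (the actual config key is not named in this thread), and `find_class_count_problems` is an illustrative helper, not a function from the repository.

```python
# The 18 DOTA class names from the post, in label order.
DOTA_CLASS_NAMES = [
    "roundabout", "tennis-court", "swimming-pool", "storage-tank",
    "soccer-ball-field", "small-vehicle", "ship", "plane", "large-vehicle",
    "helicopter", "harbor", "ground-track-field", "bridge",
    "basketball-court", "baseball-diamond", "helipad", "airport",
    "container-crane",
]

# Same content as cls_list and RotationDataset.CLASSES in the post.
cls_list = {"background": 0}
cls_list.update({name: i + 1 for i, name in enumerate(DOTA_CLASS_NAMES)})
CLASSES = ("__background__ ",) + tuple(DOTA_CLASS_NAMES)

def find_class_count_problems(cls_list, classes, num_classes_in_yml):
    """Return a list of inconsistencies; an empty list means the counts agree."""
    problems = []
    if sorted(cls_list.values()) != list(range(len(cls_list))):
        problems.append("label ids are not contiguous starting at 0")
    if len(cls_list) != len(classes):
        problems.append("cls_list and CLASSES have different lengths")
    if num_classes_in_yml != len(classes):
        problems.append("the .yml class number does not match CLASSES")
    return problems

# Hypothetical value copied from the .yml file: 18 classes + 1 background.
print(find_class_count_problems(cls_list, CLASSES, num_classes_in_yml=19))
```

An empty list means the three places agree on 19 classes including background; any other value in the .yml is reported as a mismatch.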

clw5180 commented 5 years ago

Thanks a lot! It turned out to be a pytorch/torchvision version problem. After many attempts, torchvision=0.2.1 with pytorch=1.1 finally worked. I also have a question: what does '__C.MODEL.ROI_REC_HEAD.NUMCLASSES = 99' mean? If I have 18 classes + 1 background, how should I set this parameter? Does it need to be changed according to the number of object classes in my own dataset? Thanks very much @mjq11302010044

Baby47 commented 5 years ago

@clw5180, @mjq11302010044, I started training on my own dataset and it ran correctly until iteration 690, after which it produced errors like this. Have you ever encountered this? The nan error may be related to my problem. Could you leave your email address so we can discuss this further?

oceanleftsea commented 5 years ago

I also started training on the DOTA dataset. It works normally for the first 230 iterations, but nan appears at around iteration 230. It shows this warning: RuntimeWarning: invalid value encountered in greater overlaps[overlaps > 1.00000001] = 0.0

clw5180 commented 5 years ago

Solution: 1. Remove bboxes smaller than 16x16 (or, to be safe, 8x8); 2. Find T.RandomRotation in the code and comment it out. You can refer to my GitHub: https://github.com/clw5180/remote_sensing_object_detection_2019 @Baby47 @oceanleftsea
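The first suggestion can be sketched as a small filter over boxes in the [x_ctr, y_ctr, width, height, angle, words] format from the original post. This is a minimal sketch, not code from either repository: `drop_small_boxes` is a hypothetical helper, and it assumes a box counts as too small when either side is below the threshold.

```python
def drop_small_boxes(boxes, min_side=16):
    """Keep only boxes whose width and height are both at least min_side pixels.

    Each box is [x_ctr, y_ctr, width, height, angle, words].
    """
    return [box for box in boxes if box[2] >= min_side and box[3] >= min_side]

boxes = [
    [100.0, 120.0, 40.0, 20.0, -15.0, "ship"],
    [300.0, 310.0, 6.0, 30.0, 0.0, "small-vehicle"],  # width 6 is below 16
    [50.0, 60.0, 16.0, 16.0, 45.0, "plane"],
]

kept = drop_small_boxes(boxes, min_side=16)
print([box[5] for box in kept])
```

Passing min_side=8 gives the more lenient 8x8 threshold mentioned above.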

baltam commented 4 years ago

Thanks a lot! Installing the right pytorch and torchvision versions really is important!

gdjmck commented 4 years ago

Were there any warnings during your training? I encountered the same issue, along with a warning that I should change torch.uint8 to torch.bool for indexing. I changed it in the add_visibility_to function, and the nan loss went away as well.
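The kind of change described above can be sketched as follows. This is a minimal illustration that assumes PyTorch is installed. The names `as_bool_mask`, `boxes`, and `keep` are hypothetical and are not taken from the repository's add_visibility_to function.

```python
import torch

def as_bool_mask(mask):
    """Return the mask as torch.bool so it can be used for indexing."""
    if mask.dtype == torch.bool:
        return mask
    return mask.to(torch.bool)

# Two rotated boxes as (x_ctr, y_ctr, width, height, angle).
boxes = torch.tensor([[10.0, 10.0, 5.0, 5.0, 0.0],
                      [20.0, 20.0, 8.0, 8.0, 30.0]])

# A 0/1 mask stored as uint8, as older code often produced.
keep = torch.tensor([1, 0], dtype=torch.uint8)

kept = boxes[as_bool_mask(keep)]  # boolean-mask indexing keeps the first row
print(kept)
```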

Gavin-zsr commented 2 years ago

I also encountered this problem. I changed my pytorch and torchvision versions to the ones that worked for clw5180 (torchvision=0.2.1, pytorch=1.1), but I still have the problem. Do you have any other suggestions? Thanks a lot.

OYLH commented 2 years ago

I ran into the same problem and found that it was because the target was too small to match the default min_size of 800. I finally solved it by changing _C.INPUT.MIN_SIZE_TRAIN in maskrcnn_benchmark/config/defaults.py.
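One way to reason about this setting is to compute how large a ground-truth box ends up after the training resize. The sketch below makes two assumptions: the shorter image side is scaled to min_size unless that would push the longer side past max_size, and max_size is 1333. Both helper functions are hypothetical and are not copied from maskrcnn_benchmark.

```python
def resize_scale(width, height, min_size=800, max_size=1333):
    """Scale factor applied to an image under the assumed resize rule."""
    short_side = min(width, height)
    long_side = max(width, height)
    scale = min_size / short_side
    if long_side * scale > max_size:
        # Cap the scale so the longer side does not exceed max_size.
        scale = max_size / long_side
    return scale

def scaled_box_side(side, width, height, min_size=800, max_size=1333):
    """Length of a box side, in pixels, after the image has been resized."""
    return side * resize_scale(width, height, min_size, max_size)

# A 16-pixel object in a 1024x1024 crop with the default min_size of 800.
print(scaled_box_side(16, 1024, 1024, min_size=800))
# The same object when min_size matches the crop size.
print(scaled_box_side(16, 1024, 1024, min_size=1024))
```

Under these assumptions, a min_size below the image's shorter side shrinks every box, so already small targets become smaller still.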

mjq11302010044 / RRPN_pytorch

When testing on the DOTA dataset, for some reason loss_objectness: nan (nan) loss_rpn_box_reg: nan (nan) appears as soon as training starts #27