RuntimeError: Trying to create tensor with negative dimension

gaussiangit commented 4 years ago

I tried torchvision 0.5 and 0.6, and torch 1.4 and 1.5. Can you tell me what the problem is? It occurs in the COCO eval phase, and it only happens with D7.

zylo117 commented 4 years ago

Please provide more info.

gaussiangit commented 4 years ago

Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/home/miniconda3/envs/pytorch-nets/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/home/miniconda3/envs/pytorch-nets/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "train.py", line 79, in forward
    imgs=imgs, obj_list=obj_list)
  File "/home/miniconda3/envs/pytorch-nets/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/Yet-Another-EfficientDet-Pytorch/efficientdet/loss.py", line 153, in forward
    0.5, 0.3)
  File "/home/Yet-Another-EfficientDet-Pytorch/utils/utils.py", line 107, in postprocess
    anchors_nms_idx = nms(transformed_anchors_per, scores_per[:, 0], iou_threshold=iou_threshold)
  File "/home/miniconda3/envs/pytorch-nets/lib/python3.6/site-packages/torchvision/ops/boxes.py", line 33, in nms
    return _C.nms(boxes, scores, iou_threshold)
RuntimeError: Trying to create tensor with negative dimension -1242957280: [-1242957280] (check_size_nonnegative at /opt/conda/conda-bld/pytorch_1565272279342/work/aten/src/ATen/native/TensorFactories.h:64)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x47 (0x7f865b0a8e37 in /home/miniconda3/envs/pytorch-nets/lib/python3.6/site-packages/torch/lib/libc10.so)

@zylo117 This happens when I set debug to True while training. Otherwise, it trains normally.
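The negative size in the traceback above can be reproduced with plain integer arithmetic. This is a sketch under two assumptions that are not stated in the thread: (a) torchvision's CUDA NMS kernel of that era allocates a suppression bitmask of n * ceil(n / 64) 64-bit words and computes that product in a signed 32-bit int; (b) D7 runs at a 1536 px input with anchors on five pyramid levels (strides 8 to 128) and 9 anchors per location. Under those assumptions, passing every D7 anchor into nms wraps to exactly the value in the error message.

```python
import math


def num_anchors(input_size, strides=(8, 16, 32, 64, 128), anchors_per_loc=9):
    """Total anchors over the feature pyramid (assumed EfficientDet layout)."""
    return sum((input_size // s) ** 2 for s in strides) * anchors_per_loc


def to_int32(x):
    """Wrap a Python int the way a signed 32-bit int typically overflows."""
    return (x + 2**31) % 2**32 - 2**31


def nms_mask_words(n, bits_per_word=64):
    """Number of 64-bit words in an n x ceil(n / 64) suppression bitmask."""
    return n * math.ceil(n / bits_per_word)


n = num_anchors(1536)             # assumed D7 input size
words = nms_mask_words(n)
print(n, words, to_int32(words))  # 441936 3052010016 -1242957280
```

Because the overflow depends only on how many boxes reach NMS, this is consistent with the error appearing only on the largest model and only when the debug postprocessing runs.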

zylo117 commented 4 years ago

Yet-Another-EfficientDet-Pytorch/utils/utils.py, line 107

Can you debug this line?

gaussiangit commented 4 years ago

@zylo117 It happens only when debug is True, on larger models such as D5, D6 and D7; D0 works fine with debug enabled. The issue is also mentioned here: https://github.com/pytorch/vision/issues/1705

I am using a conda env with torch 1.4 and torchvision 0.5. Now I get an out-of-memory error on the same line, even with the minimum batch size. I am also training on 4 GPUs. Any recommendations?

zylo117 commented 4 years ago

Maybe there is a bug in the nms function. Try:

1. implementing NMS yourself in PyTorch or NumPy, by manipulating tensors/arrays directly;
2. setting a higher threshold here: https://github.com/zylo117/Yet-Another-EfficientDet-Pytorch/blob/master/efficientdet/loss.py#L174
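For the first suggestion, a minimal pure-PyTorch greedy NMS might look like the sketch below. It is not torchvision's kernel: it loops on the host and compares the current top box only against the remaining candidates, so each step uses O(n) memory instead of building an n x n mask, at the cost of speed. `greedy_nms` and its `score_threshold` parameter are hypothetical names, not part of the repository or torchvision; the threshold drops low-scoring boxes before NMS, which is the effect the second suggestion aims for.

```python
import torch


def greedy_nms(boxes, scores, iou_threshold, score_threshold=0.0):
    """Greedy NMS on [N, 4] boxes in (x1, y1, x2, y2) format.

    Returns indices into the original tensors, in descending score order.
    """
    # Pre-filter: only boxes above score_threshold are considered at all.
    candidates = torch.nonzero(scores > score_threshold).flatten()
    order = candidates[scores[candidates].argsort(descending=True)]
    areas = (boxes[:, 2] - boxes[:, 0]).clamp(min=0) * (boxes[:, 3] - boxes[:, 1]).clamp(min=0)

    keep = []
    while order.numel() > 0:
        i = order[0]  # highest-scoring remaining box
        keep.append(i)
        rest = order[1:]
        # Intersection of box i with every remaining candidate.
        xx1 = torch.max(boxes[rest, 0], boxes[i, 0])
        yy1 = torch.max(boxes[rest, 1], boxes[i, 1])
        xx2 = torch.min(boxes[rest, 2], boxes[i, 2])
        yy2 = torch.min(boxes[rest, 3], boxes[i, 3])
        inter = (xx2 - xx1).clamp(min=0) * (yy2 - yy1).clamp(min=0)
        iou = inter / (areas[rest] + areas[i] - inter)
        # Suppress candidates that overlap the kept box by more than the threshold.
        order = rest[iou <= iou_threshold]

    if not keep:
        return torch.empty((0,), dtype=torch.int64, device=boxes.device)
    return torch.stack(keep)
```

For example, with boxes `[[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]` and scores `[0.9, 0.8, 0.7]`, an IoU threshold of 0.5 keeps indices 0 and 2, since the first two boxes overlap with an IoU of 81/119.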

tmyoda commented 3 years ago

I added a batched_nms function to utils/utils.py and it seems to work fine. Also, delete the existing import of batched_nms so that this version is used.

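import numpy as np
import torch

# https://github.com/ponta256/fssd-resnext-voc-coco/blob/master/layers/box_utils.py#L245
def nms(boxes, scores, nms_thresh=0.5, top_k=200):
    # Greedy NMS on the CPU with NumPy. Memory grows linearly with the number
    # of boxes (no n x n mask), but only the top_k highest-scoring boxes are
    # considered at all, so at most top_k indices are returned.
    boxes = boxes.detach().cpu().numpy()
    scores = scores.detach().cpu().numpy()
    keep = []
    if len(boxes) == 0:
        return np.array(keep, dtype=np.int64)
    x1 = boxes[:, 0]
    y1 = boxes[:, 1]
    x2 = boxes[:, 2]
    y2 = boxes[:, 3]
    area = (x2 - x1) * (y2 - y1)
    idx = np.argsort(scores, axis=0)  # sort in ascending order
    idx = idx[-top_k:]  # indices of the top-k largest values

    while len(idx) > 0:
        last = len(idx) - 1
        i = idx[last]  # index of the current largest value
        keep.append(i)

        # Intersection of box i with all remaining candidates
        xx1 = np.maximum(x1[i], x1[idx[:last]])
        yy1 = np.maximum(y1[i], y1[idx[:last]])
        xx2 = np.minimum(x2[i], x2[idx[:last]])
        yy2 = np.minimum(y2[i], y2[idx[:last]])

        w = np.maximum(0, xx2 - xx1)
        h = np.maximum(0, yy2 - yy1)

        inter = w * h
        iou = inter / (area[idx[:last]] + area[i] - inter)
        # Drop box i and every candidate overlapping it by more than nms_thresh
        idx = np.delete(idx, np.concatenate(([last], np.where(iou > nms_thresh)[0])))

    return np.array(keep, dtype=np.int64)

# https://github.com/pytorch/vision/blob/master/torchvision/ops/boxes.py#L39
def batched_nms(boxes, scores, idxs, iou_threshold):
    # Per-class NMS with a single class-agnostic call: boxes of class k are
    # shifted by k * (max_coordinate + 1), so different classes never overlap.
    if boxes.numel() == 0:
        return torch.empty((0,), dtype=torch.int64, device=boxes.device)
    max_coordinate = boxes.max()
    offsets = idxs.to(boxes) * (max_coordinate + torch.tensor(1).to(boxes))
    boxes_for_nms = boxes + offsets[:, None]
    keep = nms(boxes_for_nms, scores, nms_thresh=iou_threshold)
    # Return an int64 tensor on the input device, like torchvision's batched_nms
    return torch.as_tensor(keep, dtype=torch.int64, device=boxes.device)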
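The offset trick in batched_nms can be checked on a toy input. Shifting every box of class k by k * (max_coordinate + 1) moves each class into its own disjoint region, so boxes of different classes get an IoU of zero, while IoU within a class is unchanged because it is translation-invariant. The boxes below are made up, and `pairwise_iou` is a hypothetical helper written only for this check.

```python
import torch


def pairwise_iou(a, b):
    """IoU matrix between [N, 4] and [M, 4] boxes in (x1, y1, x2, y2) format."""
    lt = torch.max(a[:, None, :2], b[None, :, :2])  # top-left of intersections
    rb = torch.min(a[:, None, 2:], b[None, :, 2:])  # bottom-right of intersections
    inter = (rb - lt).clamp(min=0).prod(dim=2)
    area_a = (a[:, 2:] - a[:, :2]).prod(dim=1)
    area_b = (b[:, 2:] - b[:, :2]).prod(dim=1)
    return inter / (area_a[:, None] + area_b[None, :] - inter)


boxes = torch.tensor([[0., 0., 10., 10.],   # class 0
                      [1., 1., 11., 11.],   # class 1, overlaps box 0 heavily
                      [2., 2., 12., 12.]])  # class 0, overlaps box 0
idxs = torch.tensor([0, 1, 0])

# Same offset formula as batched_nms: class k is shifted by k * (max + 1).
offsets = idxs.to(boxes) * (boxes.max() + 1)
shifted = boxes + offsets[:, None]

before = pairwise_iou(boxes, boxes)
after = pairwise_iou(shifted, shifted)
print(before[0, 1].item(), after[0, 1].item())  # cross-class IoU drops to 0
print(before[0, 2].item(), after[0, 2].item())  # same-class IoU is unchanged
```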

zylo117 / Yet-Another-EfficientDet-Pytorch

RuntimeError: Trying to create tensor with negative dimension #225