NMS_GPU different results from NMS_CPU

sidazhang commented 6 years ago

I compiled successfully on Ubuntu 14.04 and I am using Python 3.6.

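import numpy as np
import torch
from rnet.models.nms.nms_gpu import nms_gpu
from rnet.models.nms.nms_cpu import nms_cpu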
detections = torch.from_numpy(np.array([
    (12, 84, 140, 212, 0.1),
    (24, 84, 152, 212, 0.8),
    (36, 84, 164, 212, 0.7),
    (12, 96, 140, 224, 0.6),
    (24, 96, 152, 224, 0.5),
    (24, 108, 152, 236, 0.9)
]))

# print(detections)
print(nms_gpu(detections, 0.1))
print(nms_cpu(detections, 0.1))

## print results below
tensor([], dtype=torch.int32)
tensor([ 5], dtype=torch.int32)

As you can see, the CPU result is correct, whereas the GPU version returns nothing.

jwyang commented 6 years ago

Hi @sidazhang, convert the detections from CPU data to GPU data and try again.

sidazhang commented 6 years ago

Great catch! Thank you

But still different results

from rnet.models.nms.nms_gpu import nms_gpu
from rnet.models.nms.nms_cpu import nms_cpu
import numpy as np
import torch
detections = torch.from_numpy(np.array([
    (12, 84, 140, 212, 0.1),
    (24, 84, 152, 212, 0.8),
    (36, 84, 164, 212, 0.7),
    (12, 96, 140, 224, 0.6),
    (24, 96, 152, 224, 0.5),
    (24, 108, 152, 236, 0.9)
]))

# print(detections)
print(nms_gpu(detections.cuda(), 0.1))
print(nms_cpu(detections, 0.1))

### print results below
tensor([[ 0],
        [ 1],
        [ 5]], dtype=torch.int32, device='cuda:0')
tensor([ 5], dtype=torch.int32)

jwyang commented 6 years ago

I see. I think there are some differences between the CPU and GPU versions. The CPU version was contributed by others; I will look into this.

sidazhang commented 6 years ago

Red is the one with the 0.9 score. It seems to me that the CPU version is correct?
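One way to check this by hand is to compute the IoU between the 0.9-score box and each of the other five boxes. The following is a minimal NumPy sketch, assuming inclusive pixel coordinates (width = x2 - x1 + 1); the helper name iou_with is hypothetical and not part of the repository.

```python
import numpy as np

# Detections from the report above: (x1, y1, x2, y2, score).
dets = np.array([
    (12, 84, 140, 212, 0.1),
    (24, 84, 152, 212, 0.8),
    (36, 84, 164, 212, 0.7),
    (12, 96, 140, 224, 0.6),
    (24, 96, 152, 224, 0.5),
    (24, 108, 152, 236, 0.9),
])

def iou_with(box, others):
    """IoU of one box against an (N, 5) array (hypothetical helper)."""
    xx1 = np.maximum(box[0], others[:, 0])
    yy1 = np.maximum(box[1], others[:, 1])
    xx2 = np.minimum(box[2], others[:, 2])
    yy2 = np.minimum(box[3], others[:, 3])
    w = np.maximum(0.0, xx2 - xx1 + 1)
    h = np.maximum(0.0, yy2 - yy1 + 1)
    inter = w * h
    area = (box[2] - box[0] + 1) * (box[3] - box[1] + 1)
    areas = (others[:, 2] - others[:, 0] + 1) * (others[:, 3] - others[:, 1] + 1)
    return inter / (area + areas - inter)

best = dets[5]                        # the 0.9-score box
overlaps = iou_with(best, dets[:5])   # IoU against the other five boxes
print(np.round(overlaps, 3))
print((overlaps > 0.1).all())
```

All five overlaps come out well above 0.1 (the smallest is roughly 0.59), so a greedy NMS that starts from the highest-scoring box would be expected to suppress every other box at this threshold. That is consistent with the CPU output of index 5 only.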

jwyang commented 6 years ago

I see, that is weird. Thanks for pointing this out; I will check it on my side.

zhenheny commented 6 years ago

Hi @sidazhang @jwyang I think it is related to #212 . Line 23 and Line 24 need to be fixed. https://github.com/jwyang/faster-rcnn.pytorch/blob/7079dfbc4e734168e1a123cb1a5a60cdc39f52ed/lib/model/nms/nms_cpu.py#L23

xx2 = np.minimum(x2[i], x2[order[1:]])
yy2 = np.minimum(y2[i], y2[order[1:]])

These lines should generate the same results as the GPU version code.
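To see why the bottom-right corner of the overlap needs minimum, here is a small NumPy sketch on two of the boxes from the report. It assumes, without having confirmed it against the linked file, that the unfixed lines used np.maximum for xx2 and yy2; the iou helper and its corner_fn parameter are hypothetical.

```python
import numpy as np

# Two boxes from the report above, as (x1, y1, x2, y2).
a = np.array([12, 84, 140, 212], dtype=np.float64)
b = np.array([24, 108, 152, 236], dtype=np.float64)

def iou(a, b, corner_fn):
    """IoU with the overlap's bottom-right corner chosen by corner_fn."""
    xx1, yy1 = np.maximum(a[0], b[0]), np.maximum(a[1], b[1])
    xx2, yy2 = corner_fn(a[2], b[2]), corner_fn(a[3], b[3])
    w = max(0.0, xx2 - xx1 + 1)
    h = max(0.0, yy2 - yy1 + 1)
    inter = w * h
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)

iou_min = iou(a, b, np.minimum)  # overlap ends at the smaller x2 / y2
iou_max = iou(a, b, np.maximum)  # assumed pre-fix behaviour, overestimates the overlap
print(round(iou_min, 3), round(iou_max, 3))
```

With maximum, the "intersection" of this pair grows to the size of a whole box and the IoU comes out as 1.0, whereas minimum gives roughly 0.59. Both values are above a 0.1 threshold, so for this particular pair the fix would not be expected to change whether the box is suppressed.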

sidazhang commented 6 years ago

@jwyang @zhenheny I modified the CPU version to use minimum as you suggested, but I still get the same results.

This is the nms_cpu that I have:

from __future__ import absolute_import

import numpy as np
import torch

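def nms_cpu(dets, thresh):
    # dets: (N, 5) CPU tensor of (x1, y1, x2, y2, score)
    dets = dets.numpy()
    x1 = dets[:, 0]
    y1 = dets[:, 1]
    x2 = dets[:, 2]
    y2 = dets[:, 3]
    scores = dets[:, 4]

    # Box areas, treating coordinates as inclusive pixel indices
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    # Box indices sorted by descending score
    order = scores.argsort()[::-1]

    keep = []
    while order.size > 0:
        # Keep the highest-scoring remaining box
        i = order.item(0)
        keep.append(i)
        # Intersection of box i with every other remaining box
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])

        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
        # IoU = intersection / union
        ovr = inter / (areas[i] + areas[order[1:]] - inter)

        # Retain only boxes whose IoU with box i is at most thresh;
        # the +1 accounts for ovr being computed against order[1:]
        inds = np.where(ovr <= thresh)[0]
        order = order[inds + 1]

    return torch.IntTensor(keep)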

Results that I am getting:

detections = torch.from_numpy(np.array([
    (12, 84, 140, 212, 0.1),
    (24, 84, 152, 212, 0.8),
    (36, 84, 164, 212, 0.7),
    (12, 96, 140, 224, 0.6),
    (24, 96, 152, 224, 0.5),
    (24, 108, 152, 236, 0.9)
]))

# print(detections)
print(nms_gpu(detections.cuda(), 0.1))
print('--')
print(nms_cpu(detections, 0.1))

tensor([[ 0],
        [ 1],
        [ 5]], dtype=torch.int32, device='cuda:0')
--
tensor([ 5], dtype=torch.int32)

zhenheny commented 6 years ago

@sidazhang Ah, I see, something is wrong with the GPU version. I don't know what happened. To be on the safe side, I will go with the CPU version for now.

Edited: I looked through the CUDA code, and it seems to be the same as in Faster R-CNN. I don't know what happened.

Karthik-Suresh93 commented 6 years ago

Hi @jwyang , @sidazhang , @zhenheny did you solve this issue?

sidazhang commented 6 years ago

@Karthik-Suresh93 Are you also able to reproduce?

I have not been able to resolve this issue, so I have been using only the CPU version of NMS.

Karthik-Suresh93 commented 6 years ago

I am getting highly overlapping boxes in my output image (attached image: test_set_3).

What could be the reason?

sidazhang commented 6 years ago

@Karthik-Suresh93 I think there is a bug in the GPU nms

@jwyang have you had a chance to verify?

ljtruong commented 6 years ago

Not a solution, but something that might help. I think the detections need to be sorted by confidence before NMS is computed, to follow the original implementation, which initially used a sorted list. This was added to gpu_nms.py:

    scores = dets[:, 4]
    # sort() returns (values, indices); [::-1] flips the tuple so order[0] holds the indices
    order = scores.sort(descending=True)[::-1]
    sorted_dets = dets[order[0], :]

argsort was replaced with sort because we are working with PyTorch tensors.

This was the output:

tensor([[ 0],
        [ 1],
        [ 3],
        [ 5]], dtype=torch.int32, device='cuda:0')

The CUDA implementation is the same, so I'm stumped as to where the error may be.

ljtruong commented 6 years ago

@sidazhang I did a few tests. I'm using pytorch 0.4, python 3.6.

I ran the demo.py script with pdb.set_trace() inserted just before this line:

https://github.com/jwyang/faster-rcnn.pytorch/blob/baa0385c0ec8cbd56fa204b1eea6aa15fe9ff0ea/demo.py#L348-L349

Then I executed nms(cls_dets, cfg.TEST.NMS, force_cpu=True) on each loop iteration, switching force_cpu between True and False each time, and printed the NMS results. I used NMS thresholds ranging from 0.3 to 1. The CPU and GPU results matched.

However, when running the functions separately, as in your example, there does seem to be a discrepancy.
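A comparison like the one described above could be automated with a small harness. This is only a sketch: find_mismatches and as_index_set are hypothetical helpers, and the two stand-in functions replace the real CPU and GPU calls so the example runs without the compiled extension.

```python
import numpy as np

def as_index_set(keep):
    """Flatten an index result of shape (N,) or (N, 1) into a sorted list of ints."""
    # Assumption: `keep` is array-like on the host; a CUDA tensor needs .cpu() first.
    return sorted(int(i) for i in np.asarray(keep).ravel())

def find_mismatches(nms_a, nms_b, dets, thresholds):
    """Return the thresholds at which the two NMS callables keep different boxes."""
    return [
        float(t) for t in thresholds
        if as_index_set(nms_a(dets, t)) != as_index_set(nms_b(dets, t))
    ]

# Hypothetical stand-ins, NOT the repository's implementations.
def keep_all(dets, thresh):
    return np.arange(len(dets))

def keep_first(dets, thresh):
    return np.array([[0]])  # column-shaped, like the GPU output earlier in the thread

dets = np.zeros((3, 5))
thresholds = np.arange(0.3, 1.01, 0.1)
same = find_mismatches(keep_all, keep_all, dets, thresholds)
diff = find_mismatches(keep_all, keep_first, dets, thresholds)
print(same)
print(len(diff), "of", len(thresholds), "thresholds differ")
```

In a real session the stand-ins would be replaced by wrappers around the nms(cls_dets, cfg.TEST.NMS, force_cpu=...) call quoted above, one with force_cpu=True and one with force_cpu=False. Flattening the indices first matters because the GPU output in this thread is column-shaped while the CPU output is one-dimensional.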

hbwx24 commented 6 years ago

@sidazhang @jwyang @zhenheny What happened with minimum and maximum? I have the same doubt. Is there anything special here? Why is there no problem with the final result?

jwyang commented 5 years ago

@Worulz is right: to use the GPU version of NMS, the detection results should be sorted by score before being sent to nms(*).

jwyang commented 5 years ago

Let me know if you have already done the sorting but still get the same issue.
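The effect of skipping the sort can be illustrated with a NumPy stand-in for an NMS that processes boxes in the order given. This is a sketch, not the CUDA kernel, and it does not reproduce the exact GPU output reported earlier in the thread; nms_in_input_order and iou_matrix are hypothetical names.

```python
import numpy as np

def iou_matrix(boxes):
    """Pairwise IoU for an (N, 4) array, inclusive pixel coordinates."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    xx1 = np.maximum(x1[:, None], x1[None, :])
    yy1 = np.maximum(y1[:, None], y1[None, :])
    xx2 = np.minimum(x2[:, None], x2[None, :])
    yy2 = np.minimum(y2[:, None], y2[None, :])
    inter = np.maximum(0.0, xx2 - xx1 + 1) * np.maximum(0.0, yy2 - yy1 + 1)
    return inter / (areas[:, None] + areas[None, :] - inter)

def nms_in_input_order(dets, thresh):
    """Greedy NMS that trusts the caller to have sorted dets by score."""
    ious = iou_matrix(dets[:, :4])
    suppressed = np.zeros(len(dets), dtype=bool)
    keep = []
    for i in range(len(dets)):
        if suppressed[i]:
            continue
        keep.append(i)
        suppressed |= ious[i] > thresh  # drop everything overlapping box i
    return np.array(keep, dtype=np.int64)

dets = np.array([
    (12, 84, 140, 212, 0.1),
    (24, 84, 152, 212, 0.8),
    (36, 84, 164, 212, 0.7),
    (12, 96, 140, 224, 0.6),
    (24, 96, 152, 224, 0.5),
    (24, 108, 152, 236, 0.9),
])

unsorted_keep = nms_in_input_order(dets, 0.1)
order = np.argsort(-dets[:, 4])                            # highest score first
sorted_keep = order[nms_in_input_order(dets[order], 0.1)]  # map back to original indices
print(unsorted_keep, sorted_keep)
```

On the unsorted input the sketch keeps index 0, the lowest-scoring box, simply because it comes first. After sorting, it keeps the 0.9-score box, and indexing through order maps the result back to the original index 5. Note that indices returned for sorted input refer to positions in the sorted array unless they are mapped back this way.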

AlexanderHustinx commented 4 years ago

Has this issue been resolved? Or do you still recommend using either the CPU NMS or the GPU NMS with pre-sorted input?

Additionally, before I dive into the code, could someone tell me whether this is an implementation of Greedy NMS or Soft-NMS?

devendraswamy commented 4 years ago

When I keep the training running, the server restarts within a few minutes. Please share any suggestions for this. Thank you!

jwyang / faster-rcnn.pytorch

NMS_GPU different results from NMS_CPU #220