Open sidazhang opened 6 years ago
Hi @sidazhang, convert the detections from CPU data to GPU data and try again.
Great catch! Thank you
But still different results
from rnet.models.nms.nms_gpu import nms_gpu
from rnet.models.nms.nms_cpu import nms_cpu
import numpy as np
import torch
detections = torch.from_numpy(np.array([
(12, 84, 140, 212, 0.1),
(24, 84, 152, 212, 0.8),
(36, 84, 164, 212, 0.7),
(12, 96, 140, 224, 0.6),
(24, 96, 152, 224, 0.5),
(24, 108, 152, 236, 0.9)
]))
# print(detections)
print(nms_gpu(detections.cuda(), 0.1))
print(nms_cpu(detections, 0.1))
### print results below
tensor([[ 0],
[ 1],
[ 5]], dtype=torch.int32, device='cuda:0')
tensor([ 5], dtype=torch.int32)
I see. I think there are some differences between the CPU and GPU versions. The CPU version was contributed by others; I will look into this.
Red is the one with the 0.9 score. It seems to me that the CPU version is correct?
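One way to check which result is right is to compute the overlaps by hand. The snippet below is a minimal sketch in plain Python (no torch or numpy), assuming the +1 pixel convention that the nms_cpu code posted in this thread uses; the helper name iou is made up for illustration and is not part of the repository.

```python
# Boxes are (x1, y1, x2, y2, score), copied from the example above.
detections = [
    (12, 84, 140, 212, 0.1),
    (24, 84, 152, 212, 0.8),
    (36, 84, 164, 212, 0.7),
    (12, 96, 140, 224, 0.6),
    (24, 96, 152, 224, 0.5),
    (24, 108, 152, 236, 0.9),
]


def iou(a, b):
    """Intersection over union of two boxes (hypothetical helper, +1 convention)."""
    w = max(0, min(a[2], b[2]) - max(a[0], b[0]) + 1)
    h = max(0, min(a[3], b[3]) - max(a[1], b[1]) + 1)
    inter = w * h
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)


# Index of the highest-scoring box.
best = max(range(len(detections)), key=lambda i: detections[i][4])

# IoU of every other box against the highest-scoring one.
overlaps = {i: iou(detections[best], d)
            for i, d in enumerate(detections) if i != best}

# Boxes that greedy NMS would drop at threshold 0.1.
suppressed = [i for i, v in overlaps.items() if v > 0.1]

print(best, overlaps, suppressed)
```

With these inputs every other box overlaps box 5 with an IoU of roughly 0.59 to 0.83, well above the 0.1 threshold, so greedy NMS should keep only index 5. That agrees with the CPU output.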
I see, that is weird. Thanks for pointing this out; I will check it on my side.
Hi @sidazhang @jwyang I think it is related to #212 . Line 23 and Line 24 need to be fixed. https://github.com/jwyang/faster-rcnn.pytorch/blob/7079dfbc4e734168e1a123cb1a5a60cdc39f52ed/lib/model/nms/nms_cpu.py#L23
xx2 = np.minimum(x2[i], x2[order[1:]])
yy2 = np.minimum(y2[i], y2[order[1:]])
With this change the CPU code should generate the same results as the GPU version.
@jwyang @zhenheny I modified the CPU version to use np.minimum as you suggested, but I still get the same results.
This is the nms_cpu that I have:
from __future__ import absolute_import
import numpy as np
import torch
def nms_cpu(dets, thresh):
    dets = dets.numpy()
    x1 = dets[:, 0]
    y1 = dets[:, 1]
    x2 = dets[:, 2]
    y2 = dets[:, 3]
    scores = dets[:, 4]
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order.item(0)
        keep.append(i)
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
        ovr = inter / (areas[i] + areas[order[1:]] - inter)
        inds = np.where(ovr <= thresh)[0]
        order = order[inds + 1]
    return torch.IntTensor(keep)
These are the results I am getting:
detections = torch.from_numpy(np.array([
(12, 84, 140, 212, 0.1),
(24, 84, 152, 212, 0.8),
(36, 84, 164, 212, 0.7),
(12, 96, 140, 224, 0.6),
(24, 96, 152, 224, 0.5),
(24, 108, 152, 236, 0.9)
]))
# print(detections)
print(nms_gpu(detections.cuda(), 0.1))
print('--')
print(nms_cpu(detections, 0.1))
tensor([[ 0],
[ 1],
[ 5]], dtype=torch.int32, device='cuda:0')
--
tensor([ 5], dtype=torch.int32)
@sidazhang Ah, I see, something is wrong with the GPU version. I don't know what happened. To be on the safe side, I will go with the CPU version for now.
Edited: I looked through the CUDA code, and it seems to be the same as in Faster R-CNN. I don't know what happened.
Hi @jwyang, @sidazhang, @zhenheny, did you solve this issue?
@Karthik-Suresh93 Are you also able to reproduce?
I have not been able to resolve this issue, so I have been using only the CPU version of NMS.
I am getting highly overlapping boxes in my output image.
What could be the reason?
@Karthik-Suresh93 I think there is a bug in the GPU NMS.
@jwyang have you had a chance to verify?
Not a solution, but something that might help. I think the detections need to be sorted by confidence before NMS is computed, to follow the original implementation, which did initially use a sorted list. The following was added to gpu_nms.py:
scores = dets[:, 4]
# sort() returns (values, indices); only the indices are needed here
_, order = scores.sort(descending=True)
sorted_dets = dets[order, :]
argsort was replaced with sort because we're working with PyTorch tensors.
This was the output:
tensor([[ 0],
[ 1],
[ 3],
[ 5]], dtype=torch.int32, device='cuda:0')
The CUDA implementation is the same, so I'm stumped as to where the error may be.
@sidazhang I did a few tests. I'm using PyTorch 0.4 and Python 3.6.
I called nms(cls_dets, cfg.TEST.NMS, force_cpu=True) on each loop iteration, switching force_cpu between True and False each time and printing the NMS results. I used NMS thresholds ranging from 0.3 to 1. The CPU and GPU results matched.
However, when running the functions separately, as in your example, there does seem to be a discrepancy.
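When comparing the two outputs programmatically, note that in the results posted above the GPU version returns an N x 1 tensor while the CPU version returns a 1-D tensor. The sketch below assumes the kept indices have already been converted to plain Python lists (for example with tensor.tolist()); the helper names flatten and same_keep are hypothetical.

```python
def flatten(indices):
    """Flatten nested index lists such as [[0], [1], [5]] into [0, 1, 5]."""
    flat = []
    for item in indices:
        if isinstance(item, (list, tuple)):
            flat.extend(flatten(item))
        else:
            flat.append(int(item))
    return flat


def same_keep(a, b):
    """True if both results keep the same set of boxes, ignoring shape and order."""
    return sorted(flatten(a)) == sorted(flatten(b))


# Values copied from the outputs posted earlier in this thread.
gpu_keep = [[0], [1], [5]]
cpu_keep = [5]

print(same_keep(gpu_keep, cpu_keep))
```

Comparing as sorted lists avoids false mismatches caused only by the differing shapes of the two return values.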
@sidazhang @jwyang @zhenheny What happened with minimum and maximum? I have the same doubt. Is there anything special here? Why is there no problem with the final result?
@Worulz is right: to use the GPU version of NMS, the detection results should be sorted by score before being passed to nms(*).
Let me know if you have already done the sorting but still get the same issue.
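For anyone who wants to try this, here is a minimal sketch of sorting before NMS and mapping the kept rows back to the original indices. It is plain Python; nms_presorted is a hypothetical stand-in for any routine that assumes its input is already sorted by descending score, not the repository's CUDA kernel, and nms_with_sort and iou are also made-up names.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2, score) boxes, +1 convention."""
    w = max(0, min(a[2], b[2]) - max(a[0], b[0]) + 1)
    h = max(0, min(a[3], b[3]) - max(a[1], b[1]) + 1)
    inter = w * h
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)


def nms_presorted(dets, thresh):
    """Stand-in NMS that walks rows in the given order (assumes sorted input)."""
    keep = []
    for i, d in enumerate(dets):
        # Keep a row only if it does not overlap any already-kept row too much.
        if all(iou(d, dets[k]) <= thresh for k in keep):
            keep.append(i)
    return keep


def nms_with_sort(dets, thresh, nms_fn=nms_presorted):
    """Sort by descending score, run NMS, return indices into the original list."""
    order = sorted(range(len(dets)), key=lambda i: dets[i][4], reverse=True)
    sorted_dets = [dets[i] for i in order]
    kept_sorted = nms_fn(sorted_dets, thresh)
    return [order[k] for k in kept_sorted]


detections = [
    (12, 84, 140, 212, 0.1),
    (24, 84, 152, 212, 0.8),
    (36, 84, 164, 212, 0.7),
    (12, 96, 140, 224, 0.6),
    (24, 96, 152, 224, 0.5),
    (24, 108, 152, 236, 0.9),
]

print(nms_with_sort(detections, 0.1))   # sorted first
print(nms_presorted(detections, 0.1))   # unsorted input, for comparison
```

On the example from this thread the sorted call keeps only index 5, matching the CPU result, while feeding the unsorted rows to the stand-in keeps index 0 instead. This only illustrates why input order matters; it does not claim to reproduce the GPU output reported above.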
Has this issue been resolved? Or do you still recommend using either CPU NMS or pre-sorted GPU NMS?
Additionally, before I dive into the code, could someone tell me whether this is an implementation of Greedy NMS or Soft-NMS?
When I keep training, the server restarts within a few minutes. Any suggestions for this? Thank you!
I compiled successfully on Ubuntu 14.04, and I use Python 3.6.
You can see here that the CPU result is correct, whereas the GPU version is not returning anything.