Open sevocrear opened 3 years ago
Good afternoon, could anyone help me? I would really like to know how I could optimize this model. I'd like to run it on a Jetson AGX, so optimization is needed for faster inference.
Maybe you should convert the NMS function to Python code first; I was able to convert the model to ONNX after rewriting that function.
My problem is that I get this error when I use ONNX Runtime to run inference on the ONNX model: onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Node (Reshape_914) Op (Reshape) [ShapeInferenceError] Invalid position of 0
Also, I'm new to ONNX, so I have no idea how to fix this.
@Lllllp93, could you please share the code for how you converted the NMS function and how you converted the model to ONNX?
I've also found a website about converting complex models that might be useful: https://www.fatalerrors.org/a/pytoch-to-tensorrt-transformation-of-complex-models.html
That function basically follows @lucastabelini's CUDA source code. I checked the output and the accuracy seems to be aligned.
```python
def Lane_nms(self, proposals, scores, overlap=50, top_k=4):
    '''
    re-write lane-nms with python
    reference: https://arxiv.org/pdf/2010.12035.pdf
    '''
    keep_index = []
    sorted_score, indices = torch.sort(scores, descending=True)  # from big to small
    r_filters = np.zeros(len(scores))
    for i, indice in enumerate(indices):
        if r_filters[i] == 1:  # continue if this proposal was already filtered out by nms
            continue
        keep_index.append(indice)
        if len(keep_index) > top_k:  # break if more than top_k
            break
        if i == (len(scores) - 1):  # break if indice is the last one
            break
        sub_indices = indices[i + 1:]
        for sub_i, sub_indice in enumerate(sub_indices):
            r_filter = self.Lane_IOU(proposals[indice, :], proposals[sub_indice, :], overlap)
            if r_filter:
                r_filters[i + 1 + sub_i] = 1
    num_to_keep = len(keep_index)

def Lane_IOU(self, parent_box, compared_box, threshold):
    '''
    calculate the distance for one pair of proposal lines
    return True if the distance is less than threshold
    '''
    start_a = (parent_box[2] * self.n_strips + 0.5).int()  # add 0.5 so that int() behaves like round()
    start_b = (compared_box[2] * self.n_strips + 0.5).int()
    start = torch.max(start_a, start_b)
    end_a = start_a + parent_box[4] - 1 + 0.5 - (((parent_box[4] - 1) < 0).int())
    end_b = start_b + compared_box[4] - 1 + 0.5 - (((compared_box[4] - 1) < 0).int())
    end = torch.min(torch.min(end_a, end_b), torch.tensor(self.n_offsets - 1))
    if (end - start) < 0:
        return False
    dist = 0
    for i in range(5 + start, 77):
        if i > (5 + end):
            break
        if parent_box[i] < compared_box[i]:
            dist += compared_box[i] - parent_box[i]
        else:
            dist += parent_box[i] - compared_box[i]
    return dist < (threshold * (end - start + 1))
```
Note: this implementation is slower than the original implementation.
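For anyone who wants to double-check the alignment themselves, a minimal sanity check against the original CUDA extension could look like the sketch below. It is only a sketch: it assumes the repo's CUDA `nms` package is built (import path as in laneatt.py), that `model` is a loaded LaneATT instance, and that a return statement has been added to `Lane_nms` (see the follow-up comments below); the random proposals are purely hypothetical.

```python
import torch
from nms import nms  # the original CUDA extension from the repo

# Hypothetical random proposals: 77 columns = 2 logits + start/length + 72 offsets.
proposals = torch.rand(200, 77, device="cuda")
scores = torch.rand(200, device="cuda")

keep_cuda, num_cuda, _ = nms(proposals, scores, overlap=50, top_k=4)
# Run the Python version on CPU copies (it builds CPU-side constants internally).
keep_py, num_py = model.Lane_nms(proposals.cpu(), scores.cpu(), overlap=50, top_k=4)[:2]

print(sorted(keep_cuda[:num_cuda].cpu().tolist()))
print(sorted(int(k) for k in keep_py[:num_py]))
```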
@Lllllp93, thanks for the code! But I noticed that one of your functions didn't have a return statement, so I added it. Finally, I could convert the model to ONNX, and it runs with onnxruntime. I use:
- python 3.8
- torch 1.8.2
- cuda 10.2
Here is the relevant part of the laneatt.py code:
```python
def forward(self, x, conf_threshold=0.2, nms_thres=50.0, nms_topk=5):
    print(nms_thres, conf_threshold)
    batch_features = self.feature_extractor(x)
    batch_features = self.conv1(batch_features)
    batch_anchor_features = self.cut_anchor_features(batch_features)

    # Join proposals from all images into a single proposals features batch
    batch_anchor_features = batch_anchor_features.view(-1, self.anchor_feat_channels * self.fmap_h)

    # Add attention features
    scores = self.attention_layer(batch_anchor_features)
    attention = self.softmax(scores).reshape(x.shape[0], len(self.anchors), -1)
    # attention_matrix = torch.diag(torch.ones(attention.shape[1], device=x.device)).repeat(x.shape[0], 1, 1)
    attention_matrix = torch.eye(attention.shape[1], device=x.device).repeat(x.shape[0], 1, 1)
    non_diag_inds = torch.nonzero(attention_matrix == 0., as_tuple=False)
    attention_matrix[:] = 0
    attention_matrix[non_diag_inds[:, 0], non_diag_inds[:, 1], non_diag_inds[:, 2]] = attention.flatten()
    batch_anchor_features = batch_anchor_features.reshape(x.shape[0], len(self.anchors), -1)
    attention_features = torch.bmm(torch.transpose(batch_anchor_features, 1, 2),
                                   torch.transpose(attention_matrix, 1, 2)).transpose(1, 2)
    attention_features = attention_features.reshape(-1, self.anchor_feat_channels * self.fmap_h)
    batch_anchor_features = batch_anchor_features.reshape(-1, self.anchor_feat_channels * self.fmap_h)
    batch_anchor_features = torch.cat((attention_features, batch_anchor_features), dim=1)

    # Predict
    cls_logits = self.cls_layer(batch_anchor_features)
    reg = self.reg_layer(batch_anchor_features)

    # Undo joining
    cls_logits = cls_logits.reshape(x.shape[0], -1, cls_logits.shape[1])
    reg = reg.reshape(x.shape[0], -1, reg.shape[1])

    # Add offsets to anchors
    reg_proposals = torch.zeros((*cls_logits.shape[:2], 5 + self.n_offsets), device=x.device)
    reg_proposals += self.anchors
    reg_proposals[:, :, :2] = cls_logits
    reg_proposals[:, :, 4:] += reg

    # Apply nms
    proposals_list = self.nms(reg_proposals, attention_matrix, nms_thres, nms_topk, conf_threshold)
    return proposals_list

def nms(self, batch_proposals, batch_attention_matrix, nms_thres, nms_topk, conf_threshold):
    softmax = nn.Softmax(dim=1)
    proposals_list = []
    for proposals, attention_matrix in zip(batch_proposals, batch_attention_matrix):
        anchor_inds = torch.arange(batch_proposals.shape[1], device=proposals.device)
        # The gradients do not have to (and can't) be calculated for the NMS procedure
        with torch.no_grad():
            scores = softmax(proposals[:, :2])[:, 1]
            if conf_threshold is not None:
                # apply confidence threshold
                above_threshold = scores > conf_threshold
                proposals = proposals[above_threshold]
                scores = scores[above_threshold]
                anchor_inds = anchor_inds[above_threshold]
            if proposals.shape[0] == 0:
                proposals_list.append((proposals[[]], self.anchors[[]], attention_matrix[[]], None))
                continue
            # keep, num_to_keep, _ = nms(proposals, scores, overlap=nms_thres, top_k=nms_topk)
            keep, num_to_keep, _ = self.Lane_nms(proposals, scores, overlap=nms_thres, top_k=nms_topk)
        keep = keep[:num_to_keep]
        proposals = proposals[keep]
        anchor_inds = anchor_inds[keep]
        attention_matrix = attention_matrix[anchor_inds]
        proposals_list.append((proposals, self.anchors[keep], attention_matrix, anchor_inds))
    return proposals_list

def Lane_nms(self, proposals, scores, overlap=50, top_k=4):
    keep_index = []
    print(scores)
    sorted_score, indices = torch.sort(scores, descending=True)  # from big to small
    r_filters = np.zeros(len(scores))
    for i, indice in enumerate(indices):
        if r_filters[i] == 1:  # continue if this proposal was already filtered out by nms
            continue
        keep_index.append(indice)
        if len(keep_index) > top_k:  # break if more than top_k
            break
        if i == (len(scores) - 1):  # break if indice is the last one
            break
        sub_indices = indices[i + 1:]
        for sub_i, sub_indice in enumerate(sub_indices):
            r_filter = self.Lane_IOU(proposals[indice, :], proposals[sub_indice, :], overlap)
            if r_filter:
                r_filters[i + 1 + sub_i] = 1
    num_to_keep = len(keep_index)
    keep_index = list(map(lambda x: x.item(), keep_index))
    return keep_index, num_to_keep, None

def Lane_IOU(self, parent_box, compared_box, threshold):
    '''
    calculate the distance for one pair of proposal lines
    return True if the distance is less than threshold
    '''
    start_a = (parent_box[2] * self.n_strips + 0.5).int()  # add 0.5 so that int() behaves like round()
    start_b = (compared_box[2] * self.n_strips + 0.5).int()
    start = torch.max(start_a, start_b)
    end_a = start_a + parent_box[4] - 1 + 0.5 - (((parent_box[4] - 1) < 0).int())
    end_b = start_b + compared_box[4] - 1 + 0.5 - (((compared_box[4] - 1) < 0).int())
    end = torch.min(torch.min(end_a, end_b), torch.tensor(self.n_offsets - 1))
    # end = torch.min(torch.min(end_a, end_b), torch.FloatTensor(self.n_offsets - 1, device=torch.device('cpu')))
    if (end - start) < 0:
        return False
    dist = 0
    for i in range(5 + start, 5 + end.int()):
        # if i > (5 + end):
        #     break
        if parent_box[i] < compared_box[i]:
            dist += compared_box[i] - parent_box[i]
        else:
            dist += parent_box[i] - compared_box[i]
    return dist < (threshold * (end - start + 1))

def loss(self, proposals_list, targets, cls_loss_weight=10):
```
Here is the part of the code where I export it and run it with ONNX Runtime:
```python
torch.onnx.export(model,               # model being run
                  images,              # model input (or a tuple for multiple inputs)
                  "laneatt.onnx",      # where to save the model (can be a file or file-like object)
                  export_params=True,  # store the trained parameter weights inside the model file
                  opset_version=11)

onnx_model = onnx.load("laneatt.onnx")
onnx.checker.check_model(onnx_model)

ort_session = onnxruntime.InferenceSession("laneatt.onnx")

def to_numpy(tensor):
    return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()

# compute ONNX Runtime output prediction
ort_inputs = {ort_session.get_inputs()[0].name: to_numpy(images)}
ort_outs = ort_session.run(None, ort_inputs)

# compare ONNX Runtime and PyTorch results
onnx_output = [tuple(map(lambda x: torch.tensor(x), ort_outs))]
prediction = model.decode(onnx_output, as_lanes=True)
```
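To go beyond eyeballing the visualizations below, one can also compare the ONNX Runtime output numerically against the PyTorch model on the same batch, in the style of the PyTorch ONNX export tutorial. This is only a sketch: it assumes `torch_out[0][0]` holds the kept proposals for the first image and that the first ONNX output corresponds to it.

```python
import numpy as np

# Hypothetical numeric check of export fidelity (both sides use the Python NMS here).
with torch.no_grad():
    torch_out = model(images, conf_threshold=0.2, nms_thres=50.0, nms_topk=5)
np.testing.assert_allclose(to_numpy(torch_out[0][0]), ort_outs[0], rtol=1e-03, atol=1e-05)
```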
But the results are different from the original NMS implementation.
This is the visualization of the model output with the original NMS, and this is the visualization of the model output with the custom NMS given by @Lllllp93 (conf_thresh and nms_thresh are the same).
So, now I'm investigating why the results are so different (although 3 of the lines are almost exactly the same). It would be great if you could point out the differences between your NMS implementation and @lucastabelini's. Also, it is about 20 times slower than the original (just for the record). Tested on my PC: NVIDIA RTX 2080 Ti 11 GB, Intel Core i9-9820X CPU @ 3.30 GHz × 20, 62.5 GiB RAM.
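For the timing comparison, a rough benchmark sketch (assuming `model` and `images` are already on the GPU) looks like this:

```python
import time
import torch

# Hypothetical latency measurement: warm up, then average over repeated forward passes.
model.eval()
with torch.no_grad():
    for _ in range(5):  # warm-up runs
        model(images)
    torch.cuda.synchronize()
    t0 = time.time()
    for _ in range(20):
        model(images)
    torch.cuda.synchronize()
print("avg forward time:", (time.time() - t0) / 20, "s")
```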
@sevocrear, the return should be:
`return torch.tensor(keep_index), num_to_keep`
As for the difference between the two implementations, I guess you may have visualized all lanes after NMS. Notice that the keep index and num_to_keep returned by Lane_nms() can be used to limit the number of lanes; normally I just keep the top 4 lanes in my task.
As I mentioned before, this NMS implementation doesn't take advantage of the GPU, which means its runtime grows linearly with the number of lanes. So you may want to switch back to the CUDA NMS for training. But for inference on CPU or ARM, I guess there is no big difference; it depends on which hardware platform the network is deployed on.
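If it helps, one simple way to follow that advice is to toggle between the two implementations inside the `nms()` method above. This is just a sketch of replacing the single `Lane_nms` call; the `nms` name refers to the repo's CUDA extension, and it assumes `Lane_nms` returns three values as in the code posted above.

```python
# Hypothetical drop-in replacement for the Lane_nms call inside self.nms():
# use the CUDA extension while training on GPU, the Python fallback otherwise
# (e.g. for ONNX export or CPU/ARM inference).
if self.training and proposals.is_cuda:
    keep, num_to_keep, _ = nms(proposals, scores, overlap=nms_thres, top_k=nms_topk)
else:
    keep, num_to_keep, _ = self.Lane_nms(proposals, scores, overlap=nms_thres, top_k=nms_topk)
```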
In addition, I just found that the post-processing (decode) was included in the ONNX model, and I can run it with onnxruntime after removing the post-processing part.
The comparison of the two NMS implementations is shown below, FYI.
Original NMS:
My implementation:
@Lllllp93 How did you do that? Running decode with onnxruntime?
@sevocrear I mean that the reason I couldn't run it with onnxruntime before is that decode was also included in the ONNX model. I can run it with onnxruntime after removing decode from the ONNX graph. Sorry for the misunderstanding.
Hey @sevocrear, how exactly have you managed to export the model to ONNX? I've tried removing NMS and exporting the model without it (I'd do it after computing the output), but I keep getting the error:
RuntimeError: input_shape_value == reshape_value || input_shape_value == 1 || reshape_value == 1INTERNAL ASSERT FAILED at "/pytorch/torch/csrc/jit/passes/onnx/shape_type_inference.cpp":520, please report a bug to PyTorch. ONNX Expand input shape constraint not satisfied.
Hi @tersekmatija, have you removed NMS entirely? In my implementation, I just changed the CUDA NMS to the Python NMS function (which, of course, is much slower), and then I could convert the model to ONNX. But the inference speed was awful, so I use the original Python code with some changes for my project. @Lllllp93 and I described the changed parts and requirements above.
@tersekmatija Have you solved this problem?
Hey @Lllllp93 @sevocrear, I used your code to try the conversion, but I got an error and can't find the reason. Please help!
```
Traceback (most recent call last):
  File "main.py", line 63, in <module>
    main()
  File "main.py", line 52, in main
    runner.train()
  File "/home/hw/project/LaneDetection/lane-att/lib/runner.py", line 59, in train
    outputs = model(images, **self.cfg.get_train_parameters())
  File "/home/hw/anaconda3/envs/laneatt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/hw/project/LaneDetection/lane-att/lib/models/laneatt.py", line 126, in forward
    proposals_list = self.nms(
  File "/home/hw/project/LaneDetection/lane-att/lib/models/laneatt.py", line 150, in nms
    keep, num_to_keep, _ = self.Lane_nms(
  File "/home/hw/project/LaneDetection/lane-att/lib/models/laneatt.py", line 180, in Lane_nms
    r_filter = self.Lane_IOU(proposals[indice, :], proposals[sub_indice, :], overlap)
  File "/home/hw/project/LaneDetection/lane-att/lib/models/laneatt.py", line 195, in Lane_IOU
    end = torch.min(torch.min(end_a, end_b), torch.tensor(self.n_offsets - 1))
RuntimeError: Expected object of scalar type float but got scalar type long int for argument 'other'
```
Hey @lucastabelini, sorry to bother you. I have some difficulties converting the model to ONNX, so I have an idea: remove the NMS step from the project, because my project only needs to detect one lane. Do you think this idea is OK?
Yes @sevocrear, I just tried removing the NMS in the forward pass; this is the snippet:
```python
# Add offsets to anchors
reg_proposals = torch.zeros((*cls_logits.shape[:2], 5 + self.n_offsets), device=x.device)
reg_proposals += self.anchors
reg_proposals[:, :, :2] = cls_logits
reg_proposals[:, :, 4:] += reg
# nms removed here
return reg_proposals
```
But I am still getting the error. Do you know which versions of Python and Torch you used, in case it depends on that?
Hey @lucastabelini, sorry to bother you. I have some difficulties converting the model to ONNX, so I have an idea: remove the NMS step from the project, because my project only needs to detect one lane. Do you think this idea is OK?
If you only need to detect one lane boundary (i.e., one line), then you can do that. Just get the one with the highest confidence score.
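A minimal sketch of that idea, assuming `reg_proposals` is the raw `(batch, n_anchors, 77)` output of a forward pass with NMS removed (as in the snippet above):

```python
import torch

# Hypothetical single-lane selection: softmax over the two class logits,
# then keep the proposal with the highest positive-class score.
scores = torch.softmax(reg_proposals[0, :, :2], dim=1)[:, 1]
best = torch.argmax(scores)
best_lane = reg_proposals[0, best]  # the two logits followed by the lane geometry
```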
@lucastabelini OK, I will try it in my project, thanks!
@tersekmatija If you try to convert the model to ONNX, as far as I know, just removing NMS is not enough; `torch.eye` is not supported yet.
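If `torch.eye` really is the blocker in a given torch version, a possible workaround (untested sketch) is to build the identity matrix from ops that export to basic ONNX nodes:

```python
# Hypothetical replacement for the torch.eye(...) line in forward():
# an identity matrix built from an index comparison, which exports as
# Range / Equal / Cast / Tile instead of an aten::eye op.
n = attention.shape[1]
idx = torch.arange(n, device=x.device)
attention_matrix = (idx.unsqueeze(0) == idx.unsqueeze(1)).float().repeat(x.shape[0], 1, 1)
```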
@hwang12345 It seems that @sevocrear uses it in his forward method and he managed to export it.
@tersekmatija I tried the code @sevocrear posted, but I did not succeed.
@hwang12345 It seems that you are using torch.min() to compare tensors with different data types, as the error message shows; torch.tensor(self.n_offsets - 1) in your code becomes a long tensor. You can just convert it to float, e.g. torch.tensor(self.n_offsets - 1, dtype=torch.float).
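Concretely, a one-line fix in `Lane_IOU` could look like this (sketch):

```python
# Hypothetical dtype fix: make the clamp constant a float tensor so torch.min()
# compares tensors of the same scalar type and device (end_a / end_b are float here).
end = torch.min(torch.min(end_a, end_b),
                torch.tensor(self.n_offsets - 1, dtype=end_a.dtype, device=end_a.device))
```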
Hello. Can you provide the complete code?
Hello, I have some difficulties converting your model both to ONNX and to TorchScript. I've already read the closed issues, but they didn't help. Could you help me? Or maybe someone has succeeded in doing it. Below I'll show the code and the errors I get when I try to convert the model.
2nd_model.pt is the full model saved in PyTorch. It works fine after loading in Python.
Converting the model to onnx
Error:
Error 2 (different):
Scripting the model
Error: