haofeixu / aanet

[CVPR'20] AANet: Adaptive Aggregation Network for Efficient Stereo Matching
Apache License 2.0

how to convert this to onnx #17

Closed sdu2011 closed 4 years ago

sdu2011 commented 4 years ago

(Originally written in Chinese for ease of expression.) I want to convert the model to ONNX format. I added the following code to inference.py:

        with torch.no_grad():
            time_start = time.perf_counter()

            if args.export_onnx == "true":
                # Export the model in ONNX format
                print('*************export onnx model*************')
                print('left{},right{}'.format(left.shape,right.shape))
                torch.onnx.export(aanet,(left,right),"aanet.onnx",input_names=['left','right'],verbose=True)
                break
            else:
                print('left{},right{}'.format(left.shape,right.shape))
                pred_disp = aanet(left, right)[-1]  # [B, H, W]
                break 

But the output is as follows:

*************export onnx model*************
lefttorch.Size([1, 3, 384, 1248]),righttorch.Size([1, 3, 384, 1248])
*************aanet forward begin************
/source/aanet/nets/deform_conv/deform_conv.py:147: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  ctx.groups, ctx.deformable_groups, ctx.with_bias)
/source/aanet/nets/deform_conv/deform_conv.py:147: TracerWarning: There are 2 live references to the data region being modified when tracing in-place operator zero_. This might cause the trace to be incorrect, because all other views that also reference this data will not reflect this change in the trace! On the other hand, if all other views use the same memory chunk, but are disjoint (e.g. are outputs of torch.split), this might still be safe.
  ctx.groups, ctx.deformable_groups, ctx.with_bias)
/source/aanet/nets/deform_conv/deform_conv.py:147: TracerWarning: There are 7 live references to the data region being modified when tracing in-place operator addmm_. This might cause the trace to be incorrect, because all other views that also reference this data will not reflect this change in the trace! On the other hand, if all other views use the same memory chunk, but are disjoint (e.g. are outputs of torch.split), this might still be safe.
  ctx.groups, ctx.deformable_groups, ctx.with_bias)
/source/aanet/nets/deform_conv/deform_conv.py:147: TracerWarning: There are 10 live references to the data region being modified when tracing in-place operator copy_ (possibly due to an assignment). This might cause the trace to be incorrect, because all other views that also reference this data will not reflect this change in the trace! On the other hand, if all other views use the same memory chunk, but are disjoint (e.g. are outputs of torch.split), this might still be safe.
  ctx.groups, ctx.deformable_groups, ctx.with_bias)
/source/aanet/nets/cost.py:48: TracerWarning: There are 2 live references to the data region being modified when tracing in-place operator copy_ (possibly due to an assignment). This might cause the trace to be incorrect, because all other views that also reference this data will not reflect this change in the trace! On the other hand, if all other views use the same memory chunk, but are disjoint (e.g. are outputs of torch.split), this might still be safe.
  cost_volume[:, i, :, :] = (left_feature * right_feature).mean(dim=1)
/source/aanet/nets/cost.py:46: TracerWarning: There are 2 live references to the data region being modified when tracing in-place operator copy_ (possibly due to an assignment). This might cause the trace to be incorrect, because all other views that also reference this data will not reflect this change in the trace! On the other hand, if all other views use the same memory chunk, but are disjoint (e.g. are outputs of torch.split), this might still be safe.
  right_feature[:, :, :, :-i]).mean(dim=1)
/source/aanet/nets/aggregation.py:394: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if exchange.size()[2:] != x_fused[i].size()[2:]:
/source/aanet/nets/estimation.py:21: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if cost_volume.size(1) == self.max_disp:
/source/aanet/nets/refinement.py:84: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if scale_factor == 1.0:
/source/aanet/nets/warp.py:51: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert disp.min() >= 0
/source/aanet/nets/warp.py:56: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  print('grid:{},offset:{}'.format(grid.shape,offset.shape))
grid:torch.Size([1, 2, 192, 624]),offset:torch.Size([1, 2, 128, 416])

The code flagged at deform_conv.py:147 is:

        deform_conv_cuda.modulated_deform_conv_cuda_forward(
            input, weight, bias, ctx._bufs[0], offset, mask, output,
            ctx._bufs[1], weight.shape[2], weight.shape[3], ctx.stride,
            ctx.stride, ctx.padding, ctx.padding, ctx.dilation, ctx.dilation,
            ctx.groups, ctx.deformable_groups, ctx.with_bias)

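While the tracer is recording the model's forward pass, the call to modulated_deform_conv_cuda_forward in deform_conv_cuda.so prevents it from recording all operators correctly, which later causes the dimensions of grid and offset to mismatch. How should I fix this?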
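
To make the reported mismatch concrete, the shapes printed in the log can be compared against the 384 x 1248 input. The sketch below is only arithmetic on those logged numbers; `downsample_factor` is a hypothetical helper, not part of the aanet code, and the log alone does not establish which code path produced each tensor.

```python
def downsample_factor(full_hw, scaled_hw):
    """Return the integer factor by which scaled_hw is smaller than full_hw.

    Hypothetical helper for illustration only. Raises ValueError when the two
    shapes are not related by a single integer factor in both dimensions.
    """
    full_h, full_w = full_hw
    scaled_h, scaled_w = scaled_hw
    if full_h % scaled_h or full_w % scaled_w:
        raise ValueError("shapes are not related by an integer factor")
    if full_h // scaled_h != full_w // scaled_w:
        raise ValueError("height and width use different factors")
    return full_h // scaled_h


# Spatial sizes copied from the log output above.
input_hw = (384, 1248)
grid_hw = (192, 624)
offset_hw = (128, 416)

grid_factor = downsample_factor(input_hw, grid_hw)
offset_factor = downsample_factor(input_hw, offset_hw)
print("grid is at 1/{} resolution".format(grid_factor))
print("offset is at 1/{} resolution".format(offset_factor))
```

The two tensors sit at different fractions of the input resolution (1/2 and 1/3), so they cannot be combined element-wise without resampling one of them first.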

haofeixu commented 4 years ago

Maybe deformable conv is not supported by onnx.
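
Separately from operator support, several TracerWarning lines in the log say that a tensor converted to a Python integer or boolean "will be treated as a constant". The toy sketch below illustrates that general behaviour in plain Python. It is not PyTorch's tracer: `Traced`, `trace`, `replay` and `model` are all hypothetical names invented for this illustration.

```python
import operator


class Traced:
    """Toy stand-in for a traced tensor: arithmetic on it is appended to a tape."""

    def __init__(self, value, tape, index):
        self.value = value   # concrete value seen during tracing
        self.tape = tape     # shared list of recorded operations
        self.index = index   # slot this value occupies during replay

    def _record(self, fn, const):
        # Record (function, input slot, constant); the result gets the next slot.
        self.tape.append((fn, self.index, const))
        return Traced(fn(self.value, const), self.tape, len(self.tape))

    def __mul__(self, const):
        return self._record(operator.mul, const)

    def __add__(self, const):
        return self._record(operator.add, const)

    def __gt__(self, const):
        # The comparison escapes to a plain Python bool and is NOT recorded.
        return self.value > const


def trace(fn, example):
    """Run fn once on an example input and return (tape, output slot)."""
    tape = []
    out = fn(Traced(example, tape, 0))
    return tape, out.index


def replay(tape, out_index, x):
    """Re-run the recorded operations on a new input."""
    slots = [x]
    for fn, src, const in tape:
        slots.append(fn(slots[src], const))
    return slots[out_index]


def model(x):
    if x > 0:            # data-dependent Python branch
        return x * 2
    return x + 100


tape, out_index = trace(model, 3)    # example input takes the x > 0 branch
print(replay(tape, out_index, 5))    # same branch as the example input
print(replay(tape, out_index, -1))   # replays x * 2 even though x <= 0
print(model(-1))                     # eager execution takes the other branch
```

Only the branch taken by the example input ends up on the tape, so replaying the trace on an input that should take the other branch silently gives a different answer from eager execution. This is the sense in which a trace "might not generalize to other inputs".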