Describe the bug
I applied an optimization to my PyTorch code and got roughly 5x faster inference in PyTorch, but after exporting the model to ONNX format and running it in onnxruntime, I do not see the same speedup.
The optimization skips part of the computation in some cases: it is a cache mechanism that reuses some of the previous results.
Urgency
If the speedup cannot be reproduced in onnxruntime, my project will fail.
System information
onnxruntime 1.5.2, installed from pip
pytorch 1.7
To Reproduce
A forward function in my model:
def forward(self, img, mem=None):
    count_pred_head = 0
    pred_phones = []
    for idx, conv in enumerate(self.backbone):
        if idx in self.reuse_meta["reuse_layer"]:
            # print(img.shape)
            img = img[:, :, :, -self.reuse_meta["recompute_size"][idx]:]
            img = conv(img, True)
            img = torch.cat([
                mem[:, idx, :, :, 25:25 - self.reuse_meta["update_size"][idx]],
                img[:, :, :, -self.reuse_meta["update_size"][idx]:]
            ], dim=3)
            mem[:, idx, :, :, :] = img
        else:
            img = conv(img, mem is not None)
            if isinstance(conv, BasicBlock):
                img = img[:, :, :, 2:]
    return img, mem
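To make the caching idea clearer, here is a simplified pure-Python sketch of what the reuse mechanism does (names like `expensive_op` and the fixed `recompute_size` are illustrative placeholders, not my real model):

```python
# Simplified sketch of the reuse mechanism (pure Python, no torch).
# When a cache exists, only the tail of the input is recomputed and
# the result is concatenated with the cached prefix.

def expensive_op(xs):
    # Stand-in for a convolution: square every element.
    return [x * x for x in xs]

def forward(seq, mem=None, recompute_size=2):
    """Recompute only the last `recompute_size` elements when a cache exists."""
    if mem is None:
        # Cold start: full computation.
        return expensive_op(seq)
    # Warm path: recompute only the new tail, reuse the cached prefix.
    tail = expensive_op(seq[-recompute_size:])
    return mem[:len(seq) - recompute_size] + tail

full = forward([1, 2, 3, 4])              # cold start: computes all 4 elements
cached = forward([1, 2, 3, 4], mem=full)  # recomputes only the last 2
```

The warm path does half the work here; in my real model the recomputed window is much smaller than the full input, which is where the ~5x comes from.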
Expected behavior
Get the same acceleration in onnxruntime as in PyTorch.
Additional context why, is there something different in onnx reference mechanism?