Hey,
First of all, thank you for the great repo!
I'm trying to run inference via C++ LibTorch. On my laptop (RTX 2060, CUDA 11.3) everything works fine, but on the deployment PC (GTX 1650) I get different results depending on the setup:
- yolov5m, CUDA 11.3, libtorch-cxx11-abi-shared-with-deps-1.10.0+cu113: inference runs, but the model can't detect anything
- yolov5m, CUDA 10.2, libtorch-cxx11-abi-shared-with-deps-1.10.1+cu102: inference runs and there are detections, but post-processing takes 160 ms vs 4 ms for inference (`torch::masked_select` takes too long; see the sketch below)
- yolov5m, CUDA 10.2, LibTorch built from source: inference runs and there are detections, but post-processing takes 110 ms vs 10 ms for inference
- yolov5s, CUDA 10.2, LibTorch built from source: everything works fine in real time (the camera produces 30 fps)
**Note:** the Python `.pt` model works fine with any CUDA version.
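For reference, here is a minimal sketch of the kind of post-processing step that is slow on the GTX 1650. The tensor shape (a yolov5m-style `[1, 25200, 85]` output) and the 0.25 confidence threshold are assumptions for illustration, not my exact code:

```cpp
#include <torch/torch.h>
#include <chrono>
#include <iostream>

int main() {
    torch::NoGradGuard no_grad;

    // Stand-in for the raw yolov5m output: [batch, predictions, 85]
    auto pred = torch::rand({1, 25200, 85}, torch::kCUDA);

    auto start = std::chrono::steady_clock::now();

    // Keep rows whose objectness score (column 4) exceeds the threshold.
    auto conf_mask = pred.select(2, 4) > 0.25;                  // [1, 25200]
    auto kept = torch::masked_select(pred, conf_mask.unsqueeze(2))
                    .view({-1, 85});                            // this is the slow call

    torch::cuda::synchronize();  // wait for the GPU so the timing is meaningful
    auto end = std::chrono::steady_clock::now();
    std::cout << "masked_select: "
              << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count()
              << " ms\n";
    return 0;
}
```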
Any ideas what could cause this?