Closed rtrahms closed 3 years ago
Hi @rtrahms, I don't have experience loading models into OpenCV, but my guess is that some of the layers are not supported in the current OpenCV. I would recommend using LibTorch instead.
Hi @yasenh, thanks for the advice - yes, I am using libTorch now. Have built in Visual Studio with help from the following article: https://expoundai.wordpress.com/2020/10/13/setting-up-a-cpp-project-in-visual-studio-2019-with-libtorch-1-6/
Builds successfully, but throws exception on torch::jit::load() call. I've tried both the original pt file from training and the modified export version you suggested. Not happy with the file for some reason, but both versions are readable by netron.
Rob
@rtrahms when you use the GPU model, did you add the --gpu flag when running the application? And did you remove the --gpu flag when using the CPU version?
@yasenh I don't use your code directly, but am using it as a code reference. In my code I have tried setting torch::Device device to both torch::kCPU and torch::kCUDA. The issue is on the initial load, which may indicate my torchscript converted file is not correct. I do need that torchscript version for the load, yes? The original pt file will not work I assume, for either CPU or GPU versions.
I answered my own question - the torchscript version is needed. But having some issues converting using that export.py script. I created my own thinking it would simplify, but crashes on the torch.load statement:
import torch
import torchvision
import os

pt_file = "yolov5s.pt"
torchscript_file = "yolov5s.torchscript-cpu.pt"
is_file = os.path.isfile(pt_file)
batch_size = 1
img_size = 416
print("creating dummy img")
img = torch.rand((batch_size, 3, img_size, img_size))
print("loading network")
model = torch.load(pt_file)
print("tracing network")
traced_script_module = torch.jit.trace(model,img)
print("saving script module")
traced_script_module.save(torchscript_file)
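On the question of whether a given .pt file is a plain checkpoint or a TorchScript archive: both are usually zip files, so a quick look at the entry names can help tell them apart. The sketch below uses only the standard library. It assumes that TorchScript archives contain a `constants.pkl` entry and a `code/` folder while plain `torch.save()` checkpoints do not; treat that as a heuristic rather than a documented contract. The function name `classify_pt_file` is made up for this example.

```python
import io
import zipfile


def classify_pt_file(source):
    """Return 'torchscript', 'checkpoint', or 'not-zip' for a .pt path or file object."""
    if not zipfile.is_zipfile(source):
        # Files written by older torch.save() versions are plain pickles, not zips.
        return "not-zip"
    with zipfile.ZipFile(source) as archive:
        names = archive.namelist()
    # Assumption: only TorchScript archives carry serialized code and constants.pkl.
    looks_scripted = any(
        "/code/" in name or name.endswith("constants.pkl") for name in names
    )
    return "torchscript" if looks_scripted else "checkpoint"


if __name__ == "__main__":
    # Build a tiny fake archive in memory just to show the call.
    fake = io.BytesIO()
    with zipfile.ZipFile(fake, "w") as zf:
        zf.writestr("archive/data.pkl", b"")
    print(classify_pt_file(fake))
```

Calling `classify_pt_file("yolov5s.pt")` on the original weights and on the exported file should give different answers if the export really produced a TorchScript archive.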
@rtrahms Could you share the issues when using "export.py"? And I notice that you set img_size = 416 here, so did you export the model with image size as 416?
python models/export.py --weights yolov5s.pt --img 416 --batch 1
And I would highly recommend that you use the export.py from the official yolov5 python version.
@yasenh - Yes, I have used 416x416 as the original Darknet Yolo image input size.
I wanted to understand the issues with export.py, so I created what I thought would be a standalone utility to do the same thing. The code is below. What I discovered was that the pt file used as input is making references to files in the original yolov5 filestructure, namely models and utils folders. After copying those over, the code below worked for me. Besides netron, I don't know of an easy way to edit/modify either the original PT file or the generated torchscript PT file.
import torch
import torch.nn as nn
import torchvision
import os

pt_file = 'yolov5s_ob12.pt'
torchscript_file = "yolov5s_ob12.torchscript-cpu.pt"
is_file = os.path.isfile(pt_file)
batch_size = 1
img_size = 416
print("creating dummy img")
img = torch.rand((batch_size, 3, img_size, img_size))
print("loading network")
model = torch.load(pt_file, map_location=torch.device('cpu'))['model'].float()
model.eval()
model.model[-1].export = True # set Detect() layer export=True
y = model(img) # dry run
print("tracing network")
traced_script_module = torch.jit.trace(model,img)
print("saving script module")
traced_script_module.save(torchscript_file)
print("Complete! Exiting.")
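The need to copy the models and utils folders follows from how pickle works: a pickled object stores only the module path and class name of its class, not the class code, so the module must be importable at load time. The standard-library sketch below reproduces that effect without PyTorch. The module name `fake_yolo_models` and the `Detect` stand-in class are hypothetical and exist only for this demonstration.

```python
import pickle
import sys
import types

MODULE_NAME = "fake_yolo_models"  # hypothetical stand-in for the repo's models package


def make_module():
    """Create an in-memory module that defines an empty Detect class."""
    module = types.ModuleType(MODULE_NAME)
    module.Detect = type("Detect", (), {"__module__": MODULE_NAME})
    return module


def demo():
    """Pickle an object, then unpickle it without and with its module available."""
    module = make_module()
    sys.modules[MODULE_NAME] = module
    payload = pickle.dumps(module.Detect())  # stores "fake_yolo_models.Detect" by name

    # Simulate loading from a folder where the defining module is not present.
    del sys.modules[MODULE_NAME]
    try:
        pickle.loads(payload)
        missing_error = None
    except ModuleNotFoundError as exc:
        missing_error = exc

    # Make the module importable again, as copying models/ and utils/ did.
    sys.modules[MODULE_NAME] = module
    restored = pickle.loads(payload)
    del sys.modules[MODULE_NAME]
    return missing_error, type(restored).__name__


if __name__ == "__main__":
    error, class_name = demo()
    print("without module:", repr(error))
    print("with module:", class_name)
```

A traced or scripted TorchScript file does not have this dependency, which is why the exported file can be loaded from C++ without the Python sources.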
@rtrahms So there are some differences between your version and export.py:
https://github.com/ultralytics/yolov5/blob/c8c5ef36c9a19c7843993ee8d51aebb685467eca/models/experimental.py#L137-L144
https://github.com/ultralytics/yolov5/blob/master/models/export.py#L43-L47
BTW, did you set "model.model[-1].export = False" when exporting? And I am still not sure why you don't use "export.py" directly?
Mainly I am using a separate script to understand what is going on. I have tried setting export to both True and False, no change. I actually do not know what that flag does anyway.
Making this even more simple, I attempted creating a torchscript file from a natively constructed network (code below). Generated a torchscript file, but also throws exception when attempting to load in cpp application.
import torch
import torch.nn as nn
import torchvision
import os

torchscript_file = "native_test.torchscript-cpu.pt"

class MyModule(torch.nn.Module):
    def __init__(self, N, M):
        super(MyModule, self).__init__()
        self.weight = torch.nn.Parameter(torch.rand(N, M))

    def forward(self, input):
        if input.sum() > 0:
            output = self.weight.mv(input)
        else:
            output = self.weight + input
        return output

my_module = MyModule(10, 20)
sm = torch.jit.script(my_module)
print(sm.code)
print("saving torchscript file")
sm.save(torchscript_file)
print("Completed. Exiting.")
@rtrahms Maybe this tutorial can help: "Loading a TorchScript Model in C++". Make sure you use the same version of PyTorch and LibTorch.
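One way to check the version advice above is to compare the string from `torch.__version__` on the Python side with the version of the LibTorch download. The sketch below only does the string comparison and uses no PyTorch import. It assumes version strings of the form `1.7.0+cu110`, and it assumes the LibTorch folder ships a `build-version` text file holding such a string; verify both against your own install. The helper names are made up for this example.

```python
def split_version(version):
    """Split '1.7.0+cu110' into ('1.7.0', 'cu110'); the build tag may be empty."""
    release, _, build_tag = version.strip().partition("+")
    return release, build_tag


def same_release(pytorch_version, libtorch_version):
    """True when the release numbers agree, ignoring the cpu/cuXXX build tag."""
    return split_version(pytorch_version)[0] == split_version(libtorch_version)[0]


if __name__ == "__main__":
    # Example strings only; substitute torch.__version__ and your LibTorch version.
    print(same_release("1.6.0", "1.7.0+cu110"))
    print(same_release("1.7.0+cu110", "1.7.0+cu110"))
```

The build tag is returned separately so that a CPU-only export environment can still be compared against a CUDA build of LibTorch.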
@yasenh - update. Using the cpu variant of the exported torchscript file and code for your Detector class (and some adjustment of the hard-coded internal 640x640 image input) I was able to load the torchscript network in a cpp application and successfully run inference. CUDA version of torchscript network still doesn't load, looking into why (I did change the tensor types like you suggest)...
@yasenh - another update. Found the issue with CUDA version. The MS Visual Studio project was manually generated, and did not recognize any CUDA devices. This was due to a missing linker flag: /INCLUDE:"?warp_size@cuda@at@@YAHXZ"
https://github.com/pytorch/pytorch/issues/35604
After building with this flag, the CUDA device is detected, and the CUDA DNN loads successfully. The forward() call now crashes with an exception, but progress.
@yasenh - So I am wondering if the adjustments you suggested to export.py on the ultralytics yolov5 repo are still valid. The code has changed, so there might be some updates needed to your instructions. Can you take a look? Thanks!
@yasenh - update. So it looks like my CUDA torchscript DNN loads and can run inference... once. A second time through causes an exception. I noticed my warm-up forward() call was passing, but passing the real image through afterwards did not work. Also, skipping the warm-up and passing the image in the first pass also works. Is there CUDA cleanup that needs to happen after a torch::forward() call?
@rtrahms thanks for the update. I have only tested it on my 1070 GPU, maybe you are using a GPU with less memory? I will try to figure it out, but it may take some time due to my current schedule.
@yasenh - Here's a clue... if I insert a call module_.eval(); after processing the detections in the Detector::Run() method, it does work on repeated calls to Run(). Another clue, this eval() call is not needed for the CPU variant, with that Detector::Run() can be called repeatedly with no issue.
module_.eval() is called in the constructor @rtrahms https://github.com/yasenh/libtorch-yolov5/blob/23c3e5c57addd533d96fcf1c39f0d8fdbf078803/src/detector.cpp#L21
@yasenh - yes, that gave me the idea to call it after the inference call. The Detector object inference call works the first time but not after the first time.
Thanks, I found the issue. I rebuilt with libtorch 1.7, CUDA 11.0, and included ALL DLLs from libtorch distribution and CUDA distribution. Worked as expected.
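For anyone hitting the same problem on Windows, a small script can list which DLLs from the LibTorch lib folder are absent from the folder holding the built executable. This is a rough sketch: the function name and both paths are hypothetical, and it ignores DLLs that Windows could still find through PATH.

```python
from pathlib import Path


def missing_dlls(libtorch_lib_dir, exe_dir):
    """Return sorted names of DLLs found in libtorch_lib_dir but not in exe_dir."""
    # Compare lower-cased names because Windows file names are case-insensitive.
    shipped = {p.name.lower() for p in Path(libtorch_lib_dir).glob("*.dll")}
    present = {p.name.lower() for p in Path(exe_dir).glob("*.dll")}
    return sorted(shipped - present)


if __name__ == "__main__":
    # Hypothetical locations; point these at your own LibTorch and build output.
    for name in missing_dlls(r"C:\libtorch\lib", r"C:\myproject\x64\Release"):
        print("missing:", name)
```

The same check can be repeated against the CUDA distribution's bin folder, since the fix above involved DLLs from both distributions.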
Good to hear that!
OpenCV 4.2.0/4.4.0/4.5.0 DNN API (readNetFromTorch(model_file)) I have not been able to load a yolov5/torch format pt file (or the torchscript CUDA variant) in OpenCV without an exception being thrown. Have you tried this? Thanks, Rob