yasenh / libtorch-yolov5

A LibTorch inference implementation of YOLOv5
MIT License

Question: loading yolov5/torch pt file in OpenCV DNN API on Windows? #22

Closed rtrahms closed 3 years ago

rtrahms commented 3 years ago

Using the OpenCV 4.2.0/4.4.0/4.5.0 DNN API (readNetFromTorch(model_file)), I have not been able to load a yolov5/torch-format .pt file (or the TorchScript CUDA variant) in OpenCV without an exception being thrown. Have you tried this? Thanks, Rob
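(For reference, a minimal reproduction of the failing call through the Python binding would look like the sketch below. Note that readNetFromTorch is documented as a reader for the older Torch7 serialization format, which may be why a PyTorch .pt file throws here.)

import cv2

# readNetFromTorch parses Torch7 (.t7) files, not PyTorch .pt checkpoints,
# so an exception on a yolov5 .pt file is not unexpected
net = cv2.dnn.readNetFromTorch("yolov5s.pt")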

yasenh commented 3 years ago

Hi @rtrahms, I don't have experience loading models into OpenCV, but I suspect some of the layers may not be supported by the current OpenCV release. I would recommend using LibTorch instead.

rtrahms commented 3 years ago

Hi @yasenh, thanks for the advice - yes, I am using LibTorch now. I built it in Visual Studio with help from the following article: https://expoundai.wordpress.com/2020/10/13/setting-up-a-cpp-project-in-visual-studio-2019-with-libtorch-1-6/

It builds successfully, but throws an exception on the torch::jit::load() call. I've tried both the original .pt file from training and the modified export version you suggested. The loader is not happy with the file for some reason, but both versions are readable by Netron.

Rob

yasenh commented 3 years ago

@rtrahms when you use the GPU model, did you add the --gpu flag when running the application? And did you remove the flag when using the CPU version?

rtrahms commented 3 years ago

@yasenh I don't use your code directly, but I am using it as a reference. In my code I have set torch::Device device = torch::kCPU and torch::kCUDA. The issue is on the initial load, which may indicate my TorchScript-converted file is not correct. I do need the TorchScript version for the load, yes? I assume the original .pt file will not work for either the CPU or GPU version.

I answered my own question - the TorchScript version is needed. But I am having some issues converting with that export.py script. I created my own script thinking it would simplify things, but it crashes on the torch.load statement:

import torch
import torchvision
import os

pt_file = "yolov5s.pt"
torchscript_file = "yolov5s.torchscript-cpu.pt"
# torchscript_file = "yolov5s_ob12.torchscript-cuda.pt"

is_file = os.path.isfile(pt_file)

batch_size = 1
img_size = 416

print("creating dummy img")
img = torch.rand((batch_size, 3, img_size, img_size))

print("loading network")
model = torch.load(pt_file)

print("tracing network")
traced_script_module = torch.jit.trace(model, img)

print("saving script module")
traced_script_module.save(torchscript_file)
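One note on the -cuda variant of the output file (an assumption about the intended workflow, since the script above traces on the CPU): producing a trace that runs on the GPU would mean moving both the model and the dummy input to the CUDA device before tracing, roughly:

# assumption: exporting a CUDA variant requires the model and the dummy input
# to be on the GPU before torch.jit.trace records the graph
device = torch.device('cuda')
model = model.to(device)
img = img.to(device)
traced_script_module = torch.jit.trace(model, img)
traced_script_module.save("yolov5s.torchscript-cuda.pt")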

yasenh commented 3 years ago

@rtrahms Could you share the issues you hit when using export.py? I also notice that you set img_size = 416 here - did you export the model with an image size of 416?

python models/export.py --weights yolov5s.pt --img 416 --batch 1

And I would highly recommend using the export.py from the official yolov5 Python repository.

rtrahms commented 3 years ago

@yasenh - Yes, I have used 416x416, the original Darknet YOLO image input size.

I wanted to understand the issues with export.py, so I created what I thought would be a standalone utility to do the same thing. The code is below. What I discovered is that the .pt file used as input makes references to files in the original yolov5 file structure, namely the models and utils folders. After copying those over, the code below worked for me. Besides Netron, I don't know of an easy way to edit/modify either the original .pt file or the generated TorchScript .pt file.

import torch
import torch.nn as nn
import torchvision
import os

pt_file = 'yolov5s_ob12.pt'
torchscript_file = "yolov5s_ob12.torchscript-cpu.pt"
# torchscript_file = "yolov5s_ob12.torchscript-cuda.pt"

is_file = os.path.isfile(pt_file)

# create dummy input image
batch_size = 1
img_size = 416

print("creating dummy img")
img = torch.rand((batch_size, 3, img_size, img_size))

print("loading network")
model = torch.load(pt_file, map_location=torch.device('cpu'))['model'].float()
model.eval()

# model.model[-1].export = True  # set Detect() layer export=True
model.model[-1].export = False  # modified 10/21 rgt

y = model(img)  # dry run

print("tracing network")
# trace (run model with dummy input) - store trace in torchscript module
traced_script_module = torch.jit.trace(model, img)

print("saving script module")
# serialize torchscript module
traced_script_module.save(torchscript_file)

print("Complete! Exiting.")

yasenh commented 3 years ago

@rtrahms So there are some differences between your version and export.py:
https://github.com/ultralytics/yolov5/blob/c8c5ef36c9a19c7843993ee8d51aebb685467eca/models/experimental.py#L137-L144
https://github.com/ultralytics/yolov5/blob/master/models/export.py#L43-L47

BTW, did you set "model.model[-1].export = False" when exporting? And I am still not sure why you don't use export.py directly?
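For context, the steps behind those links amount to roughly the following (a paraphrased sketch, not the verbatim source; attempt_load lives in models/experimental.py and expects the yolov5 repo on sys.path):

import torch
from models.experimental import attempt_load  # from the yolov5 repo

# attempt_load() handles the checkpoint dict and module setup that a
# bare torch.load() call leaves to the caller
model = attempt_load('yolov5s.pt', map_location=torch.device('cpu'))  # FP32 model
img = torch.zeros(1, 3, 416, 416)  # dummy input

model.model[-1].export = True  # set Detect() layer to export mode before tracing
ts = torch.jit.trace(model, img)
ts.save('yolov5s.torchscript.pt')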

rtrahms commented 3 years ago

Mainly I am using a separate script to understand what is going on. I have tried setting export to both True and False, with no change. I actually do not know what that flag does anyway.

To make this even simpler, I attempted to create a TorchScript file from a natively constructed network (code below). It generates a TorchScript file, but loading it in the C++ application also throws an exception.

import torch
import torch.nn as nn
import torchvision
import os

torchscript_file = "native_test.torchscript-cpu.pt"
# torchscript_file = "native_test.torchscript-cuda.pt"

class MyModule(torch.nn.Module):
    def __init__(self, N, M):
        super(MyModule, self).__init__()
        self.weight = torch.nn.Parameter(torch.rand(N, M))

    def forward(self, input):
        if input.sum() > 0:
            output = self.weight.mv(input)
        else:
            output = self.weight + input
        return output

my_module = MyModule(10, 20)
sm = torch.jit.script(my_module)
print(sm.code)

print("saving torchscript file")
sm.save(torchscript_file)
print("Completed. Exiting.")
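To help isolate whether the problem is the file or the C++ setup, the saved module can be loaded back in the same Python session (a minimal check; if this round-trip works but torch::jit::load still throws, the mismatch is likely on the LibTorch side):

loaded = torch.jit.load(torchscript_file)
print(loaded.code)  # should print the same scripted forward() as sm.code above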

yasenh commented 3 years ago

@rtrahms Maybe this tutorial can help: Loading a TorchScript Model in C++. Make sure you use the same versions of PyTorch and LibTorch.
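A minimal way to check the Python side of that version match:

import torch

# the LibTorch distribution linked by the C++ application should match these
print(torch.__version__)   # PyTorch version that produced the .torchscript file
print(torch.version.cuda)  # CUDA toolkit PyTorch was built against (None for CPU-only)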

rtrahms commented 3 years ago

@yasenh - update. Using the CPU variant of the exported TorchScript file and the code for your Detector class (and some adjustment of the hard-coded internal 640x640 image input), I was able to load the TorchScript network in a C++ application and successfully run inference. The CUDA version of the TorchScript network still doesn't load; I'm looking into why (I did change the tensor types like you suggested)...

rtrahms commented 3 years ago

@yasenh - another update. I found the issue with the CUDA version. The MS Visual Studio project was manually generated and did not recognize any CUDA devices. This was due to a missing linker flag: /INCLUDE:"?warp_size@cuda@at@@YAHXZ"

https://github.com/pytorch/pytorch/issues/35604

After building with this flag, the CUDA device is detected and the CUDA DNN loads successfully. The forward() call now crashes with an exception, but that's progress.

rtrahms commented 3 years ago

@yasenh - So I am wondering if the adjustments to export.py that you suggest for the ultralytics yolov5 repo are still valid. That code has changed, so your instructions might need some updates. Can you take a look? Thanks!

rtrahms commented 3 years ago

@yasenh - update. So it looks like my CUDA TorchScript DNN loads and can run inference... once. A second pass through causes an exception. I noticed my warm-up forward() call was passing, but passing the real image through afterwards did not work. Also, skipping the warm-up and passing the image on the first pass works. Is there CUDA cleanup that needs to happen after a torch::forward() call?

yasenh commented 3 years ago

@rtrahms thanks for the update. I have only tested it on my 1070 GPU - maybe you are using a GPU with less memory? I will try to figure it out, but it may take some time due to my current schedule.

rtrahms commented 3 years ago

@yasenh - Here's a clue... if I insert a call to module_.eval(); after processing the detections in the Detector::Run() method, it does work on repeated calls to Run(). Another clue: this eval() call is not needed for the CPU variant; with that one, Detector::Run() can be called repeatedly with no issue.

yasenh commented 3 years ago

module_.eval() is called in the constructor, @rtrahms: https://github.com/yasenh/libtorch-yolov5/blob/23c3e5c57addd533d96fcf1c39f0d8fdbf078803/src/detector.cpp#L21

rtrahms commented 3 years ago

@yasenh - yes, that gave me the idea to call it after the inference call. The Detector object inference call works the first time but not on subsequent calls.

rtrahms commented 3 years ago

Thanks, I found the issue. I rebuilt with LibTorch 1.7 and CUDA 11.0, and included ALL of the DLLs from the LibTorch and CUDA distributions. It worked as expected.

yasenh commented 3 years ago

Good to hear that!