pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

libtorch C++, fasterrcnn_resnet50_fpn module.forward() Assert #3349

Open dc986 opened 3 years ago

dc986 commented 3 years ago

🐛 Bug

module.forward() triggers a debug assert:

File: minkernel\crts\ucrt\src\appcrt\heap\debug_heap.cpp Line: 966

Expression: __acrt_first_block == header

To Reproduce

Loaded the scripted model with:

torch::jit::script::Module module;
try {
    module = torch::jit::load(model_path);
}
catch (const c10::Error& e) {
    std::cerr << e.what();
    return -1;
}
module.eval();

Loaded the image into a tensor with:

cv::Mat image;
cv::Mat3f image_32fc3;

image = cv::imread(image_path, cv::IMREAD_COLOR);  // 8-bit BGR
auto h = image.rows;
auto w = image.cols;
auto c = image.channels();

// Convert to float in [0, 1]; from_blob defaults to kFloat, which matches CV_32FC3
image.convertTo(image_32fc3, CV_32FC3, 1.0f / 255.0f);
at::Tensor inputTensor = torch::from_blob(image_32fc3.data, { 1, h, w, c });
inputTensor = inputTensor.permute({ 0, 3, 1, 2 });  // NHWC -> NCHW
torch::DeviceType device_type = torch::kCPU;
inputTensor = inputTensor.to(device_type);
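
One caveat worth flagging about this snippet (a side note, not something established in the thread): torch::from_blob neither copies nor takes ownership of the cv::Mat buffer, so the tensor is only valid while image_32fc3 is alive and unmodified. A minimal sketch of the safer pattern, using the same variables as above:

// clone() copies the wrapped buffer so the tensor owns its storage
// and no longer depends on the lifetime of image_32fc3.
at::Tensor inputTensor =
    torch::from_blob(image_32fc3.data, { 1, h, w, c }).clone();
inputTensor = inputTensor.permute({ 0, 3, 1, 2 }).contiguous();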

Both the model and the tensor seem to load correctly; however, the forward call

std::vector<torch::jit::IValue>  input_to_net;
input_to_net.push_back(inputTensor);
at::Tensor output = module.forward(input_to_net).toTensor();

triggers the debug assert above.

The call stack: [screenshot]

Environment

OS: Microsoft Windows 7 Professional
Language: C++
CMake version: 3.17.1
Python version: 3.7 (64-bit runtime)
Is CUDA available: N/A
numpy==1.18.5
torch==1.7.1+cpu
torchaudio==0.7.2
torchvision==nightly

Additional context

cc @vfdev-5

dc986 commented 3 years ago

Same issue with maskrcnn_resnet50_fpn.

Any ideas?

bmanga commented 3 years ago

Are you using the debug versions of libtorch and torchvision?

dc986 commented 3 years ago

I've tried both the debug and release builds, but neither works.
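
For what it's worth, this particular assert (__acrt_first_block == header in debug_heap.cpp) on Windows typically fires when memory allocated by one C runtime is freed by another, e.g. when Release DLLs are mixed into a Debug build. It may be worth double-checking that the application and every libtorch/torchvision/OpenCV DLL in the process use the same configuration and the same runtime-library setting (/MD vs /MDd).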

bmanga commented 3 years ago

Try passing in a list of image tensors (each C x H x W) instead of a single tensor that contains a batch of images:

  // t1, t2 stand in for your individual C x H x W image tensors
  auto imageList = c10::List<torch::Tensor>({t1, t2});
  std::vector<torch::jit::IValue> inputs;
  inputs.emplace_back(imageList);

  torch::jit::IValue output = module.forward(inputs);
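
A related note on the original snippet: for torchvision's scripted detection models, forward() returns a (losses, detections) tuple rather than a single tensor, so calling .toTensor() on the result fails even when the call itself succeeds. A hedged sketch of unpacking the result (the dictionary keys follow the Python detection API; treat the exact accessor chain as an assumption for this libtorch version):

auto result = module.forward(inputs);
// In eval mode the second tuple element holds the detections:
// a List[Dict[str, Tensor]] with "boxes", "labels" and "scores".
auto detections = result.toTuple()->elements()[1].toList();
auto first = detections.get(0).toGenericDict();
at::Tensor boxes  = first.at(std::string("boxes")).toTensor();
at::Tensor scores = first.at(std::string("scores")).toTensor();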

bmanga commented 3 years ago

For reference, this is what I use to convert a cv::Mat to a torch tensor:

torch::Tensor createImageTensor(const cv::Mat &image)
{
  // OpenCV decodes images as BGR; convert to the RGB layout the model expects
  cv::Mat rgbImage;
  cv::cvtColor(image, rgbImage, cv::COLOR_BGR2RGB);

  // Wrap the 8-bit HWC buffer without copying; the dtype matches the Mat's
  // element type (CV_8UC3 -> kByte)
  torch::Tensor tensorImage = torch::from_blob(
      rgbImage.data, {rgbImage.rows, rgbImage.cols, 3},
      torch::TensorOptions().dtype(torch::kByte).requires_grad(false));
  tensorImage = tensorImage.to(torch::kFloat);  // copies, so the result owns its storage
  tensorImage /= 255.0;

  // HWC -> CHW, then materialize the new layout
  tensorImage = tensorImage.transpose(0, 1).transpose(0, 2).contiguous();
  return tensorImage;
}

dc986 commented 3 years ago

Thanks for your answer. This is now my code:

torch::Tensor t1 = createImageTensor(image);
torch::Tensor t2 = createImageTensor(image);

auto imageList = c10::List<torch::Tensor>({ t1, t2 });
std::vector<torch::jit::IValue> input_to_net;
input_to_net.emplace_back(imageList);

auto output = module.forward(input_to_net);

Unfortunately it gives me an assert again.

Looking at the call stack where the exception is thrown:

torch_cpu.dll!torch::jit::Module::forward(std::vector<c10::IValue,std::allocator<c10::IValue>> inputs)

In file "libtorch-win-shared-with-deps-debug-1.7.1+cpu\libtorch\include\torch\csrc\jit\api\module.h" Line 112

 IValue forward(std::vector<IValue> inputs) {
    return get_method("forward")(std::move(inputs));
  }

input.size() = 0
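
A hedged aside: seeing the vector empty in that frame is not necessarily the bug. forward() hands the vector to get_method("forward") via std::move, so once that call has been entered, the original vector is in a moved-from and typically empty state, and a debugger will report size 0 even for a perfectly good call:

std::vector<torch::jit::IValue> inputs;
inputs.emplace_back(torch::ones({1}));
auto sink = std::move(inputs);  // inputs is left valid but unspecified,
                                // in practice empty: inputs.size() == 0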

bmanga commented 3 years ago

Can you verify that you can correctly run the tracing test?

dc986 commented 3 years ago

I've tried to run the tracing test. The output is the same as I had with the image.

I've also downloaded libtorch-nightly + vision-master and ran the same tracing test again. This time the error occurs earlier: when I try to load the model I get the following errors: [error screenshot]

bmanga commented 3 years ago

Did you modify the source code of the test? It seems like it's trying to load a file called fasterrcnn_resnet50_fpn_1602_nightly.pth. Files with the .pth extension are usually not the scripted/traced ones.

dc986 commented 3 years ago

Yes, sorry, I've tried both. The .pt extension gives the same output: [error screenshot]

bmanga commented 3 years ago

You shouldn't have to modify the source code. The .pt file is generated by the Python script in the tracing directory, so make sure you run that one first.

dc986 commented 3 years ago

I've compiled the test using CMake; it runs, the model is loaded correctly, and the forward gives no problem.

When I use the model traced with the test in Visual Studio, I am back to the original issue: the inference does not work. I've set the Include Directories, Library Directories and Linker Input. In the post-build event the torch and torchvision .dlls are copied to where the executable file is.
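
Since the CMake build of the test works while the hand-configured Visual Studio project does not, it may help to compare against a minimal CMake setup equivalent to what the tracing test does (a sketch; the project and file names are placeholders, and TorchVision::TorchVision assumes torchvision's C++ package was built and installed):

cmake_minimum_required(VERSION 3.17)
project(frcnn_app)

find_package(Torch REQUIRED)        # libtorch
find_package(TorchVision REQUIRED)  # C++ torchvision, registers custom ops

add_executable(frcnn_app main.cpp)
target_link_libraries(frcnn_app TorchVision::TorchVision ${TORCH_LIBRARIES})
set_property(TARGET frcnn_app PROPERTY CXX_STANDARD 14)

The detection models call custom torchvision operators (e.g. torchvision::nms), so the torchvision library must actually end up linked into the final binary; with MSVC the linker can drop it as unreferenced unless something forces the reference.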

bmanga commented 3 years ago

Can you share the python code you use to generate the torchscript file?

dc986 commented 3 years ago

This is my code

import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=False)
model.eval()

# scripting (not tracing) is used here, despite the variable name
traced_model = torch.jit.script(model)
traced_model.save("my_fasterrcnn_resnet50_fpn.pt")
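
As an aside, scripting appears to be the right tool here: the detection models contain data-dependent control flow that torch.jit.trace would not capture faithfully, which is presumably why torch.jit.script is used both in this snippet and for torchvision's detection models in general.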

bmanga commented 3 years ago

That looks fine. If you can't correctly run your my_fasterrcnn_resnet50_fpn.pt in the tracing test, I'm out of ideas :/.

Aquapisces commented 2 years ago

Maybe I accidentally found a solution. I came across a similar problem to yours.

Original code:

auto InputTensor = torch::from_blob(mGlobalCam_P.data, { 1, mGlobalCam_P.rows, mGlobalCam_P.cols, 3 }, torch::kFloat);
InputTensor = InputTensor.permute({ 0, 3, 1, 2 });

The fixed code:

auto InputTensor = torch::from_blob(mGlobalCam_P.data, { 1, mGlobalCam_P.rows, mGlobalCam_P.cols, 3 }, torch::kByte);
InputTensor = InputTensor.permute({ 0, 3, 1, 2 }).to(torch::kFloat);

I don't know why; it seems we'd better use kByte in from_blob. Tell me if this works for you or not.
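
A plausible explanation, not confirmed in the thread: from_blob performs no conversion, it merely reinterprets the existing buffer, so the dtype passed to it must match the cv::Mat's element type. If mGlobalCam_P is a regular 8-bit image (CV_8UC3), wrapping it as kFloat reads every four bytes as one float and overruns the buffer by a factor of four, which is exactly the kind of heap corruption a debug-CRT assert catches. A minimal sketch of the rule:

// from_blob reinterprets memory; the dtype must match the source buffer.
cv::Mat mat8u = cv::imread(image_path, cv::IMREAD_COLOR);  // CV_8UC3
auto t = torch::from_blob(mat8u.data,
                          { 1, mat8u.rows, mat8u.cols, 3 },
                          torch::kByte)          // matches the 8-bit Mat
             .permute({ 0, 3, 1, 2 })
             .to(torch::kFloat) / 255.0;         // convert afterwards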