zylo117 / Yet-Another-EfficientDet-Pytorch

The PyTorch re-implementation of the official EfficientDet with SOTA performance in real time and pretrained weights.
GNU Lesser General Public License v3.0

Problem with tracing the model #427

Open victor-yudin opened 4 years ago

victor-yudin commented 4 years ago

I'm trying to trace the model so I can use it later in C++ code via torch::jit::load(), but I'm getting an error. Before running the tracing, I modified backbone.py and efficientdet/model.py as in #111.

Here is my code:

import torch
from backbone import EfficientDetBackbone

device = torch.device('cuda')
model = EfficientDetBackbone(onnx_export=True)

model = model.to(device)
model.eval()

sample = torch.rand(1, 3, 512, 512).to(device)
traced_model = torch.jit.trace(model, sample)

The error:

RuntimeError: output 1 ((1,.,.) = 
   -12.0000   -12.0000    20.0000    20.0000
    -7.2000   -18.4000    15.2000    26.4000
   -18.4000    -7.2000    26.4000    15.2000
   -16.1587   -16.1587    24.1587    24.1587
   -10.1111   -24.2222    18.1111    32.2222
   -24.2222   -10.1111    32.2222    18.1111
   ... 
   163.5377  -120.9245   732.4623  1016.9246
   -120.9245   163.5377  1016.9246   732.4623
[ CUDAFloatType{1,49104,4} ]) of traced region did not have observable data dependence with trace inputs; this probably indicates your program cannot be understood by the tracer.

Common pretrained models, e.g. resnet50, trace without errors. What's wrong?

sainttelant commented 4 years ago

any idea about this issue?

victor-yudin commented 4 years ago

any idea about this issue?

Tracing doesn't work with the latest master commits. Here is the part of the code from a working commit: https://drive.google.com/file/d/173Quf-Arhg1BBBxtu8KYsw3PJY1GPU0n/view?usp=sharing. I don't know its hash, but you can try to find it in the repo if needed.

The code to run tracing:

import torch
from mymodels import EfficientDet as EfficientDetBackbone

model = EfficientDetBackbone(num_classes=9, compound_coef=4)
model.load_state_dict(torch.load("logs/efficientdet-d4_18_36000.pth"), strict=False)

model.backbone_net.model.set_swish(memory_efficient=False) # False

device = torch.device('cuda')
model = model.to(device)
model.eval()

sample = torch.rand(1, 3, 1024, 1024).to(device)
traced_pre_model = torch.jit.trace(model, sample)

traced_pre_model.eval()
traced_pre_model.save('logs/efficientdet-d4_18_36000_new.pth')

It works with warnings. I don't know whether it will work correctly at inference time, but the model is saved.

sainttelant commented 4 years ago

@victor-yudin
thanks for your suggestion. I tried to trace the model with ./weights/efficientdet-d0.pth, the original d0 model. The code is as follows:

def trace_module():
    compound_coef = 0
    num_classes = 90
    ratios = [(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)]
    scales = [1, 1.2599210498948732, 1.5874010519681994]
    model = EfficientDetBackbone(num_classes=num_classes, compound_coef=compound_coef,
                                 ratios=ratios, scales=scales)
    model.load_state_dict(torch.load('./weights/efficientdet-d0.pth'), strict=False)

    model.backbone_net.model.set_swish(memory_efficient=False)
    device = torch.device('cuda')
    model = model.to(device)
    model.eval()

    sample = torch.rand(1, 3, 1024, 1024).to(device)
    traced_pre_model = torch.jit.trace(model, sample)

    traced_pre_model.eval()
    traced_pre_model.save("./weights/ModuleForCpp.pt")
    print("trace successful")

trace_module()

However, it still couldn't work. The error is:

   780.8000   601.6000  1139.2000  1318.4000
   601.6000   780.8000  1318.4000  1139.2000
   637.4602   637.4602  1282.5398  1282.5398
   734.2222   508.4443  1185.7778  1411.5557
   508.4443   734.2222  1411.5557  1185.7778
   553.6253   553.6253  1366.3746  1366.3746
   675.5377   391.0755  1244.4623  1528.9246
   391.0755   675.5377  1528.9246  1244.4623
[ CUDAFloatType{1,196416,4} ]) of traced region did not have observable data dependence with trace inputs; this probably indicates your program cannot be understood by the tracer.

victor-yudin commented 4 years ago

Did you import EfficientDet from my Google Drive link (from mymodels)? Also, you need to set the input size to 512 if you're using the efficientdet-d0 model: sample = torch.rand(1, 3, 512, 512).to(device).
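
For other compound coefficients, a minimal sketch for building the tracing sample (the per-coefficient input sizes below are assumed from this repo's inference script):

input_sizes = [512, 640, 768, 896, 1024, 1280, 1280, 1536]  # assumed: index = compound_coef
compound_coef = 0  # d0
size = input_sizes[compound_coef]
sample = torch.rand(1, 3, size, size).to(device)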

sainttelant commented 4 years ago

@victor-yudin, thanks for your help, I've successfully converted the model to a .pt file.
I tried to load the .pt model in a C++ project compiled with Visual Studio 2017 in x64 debug mode.

The libtorch library is the 1.6.0 stable version installed in VS, while the traced .pt model was exported with PyTorch 1.4.0; as far as I understand, inference should still work, since libtorch is supposed to be backward compatible. The program was built successfully, but inference raised an error after the .pt file was loaded successfully.

sainttelant commented 4 years ago

The relevant C++ code is:

void Classfier(cv::Mat &image) {
    torch::Tensor img_tensor = torch::from_blob(image.data, { 1, image.rows, image.cols, 3 }, torch::kByte);
    img_tensor = img_tensor.permute({ 0, 3, 1, 2 });
    img_tensor = img_tensor.toType(torch::kFloat);
    img_tensor = img_tensor.div(255);
    //std::shared_ptr<torch::jit::script::Module> module = torch::jit::load("../Train/resnet.pt");
    torch::jit::script::Module module = torch::jit::load("ModuleForCppScale.pt");

    //torch::Tensor output = module->forward({ img_tensor }).toTensor();
    torch::Tensor output = module.forward({ img_tensor }).toTensor();
    auto max_result = output.max(1, true);
    auto max_index = std::get<1>(max_result).item<float>();
    std::cout << max_index << std::endl;
}

int main() {
    // TorchTest();
    cv::Mat image = cv::imread("dog.jpg");
    /*cv::imshow("tupian", image);
    cv::waitKey(100);*/
    cv::resize(image, image, cv::Size(224, 224));
    std::cout << image.rows << " " << image.cols << " " << image.channels() << std::endl;
    Classfier(image);
    return 0;
}

sainttelant commented 4 years ago

When the program reached the line

torch::Tensor output = module.forward({ img_tensor }).toTensor();

it crashed after a very long time (I'd estimate around 10 minutes), and loading the .pt file also took a long time. I have no idea about this...

victor-yudin commented 4 years ago

  1. efficientdet-d0 is an object detection model, not a classification model.

  2. First you need to preprocess the img:

    cv::resize(img, img, cv::Size(512, 512));
    cv::cvtColor(img, img, cv::COLOR_BGR2RGB);
    img.convertTo(img, CV_32FC3, 1.0 / 255, 0);

    also normalize the img: subtract the mean and then divide by the std

  3. Switch the model to eval mode after loading and disable gradient calculation for inference (this is for your long-runtime issue):

    torch::NoGradGuard no_grad;
    torch::jit::script::Module model = torch::jit::load(modelPath);
    model.eval();
  4. Optionally, move the model to the GPU:

    model.to(device);
  5. Order of input dims:

    torch::Tensor tensor_img = torch::from_blob(img.data, {1, 3, img.rows, img.cols});
    tensor_img = tensor_img.to(device);  // optionally, as for model
  6. forward:

    std::vector<torch::jit::IValue> inputs;
    inputs.push_back(tensor_img);
    auto outputs = model.forward(inputs).toTuple();
  7. As the model returns 4 tensors, you need to parse each of them:

    torch::Tensor regression = outputs->elements()[1].toTensor().cpu();
    torch::Tensor classification = outputs->elements()[2].toTensor().cpu();
    torch::Tensor anchors = outputs->elements()[3].toTensor().cpu();
  8. Then do the postprocessing like in this repo's inference code, but in C++ (a Python sketch of that step follows below).

I've just written almost all the code for you; now it's your turn :)

Now I have a problem with wrong outputs from model.forward(): the regression boxes are too large (on the order of 1e13), and the scores are floats but all equal to 0 or 1 instead of values like 0.777, 0.912, ... If you overcome this, please let me know how.
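
For reference, here is a minimal Python sketch of that postprocessing step (step 8 above), assuming the helpers BBoxTransform, ClipBoxes and postprocess from this repo's efficientdet/utils.py and utils/utils.py; treat it as a guide for the C++ port rather than a drop-in solution:

import torch
from backbone import EfficientDetBackbone
from efficientdet.utils import BBoxTransform, ClipBoxes
from utils.utils import postprocess

device = torch.device('cuda')
model = EfficientDetBackbone(num_classes=90, compound_coef=0).to(device).eval()
# model.load_state_dict(torch.load('./weights/efficientdet-d0.pth'), strict=False)

regressBoxes = BBoxTransform()
clipBoxes = ClipBoxes()

x = torch.rand(1, 3, 512, 512).to(device)  # stands in for a preprocessed, normalized image
with torch.no_grad():
    features, regression, classification, anchors = model(x)
    out = postprocess(x, anchors, regression, classification,
                      regressBoxes, clipBoxes,
                      threshold=0.2, iou_threshold=0.2)
# each element of out is a dict with 'rois', 'class_ids' and 'scores'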

sainttelant commented 4 years ago

Okay, I will try it later. You are awesome. I will let you know once I've worked on it.

sainttelant commented 4 years ago

#include <torch/torch.h>
#include <torch/script.h>
#include <iostream>
#include <memory>
#include <opencv2/highgui.hpp>
#include <opencv2/core/core.hpp>
#include <opencv2/opencv.hpp>

using namespace std;

bool imagePreprocess(cv::Mat &m_origin, cv::Mat &m_outimage, int m_ScaleSize) {
    // pic preprocess
    if (m_origin.empty()) {
        return false;
    }
    cv::Mat tmp_img;
    cv::cvtColor(m_origin, tmp_img, cv::COLOR_BGR2RGB);
    cv::resize(tmp_img, tmp_img, cv::Size(m_ScaleSize, m_ScaleSize));
    tmp_img.convertTo(m_outimage, CV_32F, 1.0 / 255);
    return true;
}

void permuteEfficient(torch::Tensor &m_tensors, cv::Mat &img_float, int input_image_size, torch::Device &device) {
    m_tensors = torch::from_blob(img_float.data, { 1, input_image_size, input_image_size, 3 }).to(device);
    m_tensors = m_tensors.permute({ 0, 3, 1, 2 });
}

int main() {
    // TorchTest();
    cv::Mat image = cv::imread("dog.jpg");
    cv::Mat outMat;
    imagePreprocess(image, outMat, 512);

    torch::Tensor m_tensors;
    torch::Device m_device(torch::kCPU);
    permuteEfficient(m_tensors, outMat, 512, m_device);

    // normalization
    m_tensors[0][0] = m_tensors[0][0].sub(0.406).div_(0.225);
    m_tensors[0][1] = m_tensors[0][1].sub(0.456).div_(0.224);
    m_tensors[0][2] = m_tensors[0][2].sub(0.485).div_(0.229);
    m_tensors = m_tensors.to(at::kCPU);

    // load model
    torch::NoGradGuard no_grad;
    torch::jit::script::Module module;
    try {
        // Deserialize the ScriptModule from a file using torch::jit::load().
        module = torch::jit::load("ScaleModuleForCpp.pt");
        std::cerr << "load model success!\n";
    }
    catch (const c10::Error &e) {
        std::cerr << "error loading the model\n";
        return -1;
    }
    //module.eval();
    module.to(at::kCPU);

    // forward
    std::vector<torch::jit::IValue> m_inputs;
    m_inputs.push_back(m_tensors);  // the input tensor has to be pushed before calling forward()

    auto outputs = module.forward(m_inputs).toTuple();
    torch::Tensor regression = outputs->elements()[1].toTensor().cpu();
    torch::Tensor classification = outputs->elements()[2].toTensor().cpu();
    torch::Tensor anchors = outputs->elements()[3].toTensor().cpu();

    /* torch::Tensor outputs = module.forward({ m_tensors }).toTensor();

    auto results = outputs.sort(-1, true);
    auto softmaxs = std::get<0>(results)[0].softmax(0);
    auto indexs = std::get<1>(results)[0]; */

    // postprocess

    system("pause");
    return 0;
}

sainttelant commented 4 years ago

@victor-yudin when I executed the program, I got an error at the model.forward() line as well.

I've tried other models (yolov3, squeezenet) that I traced from other programs, and they ran very well, so I assume the errors probably come from somewhere in the tracing of this model to the .pt file.

sainttelant commented 4 years ago

Have you got it working yet? Please let me know, I would appreciate it very much.

laplaceson commented 3 years ago

Hi man, did you solve the all-zeros-and-ones scores issue described above? I'm also facing this problem now... Any suggestions would be helpful, thanks.

zylo117 commented 3 years ago

Actually, torch C++ is not production-ready, in either speed or stability; it's more like a toy to me.

opeide commented 1 year ago

For anyone having similar issues: I had trouble tracing until I added a torch.jit.is_tracing() check in Anchors' forward so that last_anchors is not used during tracing.
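
A minimal sketch of that pattern; the class below is a simplified stand-in for this repo's Anchors module (which caches its result in last_anchors), not the actual implementation:

import torch

class CachedAnchors(torch.nn.Module):
    # Simplified stand-in for the Anchors module: it caches its output and
    # returns the cached tensor on later calls with the same input shape.
    def __init__(self):
        super().__init__()
        self.last_shape = None
        self.last_anchors = {}

    def forward(self, image):
        # Skip the cache while tracing; otherwise the traced output is a tensor
        # created outside the trace and torch.jit.trace raises the
        # "did not have observable data dependence with trace inputs" error.
        if not torch.jit.is_tracing():
            if image.shape[2:] == self.last_shape and image.device in self.last_anchors:
                return self.last_anchors[image.device]

        # placeholder for the real anchor computation (depends only on the input shape)
        anchors = torch.zeros(1, image.shape[2] * image.shape[3], 4,
                              dtype=image.dtype, device=image.device)

        self.last_shape = image.shape[2:]
        self.last_anchors[image.device] = anchors
        return anchors

m = CachedAnchors().eval()
x = torch.rand(1, 3, 64, 64)
m(x)                            # a normal call populates the cache
traced = torch.jit.trace(m, x)  # traces cleanly thanks to the is_tracing() guard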