pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

Error handling for operators, C++ library. #2779

Closed: thisisi3 closed this issue 3 years ago

thisisi3 commented 3 years ago

When I try to use roi_align, the program exits without printing any message. The error is not caught as std::exception & but is caught in catch(...). I cannot find any message about the error, and std::current_exception() does not help either. Note that I am on Windows; I do not know if that matters. Thanks.

cc @peterjc123 @nbcsm @guyang3532 @maxluk @gunandrose4u @smartcat2010 @mszhanyi

vfdev-5 commented 3 years ago

@thisisi3 could you please provide details on your environment: compiler version, pytorch version, torchvision version and a code snippet to reproduce your issue. Thanks

thisisi3 commented 3 years ago

nvcc: release 10.1, V10.1.105
cl: 19.16.27043
libtorch: cu101-debug-1.5.0
torchvision: 0.6.0

-----------------below is the code---------------------

#include <torch/torch.h>
#include <iostream>
#include <exception>
#include <stdexcept>
#include <torchvision/nms.h>
#include <torchvision/ROIAlign.h>

void test_roi_align() {
  torch::Tensor input = torch::rand({12, 256, 100, 100});
  torch::Tensor rois  = torch::rand({1, 4}) * 50;
  std::cout << "rois: " << rois << std::endl;
  std::cout << "test cpu roi_align\n";
  auto res = roi_align(input, rois, 1.0, 7, 7, -1, false);
  std::cout << "shape of result:" << res.sizes() << std::endl;
  torch::Device cuda(torch::kCUDA);
  input = input.to(cuda);
  rois = rois.to(cuda);
  std::cout << "next test gpu align\n";
  res = roi_align(input, rois, 1.0, 7, 7, -1, false);
  std::cout << res.sizes() << std::endl;
}

void handle_eptr(std::exception_ptr eptr)
{
  std::cout << "in eptr handler\n";
  try{
    if (eptr){
      std::rethrow_exception(eptr);
    } else {
      std::cout << "null ptr\n";
    }
  } catch(const std::exception &e){
    std::cout << "as std::exception" << std::endl;
    std::cout << "Caught " << e.what() << "\n";
  }
}

int main() {
  std::exception_ptr eptr;
  try {
    test_roi_align();
  } catch (std::exception &e){
    std::cout << "std::exception" << std::endl;
    std::cout << e.what() << std::endl;
  } catch (...) {
    std::cout << "unknown error\n";
    eptr = std::current_exception();
    std::cout << "caught current exception\n";
  }
  handle_eptr(eptr);
}  

Note that it does not always produce an error. The error occurs when the randomly generated roi is ill-formed, e.g. x_min > x_max.
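
To rule out ill-formed boxes while reproducing, the corners can be ordered before the roi is built. The sketch below is plain C++ with no libtorch dependency. `make_roi` is a hypothetical helper, and the (batch_index, x1, y1, x2, y2) row layout is an assumption based on the Nx5 format mentioned in the reply further down.

```cpp
#include <algorithm>
#include <array>

// Hypothetical helper: builds one roi row in the assumed
// (batch_index, x1, y1, x2, y2) layout, ordering the corners so that
// x1 <= x2 and y1 <= y2 regardless of the order they were sampled in.
std::array<float, 5> make_roi(int batch_index,
                              float xa, float ya, float xb, float yb) {
  return {static_cast<float>(batch_index),
          std::min(xa, xb), std::min(ya, yb),
          std::max(xa, xb), std::max(ya, yb)};
}
```

For example, `make_roi(0, 40.f, 10.f, 5.f, 30.f)` yields a row whose box spans x from 5 to 40 and y from 10 to 30.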

fmassa commented 3 years ago

Hi,

The issue is that the C++ function expects rois to be an Nx5 tensor. We need to add a better shape check on the C++ side, as we currently only have it in Python; see https://github.com/pytorch/vision/pull/1968 for more details.

@vfdev-5 will be working on adding checks on the C++ / CUDA side as well, so that these types of errors are easily caught.
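
As an illustration of the kind of shape check described here, a minimal sketch follows. It is not the check torchvision ended up with: `check_rois_shape` is a hypothetical helper, and it takes the shape as a plain vector so that it compiles without libtorch. With libtorch the shape would presumably come from `rois.sizes()`, and the condition could be expressed with the `TORCH_CHECK` macro instead of a manual throw.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not torchvision's API): verifies that a rois shape
// is [K, 5] and throws a std::exception-derived error with a readable
// message otherwise, so that callers can catch it and print what().
void check_rois_shape(const std::vector<std::int64_t>& sizes) {
  if (sizes.size() == 2 && sizes[1] == 5) {
    return;  // shape is [K, 5]
  }
  // Format the offending shape, e.g. "[1, 4]".
  std::string got = "[";
  for (std::size_t i = 0; i < sizes.size(); ++i) {
    if (i > 0) {
      got += ", ";
    }
    got += std::to_string(sizes[i]);
  }
  got += "]";
  throw std::invalid_argument("rois must have shape [K, 5], got " + got);
}
```

Called with the shape from the snippet above, `{1, 4}`, this throws an error whose message names the expected and actual shapes, which an ordinary `catch (const std::exception&)` block can report.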

thisisi3 commented 3 years ago

Thanks for clearing it up. I guess the C++ implementation is more raw and needs more attention to potentially uncatchable errors.

fmassa commented 3 years ago

@thisisi3 that's correct, our C++ implementations are a bit more raw, and we have traditionally focused mostly on the Python API. But with time we will be improving the C++ API as well.

peterjc123 commented 3 years ago

@fmassa @vfdev-5 Is this issue fixed?