tusen-ai / simpledet

A Simple and Versatile Framework for Object Detection and Instance Recognition
Apache License 2.0

[BUG] Segmentation Fault in proposal.cc #248

Open rohun-tripathi opened 5 years ago

rohun-tripathi commented 5 years ago

The code crashes with a segmentation fault when proposal.cc is used to run NMS. This happens when running detection_test.py on the CPU.

When running on a GPU, the code completes successfully and matches the accuracy reported for the pretrained models.

Stack trace:

[bt] (0) /home/ubuntu/work/mxnet/python/mxnet/../../lib/libmxnet.so(+0x3d9f5c9) [0x7fa37465f5c9]
[bt] (1) /lib/x86_64-linux-gnu/libc.so.6(+0x354b0) [0x7fa3c578a4b0]
[bt] (2) /home/ubuntu/work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::op::ProposalOp::Forward(mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator > const&, std::vector<mxnet::OpReqType, std::allocator > const&, std::vector<mxnet::TBlob, std::allocator > const&, std::vector<mxnet::TBlob, std::allocator > const&)+0xf9f) [0x7fa3739b17ef]
[bt] (3) /home/ubuntu/work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::op::OperatorState::Forward(mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator > const&, std::vector<mxnet::OpReqType, std::allocator > const&, std::vector<mxnet::TBlob, std::allocator > const&)+0xac1) [0x7fa373e4a6e1]
[bt] (4) /home/ubuntu/work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::exec::StatefulComputeExecutor::Run(mxnet::RunContext, bool)+0x76) [0x7fa3745aae76]
[bt] (5) /home/ubuntu/work/mxnet/python/mxnet/../../lib/libmxnet.so(+0x3cb3def) [0x7fa374573def]
[bt] (6) /home/ubuntu/work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock)+0x995) [0x7fa3744aa545]
[bt] (7) /home/ubuntu/work/mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler<void (std::shared_ptr), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock, bool)::{lambda()#1}::operator()() const::{lambda(std::shared_ptr)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr&&)+0x118) [0x7fa3744c2de8]
[bt] (8) /home/ubuntu/work/mxnet/python/mxnet/../../lib/libmxnet.so(std::thread::_Impl<std::_Bind_simple<std::function<void (std::shared_ptr)> (std::shared_ptr)> >::_M_run()+0x4a) [0x7fa3744a89ca]

Configuration: faster_r50v1c4_c5_512roi_1x.py

Dataset: COCO val2017

Hardware info: V100 GPU on AWS

Software info: Ubuntu 16.04; CUDA 10.0; cuDNN matching the chosen CUDA version; NVIDIA driver 410.429

Setup: followed the from-scratch instructions on the install page: https://github.com/TuSimple/simpledet/blob/master/doc/INSTALL.md


rohun-tripathi commented 5 years ago

The fault lies in this line: https://github.com/TuSimple/simpledet/blob/master/operator_cxx/contrib/proposal.cc#L373

I think the correct index to look up is i + num_anchors, following the convention used in the TuSimple and MXNet code. MXNet's multi_proposal.cc makes a similar lookup here: https://github.com/apache/incubator-mxnet/blob/master/src/operator/contrib/multi_proposal.cc#L372
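To make the i + num_anchors convention concrete, here is a minimal sketch of the score-copy step that precedes NMS. It assumes a single image whose cls_prob tensor is contiguous NCHW with shape (1, 2 * num_anchors, H, W), where channels [0, num_anchors) hold background probabilities and channels [num_anchors, 2 * num_anchors) hold foreground probabilities. The function name copy_foreground_scores is hypothetical; the loop order mirrors MXNet's contrib proposal operators, but this is not the simpledet source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch (not simpledet code). Assumes cls_prob is a contiguous
// NCHW buffer of shape (1, 2 * num_anchors, H, W): background scores in
// channels [0, num_anchors), foreground scores in [num_anchors, 2 * num_anchors).
// Proposal k = h * (W * num_anchors) + w * num_anchors + i gets the
// foreground probability of anchor i at position (h, w).
std::vector<float> copy_foreground_scores(const std::vector<float>& cls_prob,
                                          std::size_t num_anchors,
                                          std::size_t H, std::size_t W) {
  std::vector<float> scores(H * W * num_anchors);
  for (std::size_t h = 0; h < H; ++h) {
    for (std::size_t w = 0; w < W; ++w) {
      for (std::size_t i = 0; i < num_anchors; ++i) {
        const std::size_t k = h * (W * num_anchors) + w * num_anchors + i;
        const std::size_t c = i + num_anchors;  // foreground channel of anchor i
        // at() throws std::out_of_range on a bad index instead of reading
        // past the buffer, which makes indexing mistakes easier to spot.
        scores[k] = cls_prob.at((c * H + h) * W + w);
      }
    }
  }
  return scores;
}
```

Reading channel i instead would pick up the background probability, so NMS would rank proposals by the wrong score even if no crash occurred.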

After changing the lookup to i + num_anchors, proposal.cc and proposal.cu now produce exactly the same output for a given input proposal set.
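One way to check the CPU/GPU agreement described above is to copy both operator outputs into host buffers and compare them element-wise. The sketch below is a hypothetical helper (outputs_match is not part of simpledet or MXNet); it assumes the two outputs have already been flattened into std::vector<float>.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: true when the CPU (proposal.cc) and GPU (proposal.cu)
// outputs have the same length and agree element-wise within atol.
// atol = 0 demands an exact match; NaN never counts as a match.
bool outputs_match(const std::vector<float>& cpu_out,
                   const std::vector<float>& gpu_out, float atol = 0.0f) {
  if (cpu_out.size() != gpu_out.size()) return false;
  for (std::size_t k = 0; k < cpu_out.size(); ++k) {
    // Written as !(diff <= atol) so that a NaN difference fails the check.
    if (!(std::fabs(cpu_out[k] - gpu_out[k]) <= atol)) return false;
  }
  return true;
}
```

An exact match (atol = 0) is the stricter test; a small tolerance may be needed if the two backends accumulate floating-point values in a different order.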