msracver / Deformable-ConvNets

Deformable Convolutional Networks
MIT License

Use FPN but without deformable ops #218

Open EyreEyre opened 5 years ago

EyreEyre commented 5 years ago

I'm trying to use FPN based on ResNet-101, but with deformable convolution and deformable pooling set to False; I'm not using those two modules at present.

Some settings are as below (dataset: ImageNet VID):

```
SCALES:
RPN_FEAT_STRIDE:
ANCHOR_RATIOS:
ANCHOR_SCALES:
BATCH_ROIS_OHEM: 128
RPN_NMS_THRESH: 0.7
RPN_PRE_NMS_TOP_N: 12000
RPN_POST_NMS_TOP_N: 2000
```
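For comparison, a typical multi-level FPN anchor setup uses one anchor scale per pyramid level with strides from 4 to 64 and three aspect ratios. The fragment below shows those standard values purely as an illustration; they are the defaults from the FPN paper, not the reporter's actual settings (which were omitted above):

```yaml
# Illustrative FPN defaults, NOT the values from this report
SCALES:
- 800
- 1333
RPN_FEAT_STRIDE:
- 64
- 32
- 16
- 8
- 4
ANCHOR_RATIOS:
- 0.5
- 1
- 2
ANCHOR_SCALES:
- 8
```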

How can I solve this problem? Thanks. Here is the error:

```
terminate called after throwing an instance of 'dmlc::Error'
what(): [22:58:55] src/engine/./threaded_engine.h:359: [22:58:55] /home/wfc/code/Deformable-ConvNets/external/mxnet/mshadow/mshadow/././././cuda/tensor_gpu-inl.cuh:58: too large launch parameter: AddTakeGradLargeBatch[70528,1], [32,1,1]
```
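For context on what this message means: the `tensor_gpu-inl.cuh:58` frame is mshadow's `CheckLaunchParam`, which rejects any CUDA kernel launch whose grid dimension exceeds its `kMaxGridNum` limit of 65535, and the `AddTakeGradLargeBatch` launch here requests a grid of 70528 blocks. A minimal sketch of that check (assuming the 65535 constant is unchanged in the mxnet 1.1.0 vendored copy of mshadow):

```python
# Sketch of the grid-size check that fails inside mshadow's CheckLaunchParam.
K_MAX_GRID_NUM = 65535        # mshadow's kMaxGridNum (assumption: same in mxnet 1.1.0)
K_MAX_THREADS_PER_BLOCK = 1024  # typical CUDA per-block thread limit

def check_launch_param(grid, block):
    """Return True if the launch configuration would be accepted."""
    return (all(g <= K_MAX_GRID_NUM for g in grid)
            and all(b <= K_MAX_THREADS_PER_BLOCK for b in block))

# Values from the error message: AddTakeGradLargeBatch[70528,1], [32,1,1]
print(check_launch_param(grid=(70528, 1), block=(32, 1, 1)))  # False: 70528 > 65535
```

Since the oversized grid comes from the backward pass of a `take`/embedding-style op (see `TakeOpBackward` in the trace below), its size presumably scales with the workload; reducing RPN_POST_NMS_TOP_N or the batch size may shrink the grid under the limit, though that is a guess rather than a confirmed fix.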

```
Stack trace returned 10 entries:
[bt] (0) /home/wfc/lib/anaconda2/envs/dcnn/lib/python2.7/site-packages/mxnet-1.1.0-py2.7.egg/mxnet/libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x5a) [0x7fe0f7bb7faa]
[bt] (1) /home/wfc/lib/anaconda2/envs/dcnn/lib/python2.7/site-packages/mxnet-1.1.0-py2.7.egg/mxnet/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28) [0x7fe0f7bb8b48]
[bt] (2) /home/wfc/lib/anaconda2/envs/dcnn/lib/python2.7/site-packages/mxnet-1.1.0-py2.7.egg/mxnet/libmxnet.so(mshadow::cuda::CheckLaunchParam(dim3, dim3, char const)+0x1a0) [0x7fe0fa7c00b0]
[bt] (3) /home/wfc/lib/anaconda2/envs/dcnn/lib/python2.7/site-packages/mxnet-1.1.0-py2.7.egg/mxnet/libmxnet.so(void mxnet::op::AddTakeGradLargeBatch<int, float>(mshadow::Tensor<mshadow::gpu, 2, float>, mshadow::Tensor<mshadow::gpu, 1, int> const&, mshadow::Tensor<mshadow::gpu, 1, int> const&, mshadow::Tensor<mshadow::gpu, 2, float> const&, mshadow::Tensor<mshadow::gpu, 1, char>)+0x830) [0x7fe0faba6400]
[bt] (4) /home/wfc/lib/anaconda2/envs/dcnn/lib/python2.7/site-packages/mxnet-1.1.0-py2.7.egg/mxnet/libmxnet.so(void mxnet::op::AddTakeGradLargeBatchCaller<mshadow::gpu, float, float>(mxnet::OpContext const&, mshadow::Tensor<mshadow::gpu, 2, float>, mshadow::Tensor<mshadow::gpu, 1, float> const&, mshadow::Tensor<mshadow::gpu, 2, float> const&)+0x36e) [0x7fe0fabfd61e]
[bt] (5) /home/wfc/lib/anaconda2/envs/dcnn/lib/python2.7/site-packages/mxnet-1.1.0-py2.7.egg/mxnet/libmxnet.so(void mxnet::op::TakeOpBackward(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator > const&, std::vector<mxnet::OpReqType, std::allocator > const&, std::vector<mxnet::TBlob, std::allocator > const&)+0x305df) [0x7fe0fac2de4f]
[bt] (6) /home/wfc/lib/anaconda2/envs/dcnn/lib/python2.7/site-packages/mxnet-1.1.0-py2.7.egg/mxnet/libmxnet.so(mxnet::exec::FComputeExecutor::Run(mxnet::RunContext, bool)+0x69) [0x7fe0fa6a9519]
[bt] (7) /home/wfc/lib/anaconda2/envs/dcnn/lib/python2.7/site-packages/mxnet-1.1.0-py2.7.egg/mxnet/libmxnet.so(+0x33aa1d0) [0x7fe0fa67c1d0]
[bt] (8) /home/wfc/lib/anaconda2/envs/dcnn/lib/python2.7/site-packages/mxnet-1.1.0-py2.7.egg/mxnet/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock)+0x93) [0x7fe0fa5f8073]
[bt] (9) /home/wfc/lib/anaconda2/envs/dcnn/lib/python2.7/site-packages/mxnet-1.1.0-py2.7.egg/mxnet/libmxnet.so(void mxnet::engine::ThreadedEnginePerDevice::GPUWorker<(dmlc::ConcurrentQueueType)0>(mxnet::Context, bool, mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)0>, std::shared_ptr)+0xcb) [0x7fe0fa60039b]
```

```
A fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

Stack trace returned 9 entries:
[bt] (0) /home/wfc/lib/anaconda2/envs/dcnn/lib/python2.7/site-packages/mxnet-1.1.0-py2.7.egg/mxnet/libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x5a) [0x7fe0f7bb7faa]
[bt] (1) /home/wfc/lib/anaconda2/envs/dcnn/lib/python2.7/site-packages/mxnet-1.1.0-py2.7.egg/mxnet/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28) [0x7fe0f7bb8b48]
[bt] (2) /home/wfc/lib/anaconda2/envs/dcnn/lib/python2.7/site-packages/mxnet-1.1.0-py2.7.egg/mxnet/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock)+0x332) [0x7fe0fa5f8312]
[bt] (3) /home/wfc/lib/anaconda2/envs/dcnn/lib/python2.7/site-packages/mxnet-1.1.0-py2.7.egg/mxnet/libmxnet.so(void mxnet::engine::ThreadedEnginePerDevice::GPUWorker<(dmlc::ConcurrentQueueType)0>(mxnet::Context, bool, mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)0>, std::shared_ptr)+0xcb) [0x7fe0fa60039b]
[bt] (4) /home/wfc/lib/anaconda2/envs/dcnn/lib/python2.7/site-packages/mxnet-1.1.0-py2.7.egg/mxnet/libmxnet.so(std::_Function_handler<void (std::shared_ptr), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#3}::operator()() const::{lambda(std::shared_ptr)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr&&)+0x63) [0x7fe0fa600593]
[bt] (5) /home/wfc/lib/anaconda2/envs/dcnn/lib/python2.7/site-packages/mxnet-1.1.0-py2.7.egg/mxnet/libmxnet.so(std::thread::_Impl<std::_Bind_simple<std::function<void (std::shared_ptr)> (std::shared_ptr)> >::_M_run()+0x4a) [0x7fe0fa5fa76a]
[bt] (6) /home/wfc/lib/anaconda2/envs/dcnn/bin/../lib/libstdc++.so.6(+0xafc5c) [0x7fe0a0a88c5c]
[bt] (7) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7fe10abd46ba]
[bt] (8) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fe10a1fa41d]
```

Process finished with exit code 134