There is an error in rotate_roi_align.cu

realwill commented 6 years ago

In the forward function, an illegal memory access was encountered.

realwill commented 6 years ago

rotate_roi_pooling.cu has the same problem.

mjq11302010044 commented 6 years ago

@realwill Can you provide more detailed information?

realwill commented 6 years ago

@mjq11302010044 F0102 10:11:56.269446 7269 syncedmem.hpp:31] Check failed: error == cudaSuccess (77 vs. 0) an illegal memory access was encountered

mjq11302010044 commented 6 years ago

@realwill Sorry to say that the implementation of rotated_roi_align.cu is incorrect. Please don't use that code.

About the CUDA error: are Caffe and the NMS function both using the right GPU?

realwill commented 6 years ago

@mjq11302010044 Yes, the same Python and the same GPU; CUDA without cuDNN.

mjq11302010044 commented 6 years ago

@realwill cuDNN 5 is recommended; otherwise GPU memory will be buggy on CUDA 8.

YanShuang17 commented 6 years ago

When compiling Caffe, I encountered the errors below:

"C:\caffe\build\ALL_BUILD.vcxproj" (default target) (1) ->
"C:\caffe\build\src\caffe\caffe.vcxproj" (default target) (3) ->
(CustomBuild target) ->
C:/caffe/src/caffe/layers/rotate_roi_align_layer.cu(147) (col. 11) : error : calling a __host__ function("fmax<float, double> ") from a __global__ function("caffe::RotateROIAlignForward ") is not allowed [C:\caffe\build\src\caffe\caffe.vcxproj]
C:/caffe/src/caffe/layers/rotate_roi_align_layer.cu(148) (col. 11) : error : calling a __host__ function("fmin<float, double> ") from a __global__ function("caffe::RotateROIAlignForward ") is not allowed [C:\caffe\build\src\caffe\caffe.vcxproj]
C:/caffe/src/caffe/layers/rotate_roi_align_layer.cu(149) (col. 11) : error : calling a __host__ function("fmax<float, double> ") from a __global__ function("caffe::RotateROIAlignForward ") is not allowed [C:\caffe\build\src\caffe\caffe.vcxproj]
C:/caffe/src/caffe/layers/rotate_roi_align_layer.cu(150) (col. 11) : error : calling a __host__ function("fmin<float, double> ") from a __global__ function("caffe::RotateROIAlignForward ") is not allowed [C:\caffe\build\src\caffe\caffe.vcxproj]

mjq11302010044 commented 6 years ago

@YanShuang17 Just remove rotate_roi_align_layer.cu

YanShuang17 commented 6 years ago

@mjq11302010044 Thanks for the reply! I followed your tip and removed rotate_roi_align_layer.cu, then compiled Caffe again. Another error occurs in rotate_roi_pooling_layer.cpp:

"C:\caffe\build\ALL_BUILD.vcxproj" (default target) (1) ->
"C:\caffe\build\src\caffe\caffe.vcxproj" (default target) (3) ->
(ClCompile target) ->
C:\caffe\src\caffe\layers\rotate_roi_pooling_layer.cpp(110): error C2672: 'max': no matching overloaded function found [C:\caffe\build\src\caffe\caffe.vcxproj]
C:\caffe\src\caffe\layers\rotate_roi_pooling_layer.cpp(110): error C2780: 'const _Ty &std::max(const _Ty &,const _Ty &,_Pr) noexcept()': expects 3 arguments - 2 provided [C:\caffe\build\src\caffe\caffe.vcxproj]
C:\caffe\src\caffe\layers\rotate_roi_pooling_layer.cpp(110): error C2784: '_Ty std::max(std::initializer_list<_Elem>,_Pr)': could not deduce template argument for 'std::initializer_list<_Elem>' from 'float' [C:\caffe\build\src\caffe\caffe.vcxproj]
C:\caffe\src\caffe\layers\rotate_roi_pooling_layer.cpp(110): error C2782: 'const _Ty &std::max(const _Ty &,const _Ty &) noexcept()': template parameter '_Ty' is ambiguous [C:\caffe\build\src\caffe\caffe.vcxproj]
C:\caffe\src\caffe\layers\rotate_roi_pooling_layer.cpp(110): error C2784: 'const _Ty &std::max(const _Ty &,const _Ty &) noexcept()': could not deduce template argument for 'const _Ty &' from 'double' [C:\caffe\build\src\caffe\caffe.vcxproj]
C:\caffe\src\caffe\layers\rotate_roi_pooling_layer.cpp(110): error C2780: '_Ty std::max(std::initializer_list<_Elem>)': expects 1 arguments - 2 provided [C:\caffe\build\src\caffe\caffe.vcxproj]
C:\caffe\src\caffe\layers\rotate_roi_pooling_layer.cpp(111): error C2672: 'min': no matching overloaded function found [C:\caffe\build\src\caffe\caffe.vcxproj]
C:\caffe\src\caffe\layers\rotate_roi_pooling_layer.cpp(111): error C2780: 'const _Ty &std::min(const _Ty &,const _Ty &,_Pr) noexcept()': expects 3 arguments - 2 provided [C:\caffe\build\src\caffe\caffe.vcxproj]
C:\caffe\src\caffe\layers\rotate_roi_pooling_layer.cpp(111): error C2784: '_Ty std::min(std::initializer_list<_Elem>,_Pr)': could not deduce template argument for 'std::initializer_list<_Elem>' from 'float' [C:\caffe\build\src\caffe\caffe.vcxproj]
C:\caffe\src\caffe\layers\rotate_roi_pooling_layer.cpp(111): error C2782: 'const _Ty &std::min(const _Ty &,const _Ty &) noexcept()': template parameter '_Ty' is ambiguous [C:\caffe\build\src\caffe\caffe.vcxproj]
C:\caffe\src\caffe\layers\rotate_roi_pooling_layer.cpp(111): error C2784: 'const _Ty &std::min(const _Ty &,const _Ty &) noexcept()': could not deduce template argument for 'const _Ty &' from 'double' [C:\caffe\build\src\caffe\caffe.vcxproj]
C:\caffe\src\caffe\layers\rotate_roi_pooling_layer.cpp(111): error C2780: '_Ty std::min(std::initializer_list<_Elem>)': expects 1 arguments - 2 provided [C:\caffe\build\src\caffe\caffe.vcxproj]
C:\caffe\src\caffe\layers\rotate_roi_pooling_layer.cpp(112): error C2672: 'max': no matching overloaded function found [C:\caffe\build\src\caffe\caffe.vcxproj]
C:\caffe\src\caffe\layers\rotate_roi_pooling_layer.cpp(112): error C2780: 'const _Ty &std::max(const _Ty &,const _Ty &,_Pr) noexcept()': expects 3 arguments - 2 provided [C:\caffe\build\src\caffe\caffe.vcxproj]
C:\caffe\src\caffe\layers\rotate_roi_pooling_layer.cpp(112): error C2784: '_Ty std::max(std::initializer_list<_Elem>,_Pr)': could not deduce template argument for 'std::initializer_list<_Elem>' from 'float' [C:\caffe\build\src\caffe\caffe.vcxproj]
C:\caffe\src\caffe\layers\rotate_roi_pooling_layer.cpp(112): error C2782: 'const _Ty &std::max(const _Ty &,const _Ty &) noexcept()': template parameter '_Ty' is ambiguous [C:\caffe\build\src\caffe\caffe.vcxproj]
C:\caffe\src\caffe\layers\rotate_roi_pooling_layer.cpp(112): error C2784: 'const _Ty &std::max(const _Ty &,const _Ty &) noexcept()': could not deduce template argument for 'const _Ty &' from 'double' [C:\caffe\build\src\caffe\caffe.vcxproj]
C:\caffe\src\caffe\layers\rotate_roi_pooling_layer.cpp(112): error C2780: '_Ty std::max(std::initializer_list<_Elem>)': expects 1 arguments - 2 provided [C:\caffe\build\src\caffe\caffe.vcxproj]
C:\caffe\src\caffe\layers\rotate_roi_pooling_layer.cpp(113): error C2672: 'min': no matching overloaded function found [C:\caffe\build\src\caffe\caffe.vcxproj]
C:\caffe\src\caffe\layers\rotate_roi_pooling_layer.cpp(113): error C2780: 'const _Ty &std::min(const _Ty &,const _Ty &,_Pr) noexcept()': expects 3 arguments - 2 provided [C:\caffe\build\src\caffe\caffe.vcxproj]
C:\caffe\src\caffe\layers\rotate_roi_pooling_layer.cpp(113): error C2784: '_Ty std::min(std::initializer_list<_Elem>,_Pr)': could not deduce template argument for 'std::initializer_list<_Elem>' from 'float' [C:\caffe\build\src\caffe\caffe.vcxproj]
C:\caffe\src\caffe\layers\rotate_roi_pooling_layer.cpp(113): error C2782: 'const _Ty &std::min(const _Ty &,const _Ty &) noexcept()': template parameter '_Ty' is ambiguous [C:\caffe\build\src\caffe\caffe.vcxproj]
C:\caffe\src\caffe\layers\rotate_roi_pooling_layer.cpp(113): error C2784: 'const _Ty &std::min(const _Ty &,const _Ty &) noexcept()': could not deduce template argument for 'const _Ty &' from 'double' [C:\caffe\build\src\caffe\caffe.vcxproj]
C:\caffe\src\caffe\layers\rotate_roi_pooling_layer.cpp(113): error C2780: '_Ty std::min(std::initializer_list<_Elem>)': expects 1 arguments - 2 provided [C:\caffe\build\src\caffe\caffe.vcxproj]
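
A note on the C2672/C2782 errors above: this is what MSVC reports when std::max or std::min is called with one float and one double argument, because the single template parameter _Ty cannot be deduced as both types. Lines 110 to 113 of rotate_roi_pooling_layer.cpp are not shown in this thread, so the sketch below only assumes that pattern; `clamp_to_range` and its parameters are hypothetical names, and `Dtype` stands in for Caffe's float/double template parameter. Casting the literal to the value's type, or naming the template argument explicitly, removes the ambiguity.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not from the RRPN source): clamp a coordinate into
// the range [0, upper].
template <typename Dtype>
Dtype clamp_to_range(Dtype value, Dtype upper) {
  assert(upper >= static_cast<Dtype>(0));
  // std::max(value, 0.0) would not compile for Dtype = float: the arguments
  // are (float, double), so _Ty is ambiguous. Casting the literal makes both
  // arguments the same type.
  Dtype lower_bounded = std::max(value, static_cast<Dtype>(0));
  // Alternative fix: name the template argument explicitly.
  return std::min<Dtype>(lower_bounded, upper);
}
```

The fmax<float, double> and fmin<float, double> errors reported earlier for rotate_roi_align_layer.cu may come from a similar mixed float/double call, though that is not confirmed in this thread.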

YanShuang17 commented 6 years ago

The rest of the software environment should be fine: CUDA 8.0, cuDNN 5.1, Anaconda2 (Python 2.7.13)... Before this, I had just succeeded in compiling Faster R-CNN's Caffe.

YanShuang17 commented 6 years ago

Should I also remove the code related to rotate_roi_align_layer.cu in caffe.proto and fast_rcnn_layers.hpp?

YanShuang17 commented 6 years ago

What I compiled is the Caffe source cloned from https://github.com/BVLC/caffe/tree/windows. Yes, my OS is Win10. I copied rotate_roi_pooling_layer.cpp/.cu ... (8 files) and fast_rcnn_layers.hpp into the folders of the Caffe I cloned, then ran the build_win.cmd script.

mjq11302010044 commented 6 years ago

@YanShuang17 Oh, yes. Sorry, I should have reminded you to remove all the dependencies on roi_align, but it's weird that it still shows errors... (I haven't tried compiling on Win10 before, so I'm not sure about these problems.)

YanShuang17 commented 6 years ago

@mjq11302010044 OK! After I removed all the dependencies on roi_align, the errors still exist. Maybe Win10 is the cause. Thank you all the same!

realwill commented 6 years ago

@mjq11302010044 Could we connect on QQ or WeChat?

realwill commented 6 years ago

@mjq11302010044 It is the same error (Check failed: error == cudaSuccess (77 vs. 0) an illegal memory access was encountered) after compiling with cuDNN.

mjq11302010044 commented 6 years ago

@realwill Make sure you have enough GPU memory; RRPN needs ~5 GB to train with the standard settings.

realwill commented 6 years ago

@mjq11302010044 Of course; the GPU is a Tesla P40 with 24 GB of memory.

mjq11302010044 commented 6 years ago

@realwill An update to the ROI pooling has been submitted to the project; please check whether it helps solve your problem : )

realwill commented 6 years ago

@mjq11302010044 Thanks, I will try. BTW, it's too slow...

realwill commented 6 years ago

@mjq11302010044 the same error.
F0117 17:47:45.922055 56410 math_functions.cu:79] Check failed: error == cudaSuccess (77 vs. 0) an illegal memory access was encountered

shaowy commented 6 years ago

@realwill The problem is not rotate_roi_pooling_layer.cu; models/rrpn/RES101/faster_rcnn_end2end/train.prototxt is wrong. It's strange; I think this part should be removed.

If you really want to use ResNet, here is the solution:
A) Download an official resnet101 prototxt.
B) Copy models/rrpn/VGG16/faster_rcnn_end2end/train.prototxt -> models/rrpn/RES101/faster_rcnn_end2end/train.prototxt
C) Res101 [conv1 - conv4] -> VGG16 [conv1 - conv5] (that is, put the Res101 layers in place of the VGG16 ones).
D) Change the last conv4### to conv5_3.
E) Unfortunately, even with just C.TRAIN.SCALES = 600 and C.TRAIN.MAX_SIZE = 1000, it will use 10.9 GB. Perhaps a Tesla P40 can run at larger scales, or you can use Res50.

One thing I should point out is that we haven't run Res101 on ICDAR or other datasets, so I am not sure whether it performs better than VGG16.

19931991 commented 6 years ago

Hello, can I contact you on QQ: 1323369151? Please help me, thank you so much! @realwill @shaowy @YanShuang17

lxy443626128 commented 6 years ago

@realwill Have you solved this problem?

idealwei commented 6 years ago

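I ran into this problem too. It is in the forward pass of rotate_roi_pooling: a large portion of argmax_data[index] is never assigned (it is neither -1 nor maxidx), so in the backward pass argmax_data[index] returns a wild bottom_index, which makes bottom_diff[bottom_index] go out of bounds and raises the "an illegal memory access was encountered" error. I haven't solved it yet; I hope everyone can take a look at it together.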
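
To make the failure mode described above concrete, here is a minimal CPU-side sketch in C++. It is not the RRPN kernel: it assumes a simplified 1-D max pooling in which each output element pools an input range [start, end), and `Bin`, `pool_forward` and `pool_backward` are hypothetical names. It shows two defensive measures: the forward pass writes argmax for every output index (-1 when the bin is empty), and the backward pass skips any index that falls outside the bottom buffer instead of dereferencing it. Whether this matches the actual cause in rotate_roi_pooling_layer.cu is not confirmed in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical bin: output element i pools input[start, end).
struct Bin {
  int start;
  int end;
};

// Forward pass: always assign argmax[i], using -1 for empty bins, so the
// backward pass never reads an uninitialised index.
void pool_forward(const std::vector<float>& input, const std::vector<Bin>& bins,
                  std::vector<float>& top, std::vector<int>& argmax) {
  top.assign(bins.size(), 0.0f);
  argmax.assign(bins.size(), -1);
  const int n = static_cast<int>(input.size());
  for (std::size_t i = 0; i < bins.size(); ++i) {
    // Clip the bin to the input so every index stays in range.
    const int start = bins[i].start < 0 ? 0 : bins[i].start;
    const int end = bins[i].end > n ? n : bins[i].end;
    float best = -std::numeric_limits<float>::max();
    for (int j = start; j < end; ++j) {
      if (input[j] > best) {
        best = input[j];
        argmax[i] = j;
      }
    }
    if (argmax[i] >= 0) top[i] = best;
  }
}

// Backward pass: route each top gradient to its argmax position, skipping
// -1 and any index outside the bottom buffer.
void pool_backward(const std::vector<float>& top_diff,
                   const std::vector<int>& argmax,
                   std::vector<float>& bottom_diff) {
  assert(top_diff.size() == argmax.size());
  const int n = static_cast<int>(bottom_diff.size());
  for (std::size_t i = 0; i < top_diff.size(); ++i) {
    const int idx = argmax[i];
    if (idx < 0 || idx >= n) continue;
    bottom_diff[idx] += top_diff[i];
  }
}
```

The range check in the backward pass only hides a bad index; the initialisation in the forward pass is the part that addresses the unassigned-argmax condition described in the comment.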

idealwei commented 6 years ago

@mjq11302010044 @realwill @shaowy

huishaoli commented 5 years ago

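"I ran into this problem too. It is in the forward pass of rotate_roi_pooling: a large portion of argmax_data[index] is never assigned (it is neither -1 nor maxidx), so in the backward pass argmax_data[index] returns a wild bottom_index, which makes bottom_diff[bottom_index] go out of bounds and raises the "an illegal memory access was encountered" error. I haven't solved it yet; I hope everyone can take a look at it together."

Did you solve it? @idealwei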

realwill commented 5 years ago

PyTorch has released a rotated roialign; you can use it as a reference.

husthkk commented 5 years ago

Have you solved it? @idealwei

mamunir commented 4 years ago

Hi @mjq11302010044, are there still issues with rotated_roi_align_layer.cu? Any hint as to whether a version that works in CUDA is available at another link?

mjq11302010044 / RRPN

There is an error in rotate_roi_align.cu #9