Open realwill opened 6 years ago
rotate_roi_pooling.cu has the same problem
@realwill Can you provide more detailed information?
@mjq11302010044 F0102 10:11:56.269446 7269 syncedmem.hpp:31] Check failed: error == cudaSuccess (77 vs. 0) an illegal memory access was encountered
@realwill Sorry to tell you that the implementation of rotated_roi_align.cu is incorrect. Please don't use the code.
Regarding the CUDA error, are you using the right GPU for both Caffe and the NMS function?
@mjq11302010044 yes, the same python and the same gpu, cuda without cudnn
@realwill cuDNN 5 is recommended; otherwise GPU memory will be buggy on CUDA 8.
When compiling Caffe, I encountered the error below:
"C:\caffe\build\ALL_BUILD.vcxproj" (default target) (1) ->
"C:\caffe\build\src\caffe\caffe.vcxproj" (default target) (3) ->
(CustomBuild target) ->
C:/caffe/src/caffe/layers/rotate_roi_align_layer.cu(147) (col. 11) : error : calling a host function("fmax<float,
double> ") from a global function("caffe::RotateROIAlignForward
@YanShuang17 Just remove rotate_roi_align_layer.cu
@mjq11302010044 Thanks for the reply! I followed your tip and removed rotate_roi_align_layer.cu, then compiled Caffe again. Another error occurs in rotate_roi_pooling_layer.cpp:
"C:\caffe\build\ALL_BUILD.vcxproj" (default target) (1) ->
"C:\caffe\build\src\caffe\caffe.vcxproj" (default target) (3) ->
(ClCompile target) ->
C:\caffe\src\caffe\layers\rotate_roi_pooling_layer.cpp(110): error C2672: 'max': no matching overloaded function found [C:\caffe\build\src\caffe\caffe.vcxproj]
C:\caffe\src\caffe\layers\rotate_roi_pooling_layer.cpp(110): error C2780: 'const _Ty &std::max(const _Ty &,const _Ty &,_Pr) noexcept(
The rest of the software environment should be fine: CUDA 8.0, cuDNN 5.1, Anaconda2 (Python 2.7.13)... Before this, I had just succeeded in compiling Faster R-CNN's Caffe.
Should I also remove the code related to rotate_roi_align_layer.cu from caffe.proto and fast_rcnn_layers.hpp?
What I compiled is the Caffe source cloned from https://github.com/BVLC/caffe/tree/windows. Yes, my OS is Win10. I copied rotate_roi_pooling_layer.cpp/.cu ... 8 files and fast_rcnn_layers.hpp into the Caffe folder I cloned and ran the build_win.cmd script.
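For anyone hitting the C2672/C2780 errors on `max` quoted above: line 110 of rotate_roi_pooling_layer.cpp is not shown in this thread, so the following is only a guess at the cause. A common trigger for this pair of errors is calling `std::max` with two different argument types (for example a `float` and a `double`), which leaves template deduction without a single `_Ty`. Below is a minimal sketch of two usual workarounds. The helper names `clamp_lower`, `clamp_lower_explicit`, and `check_clamp_helpers` are hypothetical and do not come from the RRPN source.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not from the RRPN source): returns `value` clamped to
// be at least `lower`. Writing std::max(value, lower) directly would fail to
// compile when Dtype is float, because the arguments are float and double.
template <typename Dtype>
Dtype clamp_lower(Dtype value, double lower) {
  // Workaround 1: cast so that both operands have the same type.
  return std::max(value, static_cast<Dtype>(lower));
}

template <typename Dtype>
Dtype clamp_lower_explicit(Dtype value, double lower) {
  // Workaround 2: name the template argument so no deduction is needed;
  // `lower` is converted to Dtype before the comparison.
  return std::max<Dtype>(value, lower);
}

// Small usage check for both variants.
void check_clamp_helpers() {
  assert(clamp_lower<float>(-1.5f, 0.0) == 0.0f);
  assert(clamp_lower_explicit<float>(2.5f, 0.0) == 2.5f);
}
```

If the real line does mix types this way, either form should let the file compile under MSVC without changing the computed value.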
@YanShuang17 Oh, yes. Sorry, I should have reminded you to remove all the dependencies on roi_align, but it's weird that it still shows errors. (I haven't tried compiling on Win10 before, so I'm not sure about these problems.)
@mjq11302010044 OK! After I removed all the dependencies on roi_align, the error still exists. Maybe Win10 is the cause. Thank you all the same!
@mjq11302010044 Could we connect on QQ or WeChat?
@mjq11302010044 It is the same error (Check failed: error == cudaSuccess (77 vs. 0) an illegal memory access was encountered) after compiling with cuDNN.
@realwill Make sure you have enough GPU memory; RRPN needs ~5G to train with the standard settings.
@mjq11302010044 Of course. The GPU is a Tesla P40 with 24G of memory.
@realwill An update to the RoI pooling code has been submitted to the project. Please check whether it helps solve your problem : )
@mjq11302010044 Thanks, I will try. BTW, it's too slow...
@mjq11302010044 the same error.
F0117 17:47:45.922055 56410 math_functions.cu:79] Check failed: error == cudaSuccess (77 vs. 0) an illegal memory access was encountered
@realwill The problem is not in rotate_roi_pooling_layer.cu; models/rrpn/RES101/faster_rcnn_end2end/train.prototxt is wrong. I find it strange; this part should be removed.
If you really want to use ResNet, here is the solution:
A) Download an official ResNet-101 prototxt.
B) Copy models/rrpn/VGG16/faster_rcnn_end2end/train.prototxt to models/rrpn/RES101/faster_rcnn_end2end/train.prototxt.
C) Replace the VGG16 [conv1 - conv5] layers with Res101 [conv1 - conv4].
D) Change the last conv4### to conv5_3.
E) Unfortunately, even with only C.TRAIN.SCALES = 600 and C.TRAIN.MAX_SIZE = 1000, it uses 10.9G. Perhaps the Tesla P40 can run at larger scales, or you can use Res50.
One thing I should remind you of is that we haven't run Res101 on ICDAR or other datasets, so I am not sure whether it performs better than VGG16.
Hello, can I contact you on QQ: 1323369151? Please help me, thank you so much! @realwill @shaowy @YanShuang17
@realwill Have you solved this problem?
I ran into this problem too. In the forward pass of rotate_roi_pooling, a large share of the argmax_data[index] entries are never assigned (they are neither -1 nor maxidx). As a result, in the backward pass argmax_data[index] returns a wild bottom_index, so bottom_diff[bottom_index] goes out of bounds and throws the "an illegal memory access was encountered" error. I haven't solved this yet; I hope we can look into it together.
@mjq11302010044 @realwill @shaowy
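To make the failure mode described above concrete, here is a minimal CPU-side sketch in C++. It is not the RRPN kernel. `pool_forward`, `pool_backward`, and the `bins` layout are hypothetical stand-ins, used only to show the two defensive steps that would avoid the out-of-bounds write: give every argmax slot a defined value (-1) before any early exit, and range-check the index before using it in the backward pass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the forward pass: bins[i] lists the input indices
// that fall into output cell i. Every argmax slot ends up as either -1 (no
// valid input) or the index of the maximum, never an uninitialised value.
void pool_forward(const std::vector<float>& bottom,
                  const std::vector<std::vector<int>>& bins,
                  std::vector<float>* top, std::vector<int>* argmax) {
  top->assign(bins.size(), 0.0f);
  argmax->assign(bins.size(), -1);  // defined default for every cell
  const int n = static_cast<int>(bottom.size());
  for (std::size_t i = 0; i < bins.size(); ++i) {
    for (int idx : bins[i]) {
      if (idx < 0 || idx >= n) continue;  // sample lies outside the input
      if ((*argmax)[i] == -1 || bottom[idx] > (*top)[i]) {
        (*top)[i] = bottom[idx];
        (*argmax)[i] = idx;
      }
    }
  }
}

// Hypothetical stand-in for the backward pass: routes each gradient to the
// input that won the max, skipping any index that is unset or out of range.
void pool_backward(const std::vector<float>& top_diff,
                   const std::vector<int>& argmax,
                   std::vector<float>* bottom_diff) {
  assert(top_diff.size() == argmax.size());
  const int n = static_cast<int>(bottom_diff->size());
  for (std::size_t i = 0; i < argmax.size(); ++i) {
    const int idx = argmax[i];
    if (idx < 0 || idx >= n) continue;  // guard against wild indices
    (*bottom_diff)[idx] += top_diff[i];
  }
}
```

The range check in the backward pass only hides the symptom; if the forward kernel really leaves entries unassigned, initialising them there is the part that matters.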
Have you solved it? @idealwei
A rotated RoIAlign has been released for PyTorch; you can use it as a reference.
Have you solved it? @idealwei
Hi @mjq11302010044, are there still issues with rotated_roi_align_layer.cu? Any hint on whether a working CUDA version is available at another link?
In the forward function, an illegal memory access was encountered.