I meet problem during implement light_head_rcnn - Githubissues

roytseng-tw / Detectron.pytorch

A pytorch implementation of Detectron. Both training from scratch and inferring directly from pretrained Detectron weights are available.

MIT License

2.82k stars 544 forks

I meet problem during implement light_head_rcnn #45

Open hewumars opened 6 years ago

hewumars commented 6 years ago

loss_bbox does not converge. The other losses (loss_cls, loss_rpn_cls, loss_bbox) do converge. Can I push the code to you for debugging?

Rizhiy commented 6 years ago

Hi, where did you take the PSRoIPool layer from? A lot of PyTorch implementations of that layer are buggy. Also, they are probably implemented for single-image batches and might not work with multiple images per batch.
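
To illustrate the multi-image concern, here is a minimal sketch in plain Python. It assumes the Detectron-style RoI layout where each RoI is (batch_idx, x1, y1, x2, y2); the function name `split_rois_by_image` is hypothetical and not part of any repo mentioned here. A pooling layer written for a single image effectively ignores `batch_idx`, so every RoI gets pooled from image 0.

```python
def split_rois_by_image(rois, batch_size):
    """Group RoIs by the image they belong to.

    rois: iterable of (batch_idx, x1, y1, x2, y2) tuples (assumed layout).
    Returns a list with one list of (x1, y1, x2, y2) boxes per image.
    """
    per_image = [[] for _ in range(batch_size)]
    for batch_idx, x1, y1, x2, y2 in rois:
        idx = int(batch_idx)
        if not 0 <= idx < batch_size:
            raise ValueError("RoI refers to image %d outside the batch" % idx)
        per_image[idx].append((x1, y1, x2, y2))
    return per_image


rois = [(0, 1, 2, 3, 4), (1, 5, 6, 7, 8), (0, 9, 10, 11, 12)]
boxes = split_rois_by_image(rois, batch_size=2)
# boxes[0] holds the two boxes of image 0, boxes[1] the single box of image 1.
```

A quick sanity check for a custom layer could be to run a batch of two different images and confirm that RoIs with batch_idx 1 give different outputs than the same boxes with batch_idx 0.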

hewumars commented 6 years ago

PSRoI_Align is from https://github.com/zengarden/light_head_rcnn and PSRoIPooling is from https://github.com/PureDiors/pytorch_RFCN
I set batchsize=1 when training light_head_rcnn. The code seems to be able to work with multiple images per batch, but at the very least a single image per batch works.

Rizhiy commented 6 years ago

I'm pretty sure PSRoIPooling in that repo is bugged, see: https://github.com/PureDiors/pytorch_RFCN/issues/4.

hewumars commented 6 years ago

The light head rcnn model also does not converge when using PSRoI_Align from https://github.com/zengarden/light_head_rcnn. I opened a pull request: https://github.com/roytseng-tw/Detectron.pytorch/pull/48

hewumars commented 6 years ago

I will carefully check the code

hewumars commented 6 years ago

@Rizhiy could you share your PSRoIPooling? I compared the code with https://github.com/msracver/Deformable-ConvNets/blob/master/rfcn/operator_cxx/psroi_pooling.cu and the differences are as shown:

Rizhiy commented 6 years ago

@hewumars I haven't yet got PSRoIPooling to work in PyTorch either.

YanShuo1992 commented 6 years ago

@Rizhiy How is the PSRoI pooling going? I have seen you in many different repos. I think we are both focusing on light-head rcnn, right? I haven't got PSRoIPooling working in PyTorch either. I think it could be easier to use the code from the official tf implementation.

Rizhiy commented 6 years ago

@YanShuo1992 I'm currently using roytseng-tw/Detectron.pytorch, so far I have focused on getting the best mAP, so didn't put much work in light-head. I will try to let you know if I get something working.

YanShuo1992 commented 6 years ago

@hewumars @Rizhiy I checked @hewumars's light head rcnn code. I may have found something wrong. PSRoIPooling is used after res5 (stage 5) in resnet50, right? But the RPN is still after stage 4. What do you think?

Rizhiy commented 6 years ago

That's not entirely correct. You need to pass the output of res5 through a layer which has k*k*n filters, where k is the pooling size and n is an arbitrary number of output channels per bin (10 in the paper). Then you apply psroipool on that.

I suggest you check https://github.com/msracver/Deformable-ConvNets/blob/f4e163719c8e63cfad7af1caaaab93d373750393/rfcn/symbols/resnet_v1_101_rfcn.py#L785-L798 for reference.
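
To make the k*k*n channel arithmetic concrete, here is a naive sketch of position-sensitive RoI average pooling in plain Python. It is not the code from any of the repos above. It assumes the RoI is already given in integer feature-map cells (end-exclusive), and it assumes the channel layout `c_in = (c_out * k + gh) * k + gw`, which is one common convention. The name `psroi_pool` is made up for illustration.

```python
import math


def psroi_pool(feat, roi, k, n):
    """Naive position-sensitive RoI average pooling.

    feat: nested list [C][H][W] with C == k * k * n.
    roi:  (x1, y1, x2, y2) in feature-map cells, end-exclusive (assumed).
    Returns a nested list [n][k][k].
    """
    assert len(feat) == k * k * n, "input must have k*k*n channels"
    height, width = len(feat[0]), len(feat[0][0])
    x1, y1, x2, y2 = roi
    bin_h = (y2 - y1) / k
    bin_w = (x2 - x1) / k
    out = [[[0.0] * k for _ in range(k)] for _ in range(n)]
    for c_out in range(n):
        for gh in range(k):
            for gw in range(k):
                # Spatial extent of bin (gh, gw), clipped to the feature map.
                hs = min(max(math.floor(y1 + gh * bin_h), 0), height)
                he = min(max(math.ceil(y1 + (gh + 1) * bin_h), 0), height)
                ws = min(max(math.floor(x1 + gw * bin_w), 0), width)
                we = min(max(math.ceil(x1 + (gw + 1) * bin_w), 0), width)
                # Each bin reads from its own dedicated input channel.
                c_in = (c_out * k + gh) * k + gw
                cells = [feat[c_in][y][x]
                         for y in range(hs, he) for x in range(ws, we)]
                if cells:
                    out[c_out][gh][gw] = sum(cells) / len(cells)
    return out


# k=2, n=2 -> 8 input channels; channel c is filled with the constant c.
feat = [[[c] * 4 for _ in range(4)] for c in range(8)]
pooled = psroi_pool(feat, roi=(0, 0, 4, 4), k=2, n=2)
```

With constant-valued channels, each output bin simply reports which input channel it read from, which is a handy way to check a CUDA kernel's channel indexing against a reference.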

YanShuo1992 commented 6 years ago

@Rizhiy I will check the official rfcn to see how the rpn and the large conv are organized. @roytseng-tw I am trying to implement light-head rcnn based on your code. I tried the code from @hewumars and I get RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1524584710464/work/aten/src/THC/generic/THCStorage.cu:58

So I checked the .cu code of psroipooling. I found your commit that does not use rounding in roialign_kernel.cu. Can you tell me the reason for that, or what problem it leads to?

GYxiaOH commented 6 years ago

@YanShuo1992 do you run out of memory after some iterations? I have the same problem. I compared the psroi code with caffe2 and couldn't find anything, but I have barely done any CUDA coding, so... Did you solve the problem?

YanShuo1992 commented 6 years ago

@GYxiaOH Yes. I hit the out-of-memory error when using psroi. I also checked the caffe2 code and the tensorflow code and found nothing. For now, I have given up on psroi and use alignroi instead.