net.forward() crash, solver.step(1) does not, ideas?

miquelmarti commented 7 years ago

I created a Python data layer to feed a multi-task network with the labels for each task; one task is object detection, for which I use SSD. The data layer works as expected, but for several reasons it does not resize the images, so I reshape the network input to match each image's size.

My problem is quite curious: when running caffe train, or solver.step(1) through pycaffe, I have no problem at all and the network even trains. However, when I call net.forward() I get an error for some images:

F0306 19:21:01.625699  3418 bbox_util.cu:590] Check failed: match_index.size() == num_preds_per_class (15009 vs. 13392)

I get the error even in CPU mode, which is more confusing still, since it points to CUDA code.

I lost a couple of days trying to find out what is going on. If anyone has any idea, I'd really appreciate it.

weiliu89 commented 7 years ago

Did you go to bbox_util.cu to check the exact problem?

miquelmarti commented 7 years ago

I did. It happens in the call to ComputeConfLossGPU (https://github.com/weiliu89/caffe/blob/ssd/src/caffe/util/bbox_util.cu#L590), but I haven't been able to identify where it is being called from. I don't see any direct call in multibox_loss_layer; where should I look?

miquelmarti commented 7 years ago

Sorry to bother you again @weiliu89, quick question: is there any assumption that the input size stays fixed between successive input images when computing the number of priors?

weiliu89 commented 7 years ago

The priorbox layer will use previous boxes if the input doesn't change. Check that layer for more details.
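
For context on why the input size matters here: the number of prior boxes follows from the feature map sizes, which in turn follow from the input size, so priors kept from a previous image of a different size would not match the number of predictions for the current one. Below is a minimal sketch of that arithmetic. The strides, the priors per location, the rounding rule and the helper names are all hypothetical, chosen for illustration only; they are not the values of the network in this issue and will not reproduce the 15009 vs. 13392 numbers above.

```python
import math

# Hypothetical layout: (stride, priors per location) for each prediction layer.
# Illustrative values only, not taken from the network discussed in this issue.
PRED_LAYERS = [(8, 4), (16, 6), (32, 6), (64, 6)]


def num_priors(height, width, layers=PRED_LAYERS):
    """Count prior boxes for an input of the given size."""
    total = 0
    for stride, per_location in layers:
        # Assumption: each feature map is ceil(input / stride) per dimension.
        fmap_h = math.ceil(height / stride)
        fmap_w = math.ceil(width / stride)
        total += fmap_h * fmap_w * per_location
    return total


def counts_agree(priors_used, preds_per_class):
    """Stand-in for the failed check: both counts must be equal."""
    return priors_used == preds_per_class


# Priors left over from a previous image vs. predictions for the current one.
previous = num_priors(300, 300)
current = num_priors(375, 500)
print(previous, current, counts_agree(previous, current))
```

If the two counts differ, as they do for any two inputs that produce different feature map sizes, a check of this kind fails.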

weiliu89 / caffe

net.forward() crash, solver.step(1) does not, ideas? #480