javiribera / locating-objects-without-bboxes

PyTorch code for "Locating objects without bounding boxes" - Loss function and trained models

Different size of test images #16

Open Farah189 opened 4 years ago

Farah189 commented 4 years ago

Hi, the trained network detects objects only in images that have the same dimensions as the training images. Can it be modified to detect objects in test images with different dimensions?

javiribera commented 4 years ago

This problem occurs because the linear layers are created during training depending on the size of the input training images (see Figure 3 of the paper). If you then input testing images of a different size, the size of the feature maps before the linear layers won't match the number of neurons of the linear layers. I guess you could easily fix this by replacing the linear layers with global average pooling layers.
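A minimal sketch of the size dependence and the suggested fix (this is not the repo's actual code; the backbone, channel counts, and layer names are illustrative assumptions):

```python
# Sketch: why an FC head breaks on new image sizes, and how global
# average pooling (GAP) removes the size dependence. Illustrative only.
import torch
import torch.nn as nn

# Tiny conv layer standing in for the U-Net feature extractor.
backbone = nn.Conv2d(3, 8, kernel_size=3, padding=1)

# FC head: its input size is fixed at construction time (here for
# 256x256 inputs), so a 512x512 test image raises a shape error.
fc_head = nn.Linear(8 * 256 * 256, 1)

# GAP head: averages each channel over all spatial positions, so its
# output is (batch, 8) regardless of the input resolution.
gap_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1))

for size in (256, 512):
    x = torch.randn(1, 3, size, size)
    feats = backbone(x)          # (1, 8, size, size)
    out = gap_head(feats)        # works for any input size
    assert out.shape == (1, 1)
```

The FC head above would only accept the 256x256 input; the GAP head accepts both.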

Farah189 commented 4 years ago

Yes, it gives the error of mismatched feature-map sizes before the linear layer. I will try the suggested pooling layer and close this issue in a few days after testing it. Thanks

javiribera commented 4 years ago

Could you post how you fixed this issue? The code may be helpful for other people. Thank you.

Farah189 commented 4 years ago

The use of a global average pooling layer has eliminated the size-mismatch error at the FC layer. But it is not estimating the number of objects correctly, which will in turn affect the GMM. I have to work on it a bit more to figure it out.

Farah189 commented 4 years ago

Hi, I have tried GAP on the concatenation of the innermost and last layers. I have also tried it on only the last layer. But every time the estimated count comes out below 1. I have also tried GAP+ReLU instead of FC+SoftPlus, but the problem persists. So GAP only partially solves this problem, or maybe I am not using it correctly. This is just an update on this problem.

javiribera commented 4 years ago

None of the options you said you tried restricts the activation values to be below 1, so this sounds like a software bug. Please post your unet_model.py file so that we can see exactly which layers you modified.

Farah189 commented 4 years ago

GAP actually takes the average over the whole layer. So the average of the final 256x256 probability map will definitely be below 1, since there are only a few activation points (as per my understanding). Multiplying it by the image dimensions still does not solve the problem, because then the estimate is far larger than it should be. I am still using the U-Net you presented in your first paper. You can adjust the dimension of the GAP according to the dimension of your images.
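A quick numeric check of this intuition (pure NumPy; the map size and activation values are made-up numbers for illustration):

```python
# A sparse 256x256 probability map with a handful of strong activations
# has a spatial mean far below 1, so a plain GAP over it can never
# output a count of 1 or more. Numbers here are illustrative.
import numpy as np

prob_map = np.zeros((256, 256))
# Say 5 objects, each marked by a single near-1 activation (real maps
# would spread each object over a small blob, but the mean stays tiny).
prob_map[[10, 50, 100, 150, 200], [10, 50, 100, 150, 200]] = 0.9

gap_value = prob_map.mean()
print(gap_value)   # 4.5 / 65536, on the order of 1e-4, far below 1
```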

Farah189 commented 4 years ago

Kindly let me know if you are looking into it. I am just a bit curious whether I was right or wrong about the problem that occurs when using a GAP layer.

javiribera commented 4 years ago

So your current problem is that with the unet_model.py you posted above you always get an estimated count value below 1? Also, I'm going to need you to include all the information requested here: https://github.com/javiribera/locating-objects-without-bboxes#creating-an-issue

Farah189 commented 4 years ago

I do not want you to reproduce the same thing, as that seems to be a waste of time. I just need your opinion on whether I am right about this: "GAP actually takes the average over the whole layer. So the average of the final 256x256 probability map will definitely be below 1, since there are only a few activation points (as per my understanding)." I studied it before using it. So correct me if my understanding is wrong, as you are more experienced.

javiribera commented 4 years ago

I think your intuition is correct. In fact, I remember I once tried something similar and experimented with a GAP layer connected to the probability map. The difference is that I also multiplied the result by a constant, which can be trained. I did not get better results than in my paper, but you can experiment further with this since your problem is different (mismatching tensor sizes).
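A sketch of that variant (GAP over the probability map scaled by a single trainable constant; this is my reconstruction from the description above, not the paper's released code, and all names are illustrative):

```python
# Sketch: GAP over the probability map, rescaled by one learnable
# scalar so the head can learn to map tiny spatial means back to
# object counts. Illustrative reconstruction, not the original code.
import torch
import torch.nn as nn

class ScaledGapCounter(nn.Module):
    def __init__(self):
        super().__init__()
        # One trainable scale factor, initialized to 1.0.
        self.alpha = nn.Parameter(torch.tensor(1.0))

    def forward(self, prob_map):           # prob_map: (B, 1, H, W)
        gap = prob_map.mean(dim=(2, 3))    # (B, 1), size-independent
        return self.alpha * gap            # estimated count for any H, W

model = ScaledGapCounter()
x = torch.rand(2, 1, 256, 256)
print(model(x).shape)  # torch.Size([2, 1])
```

Because `alpha` is learned jointly with the rest of the network, it can grow large enough to compensate for the small spatial mean, while the head itself stays independent of the input resolution.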

Also note that in the unet_model.py you attached above, your GAP is applied to the concatenation of both the probability map and the activations of the lateral network. It is hard to get an intuition for this case.

Farah189 commented 4 years ago

I have tried multiplying by a constant too. And I have also tried using only the probability layer. But then I read up on GAP in detail to understand it and came to see that it will not solve the problem. I will keep trying to solve this issue. Thanks a lot for your guidance; it helped me understand more about the network structure.

danielyyt commented 4 years ago

Does the code now support different sizes of test images?

Farah189 commented 4 years ago

> Does the code now support different sizes of test images?

I have not checked it again. But I think most networks that use fully connected layers (as this U-Net variant does) have this limitation of requiring the same size for training and testing images.

javiribera commented 4 years ago

> Does the code now support different sizes of test images?

No