imatge-upc / detection-2016-nipsws

Hierarchical Object Detection with Deep Reinforcement Learning
http://imatge-upc.github.io/detection-2016-nipsws/
MIT License

Pre-trained model always select top-left region #2

Closed zhangxiangnick closed 7 years ago

zhangxiangnick commented 7 years ago

I used the pre-trained image-zooms model and tested it on the VOC2007 test set. It runs properly; however, the agent seems to always take the same action ...

example: 000216

I used the default config in image_zooms_testing.py:

class_object = 1
# 1 if you want to obtain visualizations of the search for objects
bool_draw = 1
# Scale of subregion for the hierarchical regions (to deal with 2/4, 3/4)
scale_subregion = float(3)/4
scale_mask = float(1)/(scale_subregion*4)
# Number of steps that the agent does at each image
number_of_steps = 10
# Only search first object
only_first_object = 1

miriambellver commented 7 years ago

I see in the visualization that the Q-value you are obtaining for each action is 'nan', and for this reason the selected action is always the same, the first one. The question is why you are obtaining NaN... I can show you an example visualization; you should see the estimated Q-values for each action, the final action selected, the mask of the region, and the image region:

example_000243
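This behavior matches how NumPy handles NaN in argmax: `np.argmax` treats NaN as the maximum and returns the index of the first one, so an all-NaN Q-vector always selects action 0 (the top-left region). A minimal sketch; the 6-action count here is an assumption based on the paper's hierarchy (5 subregions plus a terminal action), not taken from this code:

```python
import numpy as np

# If the network output is all-NaN, argmax returns index 0 on every step,
# i.e. the first action (the top-left subregion) is always chosen.
qval = np.full((1, 6), np.nan)  # 6 actions assumed: 5 subregions + terminal
action = int(np.argmax(qval))
print(action)  # 0
```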

zhangxiangnick commented 7 years ago

I see. You are right, this step gives NaN:

qval = model.predict(state.T, batch_size=1)

I found that np.sum(state) is NaN for me (state is the default beginning state). Then I realized that in the get_state() function,

descriptor_image = get_conv_image_descriptor_for_image(image, model_vgg)

returns NaN at some elements.
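To trace where a descriptor like this breaks, a quick NaN check helps. `first_nan_location` below is a hypothetical debugging helper, not part of the repo:

```python
import numpy as np

def first_nan_location(arr):
    """Return the index tuple of the first NaN in arr, or None if finite."""
    mask = np.isnan(arr)
    if not mask.any():
        return None
    flat_index = int(np.argmax(mask))  # first True in the flattened mask
    return np.unravel_index(flat_index, arr.shape)

# Example: a fake 512x7x7 conv5 descriptor with one NaN planted in it.
descriptor = np.zeros((512, 7, 7))
descriptor[3, 2, 1] = np.nan
print(first_nan_location(descriptor))  # (3, 2, 1)
```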

miriambellver commented 7 years ago

Maybe you should check whether you are loading the VGG16 weights properly. I downloaded them from the following source: https://gist.github.com/baraldilorenzo/07d7802847aaad0a35d3

zhangxiangnick commented 7 years ago

Thanks @miriambellver! After some investigation, I realized the problem is in this Theano function in get_conv_image_descriptor_for_image():

_convout1_f = K.function(inputs, [model.layers[31].output])

which produces NaN.

To make it work, I switched to the VGG16 from keras.applications and selected the corresponding pool5 layer. Then the Theano function no longer produces NaN. This would require some changes to requirements.txt as well.
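A minimal sketch of that replacement, assuming the keras.applications API, where VGG16's pool5 layer is named 'block5_pool'; the helper name `extract_pool5_descriptor` is hypothetical:

```python
import numpy as np

def extract_pool5_descriptor(image_batch):
    """Return the pool5 feature map for a batch of preprocessed 224x224 RGB
    images of shape (n, 224, 224, 3)."""
    # Imports kept local so this sketch loads even where Keras is absent.
    from keras.applications.vgg16 import VGG16
    from keras.models import Model

    base = VGG16(weights="imagenet", include_top=False)
    # 'block5_pool' is VGG16's last max-pooling layer, i.e. pool5.
    pool5 = Model(inputs=base.input,
                  outputs=base.get_layer("block5_pool").output)
    features = pool5.predict(image_batch)
    # Sanity check: the descriptor should be finite everywhere.
    assert np.isfinite(features).all(), "pool5 descriptor contains NaN/inf"
    return features
```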

If other people have the same issue, I can post all the details.