jnhwkim / cbp

Multimodal Compact Bilinear Pooling for Torch7

Have you tested CBP on other problems? #6

Closed: shamangary closed this issue 8 years ago

shamangary commented 8 years ago

Hi. This work, CBP, performs very well at recognition and VQA in the papers. The idea of learning features in two streams and merging them into one should be a general method. However, I tested CBP on local feature matching and the performance is very poor. Have you tested the method on other problems where the performance was not good? Or are there some special criteria to satisfy before inserting CBP into a network (batch size? learning rate?)?

To be more precise, I use the following structure at the end of my network:


```lua
local temp_m = nn.ConcatTable()

local A_net = nn.Sequential()
-- ......
A_net:add(nn.Linear(4096, 512))

local B_net = nn.Sequential()
-- ......
B_net:add(nn.Linear(4096, 512))

temp_m:add(A_net)
temp_m:add(B_net)
model:add(temp_m)
model:add(nn.CompactBilinearPooling(dim, true))
model:add(nn.SignedSquareRoot())
model:add(nn.Normalize(2))
```


Thank you for your reply.

jnhwkim commented 8 years ago

@shamangary I suggest adding a Dropout layer (p=0.1) after the L2 normalization. Referring to this, it should help the model generalize.

Don't set the homogeneous option to true in your code (the default is false).

```lua
nn.CompactBilinearPooling(dim)
```

will be fine. The homogeneous option forces the same h and s to be sampled for the two modalities, which is intended only for test code.
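Putting both suggestions together, the tail of your network would become something like this (a sketch edited from your snippet above):

```lua
model:add(temp_m)
model:add(nn.CompactBilinearPooling(dim))  -- homogeneous left at its default (false)
model:add(nn.SignedSquareRoot())
model:add(nn.Normalize(2))
model:add(nn.Dropout(0.1))                 -- suggested Dropout after the L2 normalization
```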

When you test the trained model, CompactBilinearPooling's randomly sampled parameters h1, h2, s1, and s2 should be fixed. Make sure of this; a CBP layer whose parameters are not fixed will surely produce a poor result.

ili3p commented 8 years ago

I also get a very poor result when CBP is added to a simple network.

Can you please explain what you mean by this:

> When you test the trained model, CompactBilinearPooling's randomly sampled parameters h1, h2, s1, and s2 should be fixed. Make sure of this; a CBP layer whose parameters are not fixed will surely produce a poor result.

jnhwkim commented 8 years ago

@ilija139 In Fukui et al. (2016), Section 3.1, they say:

> The vectors h and s are initialized randomly from uniform distribution, but remain fixed.

The Compact Bilinear Pooling method relies on these sampled parameters. Gao et al. (2016) mention that they use the property E[⟨Ψ(x,h,s), Ψ(y,h,s)⟩] = ⟨x,y⟩, which means that the dot product of vectors projected by Ψ matches the original dot product in expectation w.r.t. h and s. However, computing the exact expectation w.r.t. these random parameters is intractable, and when we view Ψ(y,h,s) as learned parameters in the last fully-connected layer and Ψ(x,h,s) as a projected output vector at test time, both projections must use the same h and s (the property requires identical h and s; with different samples the equality does not hold). As a practical choice, it seems that they sample the parameters once (with a large output dimension, e.g. 16k, to reduce the bias this introduces) and reuse the same samples for inference.
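To see the property concretely, here is a tiny Monte Carlo check (a self-contained sketch in plain Torch, not code from this repo; psi is a hypothetical helper standing in for the count-sketch projection Ψ):

```lua
require 'torch'

local d, D = 8, 64  -- input and output (pooling) dimensions

-- Hypothetical helper: count-sketch projection Ψ(x, h, s).
-- h maps each input index to an output bucket; s flips signs.
local function psi(x, h, s)
  local out = torch.zeros(D)
  for i = 1, d do
    out[h[i]] = out[h[i]] + s[i] * x[i]
  end
  return out
end

local x, y = torch.randn(d), torch.randn(d)
local trials, acc = 10000, 0
for t = 1, trials do
  local h = torch.LongTensor(d):random(1, D)             -- random buckets
  local s = torch.Tensor(d):random(0, 1):mul(2):add(-1)  -- random signs in {-1, +1}
  acc = acc + psi(x, h, s):dot(psi(y, h, s))
end
-- The average approaches <x, y> as the number of trials grows:
print(acc / trials, x:dot(y))
```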

So, when you build a model using CompactBilinearPooling, the random parameters are sampled at initialization. When doing inference (testing), those random parameters must have the same values as during training.

If you save the whole model using torch.save('model.t7', model), h and s are saved along with it. However, if you save only the parameters using getParameters() and later repopulate them, h and s are resampled when the new model is initialized. In that case, you should set h and s manually so they stay fixed to the values used in training.
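For example, a minimal sketch of keeping them fixed (assuming the layer exposes its sampled parameters as the tensor fields h1, h2, s1, s2, as named above; checkpoint.t7 and the surrounding variable names are hypothetical):

```lua
-- Save the flat parameters together with the sampled h and s.
local cbp = model:findModules('nn.CompactBilinearPooling')[1]
torch.save('checkpoint.t7', {
  params = model:getParameters(),
  h1 = cbp.h1, h2 = cbp.h2, s1 = cbp.s1, s2 = cbp.s2,
})

-- Later, after constructing a fresh copy of the model:
local ckpt = torch.load('checkpoint.t7')
local newCbp = newModel:findModules('nn.CompactBilinearPooling')[1]
newCbp.h1:copy(ckpt.h1); newCbp.h2:copy(ckpt.h2)
newCbp.s1:copy(ckpt.s1); newCbp.s2:copy(ckpt.s2)

-- Only now is it safe to repopulate the learned weights.
local params = newModel:getParameters()
params:copy(ckpt.params)
```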

ili3p commented 8 years ago

@jnhwkim Oh, I see. Thanks for the detailed explanation.

Will h and s be preserved and training stay OK if we train the model, call model:evaluate() after, say, 10K iterations, evaluate the model, and then call model:training() before resuming training?

This is how I evaluate the model, and with this setup the model essentially does not learn.

shamangary commented 8 years ago

@jnhwkim Thank you for your thorough response. @ilija139 I was also going to ask the same question.

However, I am pretty sure I save the model during training with torch.save('model.t7', model). Therefore there should be no mismatch in h1, h2, s1, s2 at test time, right?

I am wondering whether, due to the triplet training, the three CBP layers may start from different sampled parameters? I will check this soon. Thank you ;)

shamangary commented 8 years ago

OK, I have checked the issue. It seems the triplet branches start from the same initialization. However, I did not share h1, h2, s1, s2 between the triplet branches. After linking them, the performance is reasonable now. Thank you for your time. Great work ;)
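For reference, linking them could look like this minimal sketch (assuming the three CBP layers in the triplet branches are cbp1, cbp2, cbp3, and that nn.Module's share accepts the named tensor fields):

```lua
-- Make all three triplet branches use identical sampled h and s
-- by sharing the underlying tensor storage with the first branch.
cbp2:share(cbp1, 'h1', 'h2', 's1', 's2')
cbp3:share(cbp1, 'h1', 'h2', 's1', 's2')
```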

ili3p commented 8 years ago

I'm only using one CBP layer and still get low performance, but it may be that CBP is not compatible with my network model, even though all I changed was swapping the element-wise multiplication of two LSTM states for CBP. I'll check h and s tomorrow and see how they behave.

jnhwkim commented 8 years ago

@shamangary Glad to hear that. Keep in mind that the pooling size should be large enough, e.g. 16k, and at least 8k; I heard this emphasized in personal communication with the authors. See also the paper. @ilija139 I only checked this cbp code with my own model by swapping in the CBP layer, and I got reasonable performance with some degradation. I had not applied the tuning points above before Fukui released the mcb code. So, if I can make some time, I plan to reproduce VQA-MCB in Torch7.
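For instance (a sketch; 16000 simply stands in for the ballpark recommended above):

```lua
-- Keep the pooling (output) dimension large, e.g. 16k, at least 8k:
model:add(nn.CompactBilinearPooling(16000))
```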