Closed shamangary closed 8 years ago
@shamangary I suggest adding a Dropout layer (p=0.1) after the L2 normalization. Referring to this, it will help the model generalize.
Don't set the homogeneous option to true in your code (the default is false).
nn.CompactBilinearPooling(dim)
will be fine. The homogeneous option forces the same h and s to be sampled for both modalities, which is intended only for test code.
When you test the trained model, CompactBilinearPooling's randomly sampled parameters h1, h2, s1, and s2 should be fixed to the values used during training. Make sure of this; an unfixed CBP layer will surely produce a poor result.
I also get a very poor result when CBP is added to a simple network.
Can you please explain what you mean by this:
When you test the trained model, CompactBilinearPooling's randomly sampled parameters h1, h2, s1, and s2 should be fixed to the values used during training. Make sure of this; an unfixed CBP layer will surely produce a poor result.
@ilija139 In Fukui et al. (2016), they said
The vectors h and s are initialized randomly from uniform distribution, but remain fixed.
in Section 3.1.
The Compact Bilinear Pooling method relies on these sampled parameters. Gao et al. (2016) mention that they use the property E[<Ψ(x,h,s), Ψ(y,h,s)>] = <x,y>, which means the vectors projected by Ψ preserve the dot product in expectation with respect to h and s. However, computing the exact expectation over these random parameters is intractable. If we view Ψ(y,h,s) as learned parameters in the last fully-connected layer and Ψ(x,h,s) as the projected output at test time, then both projections must use the same h and s (the property requires identical h and s; otherwise it does not hold). As a practical choice, it seems they sample h and s once (with a large output dimension, e.g. 16k, to reduce the bias this introduces) and reuse the same samples at inference.
So, when you build a model using CompactBilinearPooling, the random parameters are sampled at initialization. When doing inference (testing), those random parameters must have the same values as during training.
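To make the property concrete, here is a small NumPy sketch (illustrative only, not the Torch layer; `count_sketch`, `n`, and `d` are my own names): averaged over draws of h and s, <Ψ(x,h,s), Ψ(y,h,s)> matches <x,y> when both projections share h and s, and collapses toward zero when the second projection uses a fresh draw, as happens after re-initialization.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 512, 4096                       # input dim, projected (pooling) dim

def count_sketch(x, h, s, d):
    """Psi(x,h,s)_j = sum over i with h[i]==j of s[i] * x[i]."""
    out = np.zeros(d)
    np.add.at(out, h, s * x)
    return out

x = rng.standard_normal(n)
y = x + 0.1 * rng.standard_normal(n)   # correlated, so <x,y> is clearly nonzero
exact = x @ y

same, mismatched = [], []
for _ in range(100):
    h  = rng.integers(0, d, size=n)
    s  = rng.choice([-1.0, 1.0], size=n)
    h2 = rng.integers(0, d, size=n)    # a fresh draw, as after re-initialization
    s2 = rng.choice([-1.0, 1.0], size=n)
    same.append(count_sketch(x, h, s, d) @ count_sketch(y, h, s, d))
    mismatched.append(count_sketch(x, h, s, d) @ count_sketch(y, h2, s2, d))

# Shared h,s approximates <x,y>; mismatched h,s does not.
print(exact, np.mean(same), np.mean(mismatched))
```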
If you save the whole model using torch.save('model.t7', model), h and s are saved along with it. However, if you save only the parameters using getParameters() and repopulate them later, h and s are resampled when the new model is initialized. In that case, we must set h and s manually to the values fixed during training.
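As a toy analogue of this pitfall (plain Python/NumPy, not Torch; `ToySketchLayer` and its fields are invented for illustration): restoring only the learned weight into a freshly constructed layer resamples h and s and changes the output, while also copying h and s restores it.

```python
import copy
import numpy as np

# A layer with a learned weight w plus randomly sampled, NON-learned h and s.
class ToySketchLayer:
    def __init__(self, n, d, rng):
        self.h = rng.integers(0, d, size=n)       # random buckets, not learned
        self.s = rng.choice([-1.0, 1.0], size=n)  # random signs, not learned
        self.w = np.zeros(d)                      # learned parameters
        self.d = d

    def forward(self, x):
        proj = np.zeros(self.d)
        np.add.at(proj, self.h, self.s * x)       # count-sketch projection
        return proj @ self.w

rng = np.random.default_rng(0)
trained = ToySketchLayer(64, 256, rng)
trained.w = np.arange(256, dtype=float)           # pretend this was learned
x = rng.standard_normal(64)
reference = trained.forward(x)

# Saving the whole model (cf. torch.save): h and s travel with it.
whole = copy.deepcopy(trained)

# Saving only parameters (cf. getParameters): a fresh layer resamples h and s.
fresh = ToySketchLayer(64, 256, rng)
fresh.w = trained.w.copy()
broken = fresh.forward(x)                         # disagrees with reference

# The fix: set h and s manually to the training-time values.
fresh.h, fresh.s = trained.h.copy(), trained.s.copy()
restored = fresh.forward(x)

print(reference, whole.forward(x), broken, restored)
```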
@jnhwkim Oh, I see. Thanks for the detailed explanation.
Will h and s be saved, and will training proceed correctly, if we train the model, and after let's say 10K iterations we call model:evaluate(), do the evaluation of the model, and then call model:training() before resuming training?
This is how I evaluate the model, and in this way the model essentially does not learn.
@jnhwkim Thank you for your thorough response. @ilija139 I was also going to ask the same question.
However, I am pretty sure I save the model during training with torch.save('model.t7', model). Therefore there should be no different h1, h2, s1, s2 at test time, right?
I am wondering whether, due to the triplet training, the three CBPs may have different starting points? I will check this soon. Thank you ;)
OK, I have checked the issue. It seems the initializations of the triplet branches are the same. However, I did not share h1, h2, s1, s2 among the triplet branches. After I linked them, the performance is reasonable now. Thank you for your time. Great work ;)
I'm only using one CBP and still get low performance, but it may be that CBP is not compatible with my network model. I only swapped the element-wise multiplication of two LSTM states with CBP. I'll check h and s tomorrow and see how they behave.
@shamangary Glad to hear that. Keep in mind that the pooling size should be large enough, e.g. 16k, at least 8k. I heard this emphasized in personal communication with the authors; see also the paper. @ilija139 I only checked this cbp code with my own model, swapping in the CBP layer, and got reasonable performance with some degradation. I did not apply the above tuning before Fukui released the mcb code. So, if I can make some time, I plan to reproduce VQA-MCB in Torch7.
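The 16k recommendation is consistent with the variance of the count-sketch estimate shrinking as the projection dimension grows. A quick NumPy check (my own illustrative code, not the library; `count_sketch` and `mean_abs_error` are invented names):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 512
x = rng.standard_normal(n)
y = rng.standard_normal(n)
exact = x @ y

def count_sketch(x, h, s, d):
    """Psi(x,h,s)_j = sum over i with h[i]==j of s[i] * x[i]."""
    out = np.zeros(d)
    np.add.at(out, h, s * x)
    return out

def mean_abs_error(d, trials=50):
    """Average |<Psi(x),Psi(y)> - <x,y>| over fresh draws of h and s."""
    errs = []
    for _ in range(trials):
        h = rng.integers(0, d, size=n)
        s = rng.choice([-1.0, 1.0], size=n)
        est = count_sketch(x, h, s, d) @ count_sketch(y, h, s, d)
        errs.append(abs(est - exact))
    return np.mean(errs)

small, large = mean_abs_error(512), mean_abs_error(16384)
print(small, large)   # the larger sketch approximates <x,y> more closely
```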
Hi. CBP performs very well on recognition and VQA in the papers, and the idea of learning features in two streams and merging them into one should be a general method. However, I tested CBP on local feature matching and the performance is very poor. Have you tested the method on other problems where the performance was not good? Or are there any special criteria to satisfy before inserting CBP into the network (batch size? learning rate?)
To be more precise, I use the following at the end of my network:
```lua
local temp_m = nn.ConcatTable()

local A_net = nn.Sequential()
......
A_net:add(nn.Linear(4096,512))

local B_net = nn.Sequential()
......
B_net:add(nn.Linear(4096,512))

temp_m:add(A_net)
temp_m:add(B_net)
model:add(temp_m)
model:add(nn.CompactBilinearPooling(dim,true))
model:add(nn.SignedSquareRoot())
model:add(nn.Normalize(2))
```
Thank you for your reply.