jnhwkim / MulLowBiVQA

Hadamard Product for Low-rank Bilinear Pooling

decay_factor is wrong #3

Closed Dualmirror closed 7 years ago

Dualmirror commented 7 years ago

decay_factor should be 0.99999040594147 (not 0.99997592083) if opt.iterPerEpoch = 240000 / opt.batch_size and opt.batch_size = 100. In the paper, the batch size is 200. In fact, opt.iterPerEpoch should be 334554 / opt.batch_size, so kick_interval must be changed too.
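
As a quick check, the figure above falls out of the commented formula like this (a standalone Lua snippet; opt.learning_rate_decay_every = 100 is assumed here so that the numbers line up, and may differ from the repository's actual default):

```lua
-- Standalone check of the decay_factor arithmetic.
-- Assumption: opt.learning_rate_decay_every = 100 (chosen to reproduce the
-- figure quoted above; the actual default may differ).
local opt = {}
opt.batch_size = 100
opt.learning_rate_decay_every = 100
opt.iterPerEpoch = 240000 / opt.batch_size   -- 2400 iterations per epoch

local decay_factor =
  math.exp(math.log(0.1) / opt.learning_rate_decay_every / opt.iterPerEpoch)
print(decay_factor)  -- 0.99999040594147..., not 0.99997592083
```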

jnhwkim commented 7 years ago

Sorry for the confusion. I'll update the appendix (refer to this update).

The batch size of 100 is what I trained with, due to a GPU memory constraint. (The batch size of 200 was only for evaluation, so it is irrelevant to training.)

The value decay_factor = 0.99997592083 is borrowed from this. The commented equation math.exp(math.log(0.1)/opt.learning_rate_decay_every/opt.iterPerEpoch) is from here, but it does not help in our case. What I have experienced is that the optimization is highly tricky. Maybe 0.99997592083 is a sweet spot with RMSProp (or maybe not). I leave this for further investigation. (I should discuss it with @jiasenlu 🤔)
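
To see why the constant matters, here is a small illustration of how fast each candidate factor shrinks the learning rate when applied once per iteration (the initial rate 3e-4 is just a placeholder, not necessarily the value used in training):

```lua
-- Per-iteration multiplicative decay: lr_t = lr_0 * d^t.
-- With d = 0.99997592083 the rate drops by 10x roughly every 95.6k iterations;
-- with d = 0.99999040594147 it drops by 10x every 240k iterations.
local lr0 = 3e-4  -- placeholder initial learning rate
for _, d in ipairs({ 0.99997592083, 0.99999040594147 }) do
  for _, t in ipairs({ 24000, 120000, 240000 }) do
    print(string.format('d = %.11f  iter = %6d  lr = %.3e', d, t, lr0 * d ^ t))
  end
end
```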

The training option kick_interval is inspired by deepsense.io. For VQA, this option is carried over from our previous work, after an empirical observation of a minor improvement. I am not sure whether adaptively changing kick_interval would be helpful; a generic sketch of how such a kick-style schedule might look follows below.
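
For reference only, one generic form such a schedule can take is an extra periodic drop every kick_interval iterations on top of the smooth per-iteration decay. The halving factor below is hypothetical, and this is not claimed to be the exact rule in train.lua:

```lua
-- Hypothetical illustration of a "kick" schedule; NOT the exact rule used in
-- train.lua. Every kick_interval iterations the rate takes an extra drop
-- (here by 0.5, an arbitrary choice) on top of the per-iteration decay.
local function learning_rate(lr0, decay_factor, kick_interval, iter)
  local smooth = lr0 * decay_factor ^ iter
  local kicks = math.floor(iter / kick_interval)
  return smooth * 0.5 ^ kicks
end

print(learning_rate(3e-4, 0.99997592083, 50000, 100000))
```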

Dualmirror commented 7 years ago

Thank you for answering.

jnhwkim commented 7 years ago

@Dualmirror you're welcome!