google / tirg

deep learning, image retrieval, vision and language
Apache License 2.0
296 stars, 85 forks

Why is normalize_scale set to 4.0? #13

Closed: invisprints closed this issue 3 years ago

invisprints commented 3 years ago

I found that in https://github.com/google/tirg/blob/c58daa066d8af1a5b3de1a0ef6d112519ddda611/img_text_composition_models.py#L42, normalize_scale is initially set to 4.0. Interestingly, performance drops a lot if I set the scale to another value, such as 1.0. Why is this value so important, and why is 4 a good choice? Thanks!
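To make the question concrete, here is a minimal pure-Python sketch of what a normalization step with a normalize_scale parameter is assumed to compute. Each feature vector is L2-normalized and then multiplied by the scale, so every output vector has norm normalize_scale. The function name `scaled_l2_normalize` is hypothetical. This illustrates the idea and is not the code at the linked line.

```python
import math
from typing import List


def scaled_l2_normalize(batch: List[List[float]], scale: float = 4.0) -> List[List[float]]:
    """L2-normalize each row, then multiply it by `scale`.

    Every output row has Euclidean norm equal to `scale`; only its
    direction comes from the input features. Assumes no all-zero rows.
    """
    out = []
    for row in batch:
        norm = math.sqrt(sum(v * v for v in row))
        out.append([scale * v / norm for v in row])
    return out


features = [[3.0, 4.0], [1.0, -1.0]]
for row in scaled_l2_normalize(features, scale=4.0):
    # Each row keeps its direction but now has length 4.0.
    print(row, math.hypot(*row))
```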

lugiavn commented 3 years ago

Great question. This number affects the range of the logit values (the input to the logistic function, https://en.wikipedia.org/wiki/Logistic_function). If you look at the logistic function, the logit range must be large enough to cover the whole 0 to 1 output range, but if it is too large, the logits can land in the flat-gradient zone.
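The logit-range argument can be checked numerically. This sketch assumes the logit is `scale * cosine` with the cosine similarity in [-1, 1]. If both vectors are scaled before a dot product, the bound becomes scale squared, but the picture is the same. For a few scales, it prints the reachable logistic outputs and the logistic gradient σ(z)(1 - σ(z)) at the edge of that range. The helper names are hypothetical.

```python
import math
from typing import Tuple


def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + e^-z), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sigmoid_grad(z: float) -> float:
    """Derivative of the logistic function: sigmoid(z) * (1 - sigmoid(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


def output_range(scale: float) -> Tuple[float, float]:
    """Reachable logistic outputs when logit = scale * cosine, cosine in [-1, 1]."""
    return sigmoid(-scale), sigmoid(scale)


for scale in (1.0, 4.0, 16.0):
    lo, hi = output_range(scale)
    print(f"scale={scale:5.1f}  outputs in [{lo:.4f}, {hi:.4f}]  "
          f"gradient at edge={sigmoid_grad(scale):.2e}")
```

With scale 1.0 the outputs can never leave roughly [0.27, 0.73], so even a perfect match cannot be pushed close to 1. At 4.0 they span about [0.018, 0.982]. At much larger scales the gradient near the edges becomes tiny, which is the flat-gradient zone mentioned above.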

I found experimentally that 4.0 works. You can make it a learnable weight, though optimizing it could be more difficult. Finally, I've seen similar implementations where people did make it work with 1.0; I'm not sure how.
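Making the scale a learnable weight means treating it as a parameter and updating it by gradient descent along with the rest of the model. In PyTorch it would typically be wrapped in `torch.nn.Parameter` so autograd computes the gradient. This stdlib toy sketch computes that gradient by hand. It assumes a binary logistic loss (sigmoid plus cross-entropy) on `logit = scale * cosine` and uses made-up (cosine, label) pairs. None of the names or numbers come from the repository.

```python
import math
from typing import List, Tuple

Pair = Tuple[float, int]  # (cosine similarity, label: 1 = match, 0 = non-match)


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def loss_and_grad(scale: float, pairs: List[Pair]) -> Tuple[float, float]:
    """Mean binary cross-entropy with logit = scale * cosine, and d(loss)/d(scale)."""
    loss, grad = 0.0, 0.0
    for cos, label in pairs:
        p = sigmoid(scale * cos)
        loss -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
        # Chain rule: dBCE/dlogit = p - label, and dlogit/dscale = cos.
        grad += (p - label) * cos
    return loss / len(pairs), grad / len(pairs)


def learn_scale(pairs: List[Pair], init: float = 1.0, lr: float = 1.0, steps: int = 500) -> float:
    """Plain gradient descent on the single scalar `scale`."""
    scale = init
    for _ in range(steps):
        _, grad = loss_and_grad(scale, pairs)
        scale -= lr * grad
    return scale


# Made-up pairs; the non-match with cosine 0.4 keeps the problem non-separable.
toy_pairs = [(0.8, 1), (0.3, 1), (-0.2, 0), (0.4, 0)]
learned = learn_scale(toy_pairs)
print(f"learned scale: {learned:.3f}")
```

In this toy setup the optimum is finite only because the pairs are not separable. If every match had a positive cosine and every non-match a negative one, the gradient would stay negative and the scale would keep growing.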

invisprints commented 3 years ago

Thank you for your reply. At first I wanted to replace it with torch.nn.functional.normalize, but the results show it is not as simple as I thought.
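By default, `torch.nn.functional.normalize` divides each vector by its L2 norm, clamped below by `eps`. On its own, that presumably behaves like `normalize_scale = 1.0`, which the question reports performs worse. A closer drop-in would multiply by the scale again, e.g. `normalize_scale * torch.nn.functional.normalize(x, p=2, dim=1)`. The stdlib sketch below mirrors that formula on a single row. The helper names are hypothetical.

```python
import math
from typing import List


def unit_normalize(row: List[float], eps: float = 1e-12) -> List[float]:
    """Mirror of the default torch.nn.functional.normalize formula for one row:
    v / max(||v||_2, eps). Non-zero inputs come out with norm 1."""
    norm = max(math.sqrt(sum(v * v for v in row)), eps)
    return [v / norm for v in row]


def scaled_normalize(row: List[float], scale: float) -> List[float]:
    """Unit-normalize, then restore the magnitude chosen by `scale`."""
    return [scale * v for v in unit_normalize(row)]


x = [3.0, 4.0]
print(unit_normalize(x))         # direction only, norm 1
print(scaled_normalize(x, 4.0))  # same direction, norm 4
```

Both outputs point in the same direction. Only the magnitude differs, and that magnitude is what sets the logit range discussed in the reply.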