Closed — invisprints closed this issue 3 years ago
Great question. This number affects the range of the logit value (the input to the logistic function, https://en.wikipedia.org/wiki/Logistic_function). If you look at the logistic function, the logit range must be big enough to cover the whole 0-1 output range, but if it is too big the logits land in the flat-gradient (saturated) zone.
I just found experimentally that 4.0 works. You can make it a learnable weight, though optimizing it could be more difficult. Finally, I've seen similar implementations where people made it work with 1.0; I'm not sure how.
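To make the range argument concrete, here is a minimal NumPy sketch of a scaled L2 normalization layer (the function names, `eps`, and the toy data are mine, not from the repo). Assuming the logit is the dot product of two scale-`s` normalized embeddings, it equals `s^2 * cos(theta)`, so with `s = 4` the logit spans roughly [-16, 16] and the sigmoid can get close to both 0 and 1, while with `s = 1` it is confined to [-1, 1] and the sigmoid only covers about [0.27, 0.73]:

```python
import numpy as np

def scaled_l2_normalize(x, scale=4.0, eps=1e-10):
    """L2-normalize each row of x, then multiply by a fixed scale.

    A rough sketch of what TIRG's normalization layer does; eps is
    added here only to avoid division by zero.
    """
    return scale * x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
a = scaled_l2_normalize(rng.normal(size=(1, 8)), scale=4.0)
b = scaled_l2_normalize(rng.normal(size=(1, 8)), scale=4.0)

# Dot product of two scale-4 normalized vectors is 16 * cos(theta),
# so the logit lies in [-16, 16]; sigmoid at +/-16 is nearly 1 / 0.
# With scale=1 the logit is stuck in [-1, 1], where the sigmoid
# cannot come close to either end of the 0-1 range.
logit = float(a @ b.T)
print(logit, sigmoid(16.0), sigmoid(-16.0), sigmoid(1.0))
```

This is only an illustration of the scale's effect on the logit range, not the full TIRG loss.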
Thank you for your reply. I wanted to replace it with torch.nn.functional.normalize at first, but the results show it is not as simple as I thought.
I found that in https://github.com/google/tirg/blob/c58daa066d8af1a5b3de1a0ef6d112519ddda611/img_text_composition_models.py#L42, normalize_scale is set to 4 at the beginning. It is interesting because performance drops a lot if I set the scale to another number like 1.0. I wonder why it is important and why 4 is an ideal number. Thanks!