ShengYun-Peng opened this issue 10 months ago
Thank you for your question, and it's indeed an excellent point you've brought up.
If we sample the aspect ratio linearly between 1:3 and 3:1 (e.g., with np.linspace or a plain uniform distribution), ratios below 1 and ratios above 1 will not be balanced. The sub-range 1/3 to 1 (ratios less than 1) has length 2/3, while the sub-range 1 to 3 (ratios greater than 1) has length 2, so ratios greater than 1 are drawn three times as often. The root cause is that aspect ratios are not symmetric in linear space: a ratio r and its flipped counterpart 1/r (the same shape described as width-to-height instead of height-to-width) lie at different distances from 1.
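To make the imbalance concrete, assume the ratio $r$ is drawn uniformly from $[1/3, 3]$ (a continuous stand-in for an evenly spaced np.linspace grid):

$$
P(r < 1) = \frac{1 - 1/3}{3 - 1/3} = \frac{2/3}{8/3} = \frac{1}{4},
\qquad
P(r > 1) = \frac{3 - 1}{3 - 1/3} = \frac{2}{8/3} = \frac{3}{4}.
$$

The same calculation for the $[0.3, 1/0.3]$ range used in masking_generator.py gives $P(r < 1) = 0.7 / (10/3 - 0.3) \approx 0.23$, so the skew is still roughly 3:1 even though the range spans only about one order of magnitude.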
To sample aspect ratios in a balanced way, we can instead sample uniformly in logarithmic space and then exponentiate back to the original ratio. Since $\log(1/r) = -\log(r)$, the interval $[\log(1/3), \log(3)]$ is symmetric around 0, so a ratio r and its reciprocal 1/r are equally likely, and cases with an aspect ratio above 1 and below 1 occur equally often.
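A minimal sketch of this idea in plain Python, assuming the 1:3 to 3:1 range above. It is in the spirit of the linked BEiT masking generator but is not its exact code; `sample_aspect_ratio` and `block_shape` are hypothetical helpers for illustration, with the ratio interpreted as height/width. The script also compares how often each scheme lands below 1.

```python
from __future__ import annotations

import math
import random


def sample_aspect_ratio(min_aspect: float = 1 / 3, max_aspect: float = 3.0,
                        rng: random.Random | None = None) -> float:
    """Draw an aspect ratio log-uniformly from [min_aspect, max_aspect]."""
    rng = rng or random.Random()
    # Uniform in log space: log(1/3) and log(3) are symmetric around 0,
    # so r and 1/r are equally likely.
    log_lo, log_hi = math.log(min_aspect), math.log(max_aspect)
    return math.exp(rng.uniform(log_lo, log_hi))


def block_shape(target_area: float, aspect_ratio: float) -> tuple[int, int]:
    """Hypothetical helper: turn an area and an h/w ratio into integer (h, w)."""
    h = int(round(math.sqrt(target_area * aspect_ratio)))
    w = int(round(math.sqrt(target_area / aspect_ratio)))
    return h, w


if __name__ == "__main__":
    rng = random.Random(0)
    n = 100_000
    log_samples = [sample_aspect_ratio(rng=rng) for _ in range(n)]
    lin_samples = [rng.uniform(1 / 3, 3.0) for _ in range(n)]  # linear baseline
    frac_log = sum(r < 1 for r in log_samples) / n
    frac_lin = sum(r < 1 for r in lin_samples) / n
    print(f"log-uniform: {frac_log:.3f} of ratios below 1")  # close to 0.5
    print(f"uniform:     {frac_lin:.3f} of ratios below 1")  # close to 0.25
```

Because swapping r for 1/r just swaps h and w in `block_shape`, log-uniform sampling makes tall and wide blocks of the same shape equally likely.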
Thanks, @addf400! That clarifies my question.
https://github.com/microsoft/unilm/blob/78b3a48de27c388a0212cfee49fd6dc470c9ecb5/beit/masking_generator.py#L59
The aspect ratio $\in [0.3, 1/0.3]$. I'm curious about the intuition behind sampling from a log-uniform rather than a uniform distribution, given that the range does not span multiple orders of magnitude.