microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License

[beit mask generation] Why is the aspect ratio sampled from log uniform instead of uniform in the mask generation? #1422

Open ShengYun-Peng opened 10 months ago

ShengYun-Peng commented 10 months ago

https://github.com/microsoft/unilm/blob/78b3a48de27c388a0212cfee49fd6dc470c9ecb5/beit/masking_generator.py#L59

The aspect ratio lies in $[0.3, 1/0.3]$. What is the intuition behind sampling it from a log-uniform distribution instead of a uniform one? The range does not span multiple orders of magnitude.

addf400 commented 10 months ago

Thank you for your question, and it's indeed an excellent point you've brought up.

If we sample directly in linear space (e.g. with np.linspace) over aspect ratios from 1:3 to 3:1, ratios greater than 1 and ratios less than 1 are not equally represented. The interval below 1, $[1/3, 1]$, has length 2/3, while the interval above 1, $[1, 3]$, has length 2, so about 75% of the samples would have a ratio greater than 1. The asymmetry arises because a ratio $r$ and its flipped counterpart $1/r$ (width-to-height versus height-to-width) are not equidistant from 1 on a linear scale.

To treat both orientations symmetrically, we sample uniformly in logarithmic space and then exponentiate to convert back to an aspect ratio. Because $\log(1/r) = -\log r$, the log range is symmetric around 0, so ratios greater than 1 and ratios less than 1 are equally likely.
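The balance argument can be checked numerically. The following is a minimal sketch rather than the repository's code: it assumes the bounds 0.3 and 1/0.3 from the question, and the function names (`sample_uniform`, `sample_log_uniform`, `fraction_above_one`) are made up for illustration.

```python
import math
import random

# Bounds taken from the question; assumed to match the masking generator.
MIN_ASPECT = 0.3
MAX_ASPECT = 1 / MIN_ASPECT


def sample_uniform(rng):
    """Draw an aspect ratio uniformly in linear space."""
    return rng.uniform(MIN_ASPECT, MAX_ASPECT)


def sample_log_uniform(rng):
    """Draw uniformly in log space, then exponentiate back to a ratio."""
    log_ratio = rng.uniform(math.log(MIN_ASPECT), math.log(MAX_ASPECT))
    return math.exp(log_ratio)


def fraction_above_one(sampler, n=100_000, seed=0):
    """Estimate the share of sampled ratios that are greater than 1."""
    rng = random.Random(seed)
    return sum(sampler(rng) > 1.0 for _ in range(n)) / n


frac_uniform = fraction_above_one(sample_uniform)
frac_log_uniform = fraction_above_one(sample_log_uniform)
print(f"linear uniform: {frac_uniform:.3f} of ratios > 1")
print(f"log uniform:    {frac_log_uniform:.3f} of ratios > 1")
```

With these bounds, the linear-uniform share should come out near $(1/0.3 - 1)/(1/0.3 - 0.3) \approx 0.77$, while the log-uniform share should be close to 0.5, since the log range is symmetric around 0.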

ShengYun-Peng commented 10 months ago

Thanks, @addf400! That clarifies my question.