zhu-xlab / SSL4EO-S12

SSL4EO-S12: a large-scale dataset for self-supervised learning in Earth observation
Apache License 2.0

Normalization for S1? #17

Closed: bengmstrong closed this issue 10 months ago

bengmstrong commented 10 months ago

I'd like to use one of your pre-trained Sentinel-1 models, but I don't see how you normalized your Sentinel-1 data. In the README, I see the normalization scheme for Sentinel-2 ("input clip to [0,1] by dividing 10000"), but could you add instructions for how to pre-process Sentinel-1 data?

wangyi111 commented 10 months ago

Hi! Yes, we used dB-scale images, removed extreme values, and normalized each channel by its mean and standard deviation.
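A minimal sketch of the S1 preprocessing described above, assuming the raw input is linear backscatter (if your data are already in dB, skip `to_db`). "Removing extreme values" is interpreted here as clipping each channel to hypothetical 1st/99th percentile bounds before computing statistics; the helper names and cut-offs are illustrative, not taken from the repository.

```python
import numpy as np

def to_db(linear, eps=1e-10):
    """Convert linear backscatter (sigma0 power) to decibels."""
    return 10.0 * np.log10(np.maximum(linear, eps))

def channel_stats(images_db, low_pct=1.0, high_pct=99.0):
    """Per-channel mean/std after clipping extreme values.

    images_db: array of shape (N, C, H, W) in dB, e.g. C=2 for VV and VH.
    low_pct/high_pct: hypothetical percentile cut-offs (an assumption).
    """
    means, stds = [], []
    for c in range(images_db.shape[1]):
        values = images_db[:, c].ravel()
        lo, hi = np.percentile(values, [low_pct, high_pct])
        clipped = np.clip(values, lo, hi)  # clip, rather than drop, outliers
        means.append(clipped.mean())
        stds.append(clipped.std())
    return np.array(means), np.array(stds)

# Usage with synthetic data: 4 images, 2 channels (VV, VH), 32x32 pixels.
rng = np.random.default_rng(0)
linear = rng.gamma(shape=1.0, scale=0.05, size=(4, 2, 32, 32))
mean, std = channel_stats(to_db(linear))
print(mean.shape, std.shape)  # (2,) (2,)
```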

bengmstrong commented 10 months ago

Great, thanks! Just to clarify: for S1 data, you removed extreme values, computed the per-channel mean and standard deviation, and then normalized as $\bar{X} = \frac{X - \mu}{\sigma}$,

which gives zero-mean values that can be both positive and negative? Or did you use the normalization function in the code here, which produces values clipped to 0-255?
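The zero-centered option from the question, $\bar{X} = \frac{X - \mu}{\sigma}$, applied channel-wise to a (C, H, W) array. The mean and standard deviation values below are hypothetical placeholders, not the statistics used for SSL4EO-S12.

```python
import numpy as np

def zscore(img, mean, std):
    """Standardize a (C, H, W) image channel-wise: (X - mu) / sigma."""
    mean = np.asarray(mean, dtype=np.float64)[:, None, None]
    std = np.asarray(std, dtype=np.float64)[:, None, None]
    return (img - mean) / std

# Hypothetical VV and VH channels in dB, with placeholder statistics.
img = np.stack([np.full((4, 4), -12.0), np.full((4, 4), -20.0)])
out = zscore(img, mean=[-12.0, -19.0], std=[5.0, 5.0])
# Channel 0 becomes 0.0; channel 1 becomes -0.2, so outputs can be negative.
```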

For S2 data, you simply divided by 10000, resulting in values in the range [0, 1].
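The S2 rule as a small function, following the README's "clip to [0,1] by dividing 10000"; the function name is illustrative.

```python
import numpy as np

def normalize_s2(img):
    """Divide Sentinel-2 digital numbers by 10000, then clip to [0, 1]."""
    return np.clip(img.astype(np.float32) / 10000.0, 0.0, 1.0)

# Values above 10000 (e.g. bright clouds) saturate at 1.0.
scaled = normalize_s2(np.array([0, 5000, 12000]))
```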

Is that right?

wangyi111 commented 10 months ago

Hi! Yes, for pretraining we used the normalization in the code (values clipped to 0-255); after the dataloader, the input to the network is scaled to 0-1.
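A sketch of the pretraining-style S1 pipeline: map a window around each channel's mean to 0-255 as uint8, then divide by 255 in the dataloader to get 0-1. The mean ± 2σ window (`k=2`) and both helper names are assumptions about what the repository's normalization function does; check the actual code before relying on this.

```python
import numpy as np

def normalize_to_uint8(img, mean, std, k=2.0):
    """Map each channel's [mean - k*std, mean + k*std] window to [0, 255].

    k=2.0 is an assumed window width; values outside the window are clipped.
    """
    mean = np.asarray(mean, dtype=np.float64)[:, None, None]
    std = np.asarray(std, dtype=np.float64)[:, None, None]
    lo = mean - k * std
    hi = mean + k * std
    scaled = (img - lo) / (hi - lo) * 255.0
    return np.clip(scaled, 0, 255).astype(np.uint8)  # astype truncates

def to_unit_range(img_uint8):
    """Dataloader step: rescale uint8 values to [0, 1]."""
    return img_uint8.astype(np.float32) / 255.0

# One channel in dB: below the window, at the mean, at the top of the window.
img = np.array([[[-27.0, -12.0, -2.0]]])
u8 = normalize_to_uint8(img, mean=[-12.0], std=[5.0])
unit = to_unit_range(u8)
```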

For S2, we simply divided by 10k in pretraining.

Note that for downstream tasks, both "0-1" and zero-centered normalization should work. In our experience, using exactly the same normalization as in pretraining does not always lead to the best performance.
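For downstream experiments, a hypothetical `normalize_s1` helper can switch between the two options mentioned above. The `unit` mode here skips the uint8 quantization used in pretraining, and `k=2` is again an assumed window width.

```python
import numpy as np

def normalize_s1(img_db, mean, std, mode="unit", k=2.0):
    """Normalize a (C, H, W) Sentinel-1 image in dB for a downstream task.

    mode="unit":   map [mean - k*std, mean + k*std] to [0, 1] (pretraining-style)
    mode="zscore": (X - mean) / std, zero-centered
    """
    mean = np.asarray(mean, dtype=np.float64)[:, None, None]
    std = np.asarray(std, dtype=np.float64)[:, None, None]
    if mode == "unit":
        lo = mean - k * std
        return np.clip((img_db - lo) / (2 * k * std), 0.0, 1.0)
    if mode == "zscore":
        return (img_db - mean) / std
    raise ValueError(f"unknown mode: {mode!r}")

img = np.array([[[-22.0, -12.0, -2.0]]])
unit = normalize_s1(img, mean=[-12.0], std=[5.0], mode="unit")
z = normalize_s1(img, mean=[-12.0], std=[5.0], mode="zscore")
```

Keeping the mode as a parameter makes it easy to compare both on a validation split, since matching the pretraining normalization exactly is not always best.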

bengmstrong commented 10 months ago

Sounds good! Thanks for the clarification.