quwenjie / MultiGuard

Official code for MultiGuard: Provably Robust Multi-label Classification against Adversarial Examples (NeurIPS 2022)

Is the data pre-processing correct? #1

Closed GuanlinLee closed 1 year ago

GuanlinLee commented 1 year ago

I notice that you use data normalized into [-1, 1]. However, when you train the models, the data are clipped into [0, 1] at Line 99 of train.py. Is this a mistake?

quwenjie commented 1 year ago

The data are normalized with MEAN=0, STD=1 here.

GuanlinLee commented 1 year ago

Yes, the data are normalized into [-1, 1] before being fed into the model. But after adding the noise, you actually feed data in [0, 1] into the model, so half of the values in the color channels are masked.

quwenjie commented 1 year ago

Sorry, I don't see where the data are normalized into [-1, 1]. I think that after reading from the file, the range is [0, 1]; it is then normalized by (x - MEAN) / STD with MEAN=0, STD=1, so the range is still [0, 1].
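A quick sketch of this point (variable names are illustrative, not taken from train.py): with MEAN=0 and STD=1, the standard (x - MEAN) / STD normalization is the identity, so pixels loaded in [0, 1] stay in [0, 1].

```python
import numpy as np

# Illustrative values, not the repo's actual constants.
MEAN, STD = 0.0, 1.0

x = np.random.rand(3, 32, 32)   # image tensor read from file, in [0, 1]
x_norm = (x - MEAN) / STD       # identity transform when MEAN=0, STD=1

# The normalized range is unchanged.
assert x_norm.min() >= 0.0 and x_norm.max() <= 1.0
```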

GuanlinLee commented 1 year ago

Sorry, my bad. I wonder: since the sampled Gaussian noise has support (-\infty, \infty), is it OK to add it directly to images in [0, 1]?

quwenjie commented 1 year ago

In [1], when comparing on SVHN, they let the pixel values lie in [0, 1]. Although the Gaussian noise is sampled from an unbounded range, with very high probability its magnitude is small. Also, normalizing the image must be accounted for when computing the certified radius: after normalization, the 'scale' of the certified radius changes. Therefore we don't normalize the image here, for simplicity.

[1] Certified Adversarial Robustness via Randomized Smoothing
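A minimal sketch of the point above (not the repo's code; sigma = 0.25 is an assumed smoothing level in the spirit of [1]): Gaussian noise is added to pixels in [0, 1] and the result clipped back to the valid range. Measuring the clipped fraction shows that most noisy values already lie in [0, 1].

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed smoothing level; not taken from train.py.
sigma = 0.25

x = rng.random((3, 32, 32))                  # clean image in [0, 1]
noisy = x + rng.normal(0.0, sigma, x.shape)  # noise support is (-inf, inf)
clipped = np.clip(noisy, 0.0, 1.0)           # keep pixels in the valid range

# Fraction of pixels the clipping actually affects; with high probability
# the noise is small, so this stays well below one half.
frac_clipped = np.mean((noisy < 0.0) | (noisy > 1.0))
```

For sigma = 0.25 this fraction is roughly 20% of pixels, and it shrinks quickly as sigma decreases, which is why adding unbounded Gaussian noise to [0, 1] images is workable in practice.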

GuanlinLee commented 1 year ago

Thanks for your explanation. It is clear now.