lushleaf / Structure-free-certified-NLP

SAFER: A Structure-free Approach For cErtified Robustness to Adversarial Word Substitutions (ACL 2020)

Question regarding the randomized smoothing technique #7


adorable-mx commented 1 year ago

Hi @lushleaf ,

Thanks for releasing the code.

I have a question regarding the randomized smoothing technique. In computer vision, a common approach to randomized smoothing is to train multiple instances of the model with different random seeds and noise distributions. At inference time, the predictions of these smoothed models are aggregated in some way, such as taking a majority vote or averaging the predicted probabilities.
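
To make sure I'm describing the same thing, the aggregation I have in mind looks roughly like the sketch below (a minimal illustration only; `models`, `noise_sigma`, and the helper name are placeholders I made up, not code from this repo):

```python
import torch
from collections import Counter

def smoothed_predict(models, x, noise_sigma=0.25, n_samples=100):
    """Majority-vote prediction over randomized forward passes (CV-style smoothing).

    `models` could be N separately trained copies, or a single model queried with
    N independent noise draws -- the aggregation itself is the same majority vote.
    Assumes `x` is a single example (batch size 1).
    """
    votes = Counter()
    with torch.no_grad():
        for i in range(n_samples):
            model = models[i % len(models)]                   # cycle through the model(s)
            noisy_x = x + noise_sigma * torch.randn_like(x)   # Gaussian input noise
            logits = model(noisy_x)
            votes[logits.argmax(dim=-1).item()] += 1
    return votes.most_common(1)[0][0]
```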

In SAFER, however, only the top-layer weights of the BERT model are fine-tuned with the data augmentation, rather than training N separate models. Is that sufficient for the smoothing to hold, and why not train multiple models? Feel free to correct me if I have misunderstood, and thank you.
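
For contrast, my reading of SAFER's inference step is closer to the following sketch, where a single fine-tuned classifier is queried on random synonym substitutions of the input and the votes are aggregated (again just an illustration of my understanding; `classifier` and `synonym_table` are hypothetical names, not identifiers from this code):

```python
import random
from collections import Counter

def safer_style_predict(classifier, tokens, synonym_table, n_samples=100):
    """Majority vote over random synonym substitutions of the input sentence.

    classifier    : a single fine-tuned model mapping a token list to a label
    synonym_table : dict mapping a word to its allowed substitution set (itself included)
    """
    votes = Counter()
    for _ in range(n_samples):
        # replace each word with a random candidate from its substitution set
        perturbed = [random.choice(synonym_table.get(w, [w])) for w in tokens]
        votes[classifier(perturbed)] += 1
    return votes.most_common(1)[0][0]
```

If that is the right picture, then only one model is ever trained, and the randomness comes from the input perturbations at inference time, which is what prompts my question about fine-tuning only the top layers.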