Closed vishaal27 closed 1 year ago
Hi @vishaal27, yes, the model trained on the filtered subset showed much less variation in CLIP score depending on whether text was present in the image, compared to standard CLIP models. We haven't formally evaluated robustness to typographic attacks, but intuitively we agree that models trained on the T-MARS-filtered subset should be robust to such attacks.
Thanks
Hello,
Thanks so much for releasing your great paper; I enjoyed reading it, and kudos on the utility analysis in Section 7, which presents a clean picture of the paper's original motivation. Since the models trained with T-MARS (and other intersecting baselines) learn more "visually salient" representations due to the text-masking process, it seems likely that this could be an effective strategy for mitigating typographic attacks: because models trained on this filtered dataset "place less weight" on learning to do OCR, perhaps they abstract away / learn to ignore text imposed onto images? I am curious whether you have done any such robustness checks on typographic attacks, or any inference-time comparison between the original models and the models trained on filtered data, on samples containing text in the images?
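For concreteness, the inference-time check asked about above could be sketched roughly as follows. This is a hypothetical probe, not anything from the paper: the idea is to overlay a distractor word onto an image (the classic typographic attack) and compare a model's predictions on the clean vs. attacked image. The CLIP-scoring step is left as a comment since it depends on which checkpoint is used; only the attack-image construction with Pillow is shown.

```python
# Hypothetical typographic-attack probe (a sketch, not the paper's protocol).
# We paste a distractor word onto an image, then one would compare a model's
# zero-shot predictions on the clean vs. attacked versions.

from PIL import Image, ImageDraw


def add_typographic_attack(img: Image.Image, word: str) -> Image.Image:
    """Return a copy of `img` with `word` written on a white strip at the bottom."""
    attacked = img.copy()
    draw = ImageDraw.Draw(attacked)
    w, h = attacked.size
    # White background strip so the distractor text is legible on any image.
    draw.rectangle([0, h - 20, w, h], fill="white")
    draw.text((4, h - 18), word, fill="black")
    return attacked


# Downstream usage (assumed setup, e.g. with the open_clip library):
#   model, _, preprocess = open_clip.create_model_and_transforms(...)
#   probs_clean    = zero_shot_probs(model, preprocess(img))
#   probs_attacked = zero_shot_probs(model, preprocess(add_typographic_attack(img, "iPod")))
# A model robust to typographic attacks should show a small gap between the two
# distributions; a text-reliant model should flip toward the overlaid word.
```

Comparing this gap between a standard CLIP checkpoint and a T-MARS-trained one would directly test the intuition that text-masked filtering reduces reliance on OCR-like features.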