Thank you for your excellent work! I have two questions:
1. The paper mentions "If we only minimize the log-likelihood of predicting a single ground-truth description, the model can also output other correct descriptions given the adversarial example, making the attack ineffective." Is there any experimental support for this insight?
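For context, here is a minimal sketch of what I understand by "minimizing the log-likelihood of a single ground-truth description"; the model interface and all names are my own assumptions, not taken from the paper or this repo:

```python
import torch
import torch.nn.functional as F

def single_caption_attack_loss(model, adv_image, gt_caption_ids):
    """Hypothetical attack objective: the log-likelihood of one
    ground-truth caption under the captioning model. An attacker would
    minimize this (e.g., by gradient descent on adv_image) to push the
    model away from this single description."""
    # Teacher-forced logits over the caption tokens (assumed interface)
    logits = model(adv_image, gt_caption_ids[:, :-1])
    log_probs = F.log_softmax(logits, dim=-1)
    # Sum of log-probabilities of the ground-truth tokens
    ll = log_probs.gather(
        -1, gt_caption_ids[:, 1:].unsqueeze(-1)
    ).squeeze(-1).sum()
    # Even after this likelihood is driven down, the model may still
    # assign high probability to other correct captions -- the failure
    # mode the quoted sentence describes.
    return ll
```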
2. Regarding the evaluation metrics, does this codebase provide code for computing the attack success rate?
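In case it helps to pin down what I mean by attack success rate, a minimal sketch follows; the function names and the correctness oracle are illustrative assumptions, not from this repo:

```python
def attack_success_rate(clean_captions, adv_captions, is_correct):
    """Hypothetical ASR: among examples the model captions correctly on
    clean inputs, the fraction whose caption is no longer judged correct
    on the adversarial input. `is_correct` is an assumed judgment
    function (e.g., a CIDEr/BLEU threshold or human evaluation)."""
    attempted = succeeded = 0
    for clean, adv in zip(clean_captions, adv_captions):
        if is_correct(clean):        # only count initially-correct examples
            attempted += 1
            if not is_correct(adv):  # attack changed the outcome
                succeeded += 1
    return succeeded / attempted if attempted else 0.0
```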