Open SachJbp opened 4 years ago
Yes, the human evaluation on polarity is not 100% due to these errors.
The reported ~13% after-attack accuracy counts such examples as successes, which they actually are not. I guess the human evaluation filter should ultimately govern the after-attack accuracy. Please correct me if I am wrong. Thanks.
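To make the suggestion concrete, here is a minimal sketch of how after-attack accuracy might be recomputed once human evaluation rejects some claimed successes. The function name, its parameters, and the counts in the usage lines are all hypothetical; they are chosen only so that the unfiltered figure lands at 13% and are not taken from the repository or the paper. It also assumes a rejected example is counted as still correctly classified.

```python
def after_attack_accuracy(total, originally_wrong, claimed_successes,
                          rejected_by_humans=0):
    """Fraction of examples the model still classifies correctly after the attack.

    Assumption: a claimed success that human evaluation rejects (the label
    really did change) is treated as a failed attack.
    """
    if not 0 <= rejected_by_humans <= claimed_successes:
        raise ValueError("rejected_by_humans must lie in [0, claimed_successes]")
    if originally_wrong + claimed_successes > total:
        raise ValueError("counts exceed the total number of examples")
    valid_successes = claimed_successes - rejected_by_humans
    return (total - originally_wrong - valid_successes) / total


# Illustrative counts only (hypothetical, not measured results).
unfiltered = after_attack_accuracy(1000, 100, 770)
filtered = after_attack_accuracy(1000, 100, 770, rejected_by_humans=50)
print(f"unfiltered: {unfiltered:.2%}, human-filtered: {filtered:.2%}")
```

Under these assumptions, every rejected example raises the after-attack accuracy, so the unfiltered number would overstate how effective the attack is.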
Human evaluation can check whether these "successful" examples are legitimate or not.
Where is the emdding.npz file, please? Or how is it generated?
There seems to be an issue with a few of the adversarial examples.
For example, a claimed adversarial example from mr_bert.txt is:

orig sent (0): to portray modern women the way director davis has done is just unthinkable

adv sent (1): to portray modern women the way director davis has done is just imaginable
"unthinkable" and "imaginable" are antonyms, yet their embeddings erroneously have high cosine similarity, which suggests that they are synonyms. I suggest that such examples not be counted when evaluating the attack success rate: a human evaluator would clearly label the adversarial sentence as positive (1) and not negative, so the model's prediction of 1 is correct rather than fooled.
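One possible way to flag such cases automatically is sketched below. It assumes the attack substitutes words one-for-one (so both sentences have the same number of tokens) and that an antonym list is available. The `ANTONYMS` set here is a hypothetical hand-written stand-in for a real lexical resource, and the function names are illustrative, not part of the repository.

```python
# Hypothetical antonym pairs; a real list would come from a lexical resource.
ANTONYMS = {frozenset(("unthinkable", "imaginable"))}


def substituted_pairs(orig_sent, adv_sent):
    """Return (original, replacement) word pairs that differ between sentences.

    Assumes a word-for-word substitution attack, so token counts must match.
    """
    orig_tokens, adv_tokens = orig_sent.split(), adv_sent.split()
    if len(orig_tokens) != len(adv_tokens):
        raise ValueError("sentences must have the same number of tokens")
    return [(o, a) for o, a in zip(orig_tokens, adv_tokens) if o != a]


def has_antonym_swap(orig_sent, adv_sent, antonyms=ANTONYMS):
    """True if any substituted pair appears in the antonym list."""
    return any(frozenset(pair) in antonyms
               for pair in substituted_pairs(orig_sent, adv_sent))


ORIG = "to portray modern women the way director davis has done is just unthinkable"
ADV = "to portray modern women the way director davis has done is just imaginable"

print(substituted_pairs(ORIG, ADV))
print(has_antonym_swap(ORIG, ADV))
```

Examples flagged this way could be excluded from the success count or passed to human evaluation first.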