jind11 / TextFooler

A Model for Natural Language Attack on Text Classification and Inference
MIT License

Quality of adversaries and authenticity of results #24

Open SachJbp opened 4 years ago

SachJbp commented 4 years ago

There seems to be an issue with a few of the adversarial examples.

For example, a claimed adversarial example from mr_bert.txt is:

orig sent (0): to portray modern women the way director davis has done is just unthinkable
adv sent (1): to portray modern women the way director davis has done is just imaginable

"unthinkable" and "imaginable" are antonyms that erroneously have high cosine similarity, which makes them look like synonyms. I suggest such examples should not be counted when evaluating the attack success rate, since human evaluation would clearly label the adversarial sentence as positive (1), not negative (0).
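For reference, a minimal sketch of how to check such a pair against the counter-fitted embeddings (the file path is an assumption, and the parsing assumes the usual "word v1 v2 ..." text format):

```python
import numpy as np

def load_vectors(path):
    # Parse a text file of "word v1 v2 ..." lines into a word -> vector dict.
    vecs = {}
    with open(path) as f:
        for line in f:
            parts = line.rstrip().split(' ')
            vecs[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vecs

def cos_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

vecs = load_vectors('counter-fitted-vectors.txt')
print(cos_sim(vecs['unthinkable'], vecs['imaginable']))
# If this prints a value above the attack's word-similarity threshold,
# the antonym pair is accepted as a "synonym" substitution candidate.
```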

jind11 commented 4 years ago

Yes, the human evaluation scores on polarity are not 100% because of errors like these.

SachJbp commented 4 years ago

The reported ~13% after-attack accuracy counts such examples as successes, which they actually are not. I think a human-evaluation filter should ultimately govern the after-attack accuracy. Please correct me if I am wrong. Thanks.
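For concreteness, a minimal sketch of the adjustment I mean (the names and counts are illustrative, not taken from the repo):

```python
def after_attack_accuracy(n_total, n_correct_before, n_attack_success):
    # Standard definition: fraction of examples the model still gets right
    # after the attack.
    return (n_correct_before - n_attack_success) / n_total

def human_filtered_accuracy(n_total, n_correct_before, n_attack_success,
                            n_label_flipped):
    # Discount "successes" where the perturbation (e.g. an antonym swap)
    # actually flipped the gold label, so no real adversary was found.
    n_valid_success = n_attack_success - n_label_flipped
    return (n_correct_before - n_valid_success) / n_total

# Hypothetical counts: 1000 test examples, 900 correct before the attack,
# 770 claimed successes, 50 of which flipped the gold label.
print(human_filtered_accuracy(1000, 900, 770, 50))  # 0.18, up from 0.13
```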

jind11 commented 4 years ago

Human evaluation can check whether these "successful" examples are legitimate or not.

Youoo1 commented 2 years ago

Where is the emdding.npz file, please? Or how is it generated?
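For anyone else hitting this, a guess at how such an archive could be produced from the counter-fitted vectors (the input path, output filename, and array keys are all assumptions; check the repo's preprocessing scripts for the actual recipe):

```python
import numpy as np

# Hypothetical recipe: load the counter-fitted vectors and save them as a
# .npz archive holding the embedding matrix plus the vocabulary.
words, rows = [], []
with open('counter-fitted-vectors.txt') as f:  # path is an assumption
    for line in f:
        parts = line.rstrip().split(' ')
        words.append(parts[0])
        rows.append(np.asarray(parts[1:], dtype=np.float32))

embedding = np.stack(rows)
np.savez('embedding.npz', embedding=embedding, vocab=np.array(words))
# Read back later with: data = np.load('embedding.npz'); data['embedding']
```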