Results slightly different from paper

jind11 / TextFooler

A Model for Natural Language Attack on Text Classification and Inference

MIT License

Results slightly different from paper #1

Closed by uvafan 4 years ago

uvafan commented 4 years ago

Hi, we're trying to reproduce the results from your paper. Running your code on the 1000 Yelp data samples in the repo gave these results:

original accuracy: 97.000%, adv accuracy: 6.600%, avg changed rate: 13.879%, num of queries: 827.1

These results are slightly different from those reported in your paper. We thought the issue might be the counter-fitted embeddings, but we tried all 3 versions in https://github.com/nmrksic/counter-fitting/tree/master/word_vectors without resolving it. Any idea what we are missing?

jind11 commented 4 years ago

Hi, could you clarify which target model you are using? Thanks!

uvafan commented 4 years ago

We are using BERT:

python attack_classification.py --dataset_path data/yelp --target_model bert --max_seq_length 256 --batch_size 32 --counter_fitting_embeddings_path word_embeddings/counter-fitted-vectors.txt --target_model_path yelp_model --USE_cache_path '' --counter_fitting_cos_sim_path cos_sim_counter_fitting.npy

jind11 commented 4 years ago

Hi, I am sorry for the late response. Are you using the same trained BERT model as mine, which is also provided in the link? I noticed that the original accuracy before the attack in your experiments is 97%, which is higher than what was reported in the paper. This may be the reason for the discrepancy.

uvafan commented 4 years ago

Yes, we are using the same trained BERT model downloaded from the link in your README. In fact, after more experimentation we found that we can reproduce your exact results on the IMDB dataset, including the original accuracy, but we get different results on both Yelp and MR.

jind11 commented 4 years ago

Hi, I have checked the results again on my side and I think you are right: there is a slight discrepancy between the new results and the paper. I think it stems from the original accuracy before the attack now being higher than what was reported. I will find time to correct the paper. Thanks for finding this out! If you have more results that differ from the paper, please let me know. Thanks again!
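
For readers following the thread, here is a minimal sketch of how the four summary numbers quoted above could be derived from per-sample attack records. It is not taken from the TextFooler code: the `AttackRecord` fields, the `summarize` function, and the exact averaging conventions (changed rate averaged over successful attacks, queries averaged over attacked samples) are assumptions made for illustration. It does show one way a shift in original accuracy could carry over into the other numbers, since only originally correct samples are attacked while accuracy is measured over all samples.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AttackRecord:
    """Hypothetical per-sample record; not TextFooler's own data structure."""
    orig_correct: bool      # model classified the clean sample correctly
    attack_succeeded: bool  # prediction flipped (only meaningful if orig_correct)
    num_words: int          # length of the original text in words
    num_changed: int        # words replaced by the attack
    num_queries: int        # model queries spent on this sample


def summarize(records: List[AttackRecord]) -> Dict[str, float]:
    """Aggregate per-sample records into summary metrics (assumed conventions)."""
    total = len(records)
    # Only samples the model gets right in the first place are attacked.
    attacked = [r for r in records if r.orig_correct]
    flipped = [r for r in attacked if r.attack_succeeded]
    survived = len(attacked) - len(flipped)

    changed_rates = [r.num_changed / r.num_words for r in flipped]
    return {
        "original_accuracy": len(attacked) / total if total else 0.0,
        "adv_accuracy": survived / total if total else 0.0,
        # Assumption: changed rate is averaged over successful attacks only.
        "avg_changed_rate": sum(changed_rates) / len(changed_rates) if flipped else 0.0,
        # Assumption: query count is averaged over all attacked samples.
        "avg_queries": sum(r.num_queries for r in attacked) / len(attacked) if attacked else 0.0,
    }


if __name__ == "__main__":
    demo = [
        AttackRecord(True, True, 10, 2, 100),
        AttackRecord(True, True, 20, 2, 200),
        AttackRecord(True, False, 10, 0, 300),
        AttackRecord(False, False, 10, 0, 0),
    ]
    for name, value in summarize(demo).items():
        print(f"{name}: {value:.3f}")
```

Under these conventions, comparing a reproduction with published numbers starts with checking that `original_accuracy` matches, because every other metric depends on which samples were attacked.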