tigerchen52 / LOVE

ACL22 paper: Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost
MIT License

[Table 4 SST2 Task] BERT+LOVE Reproduction Issue #10

Closed · MingyuKim-2933 closed 11 months ago

MingyuKim-2933 commented 11 months ago

Hi, I've been trying to reproduce the paper's results on the SST2 task using the 'BERT+LOVE' embeddings you provided. I tried changing various hyper-parameters and modifying the code, but I was unable to reproduce the reported performance.

My reproduction performance is below: [image: reproduction results]

Could you provide the code that performed the SST2 Task?

Thank you!

tigerchen52 commented 11 months ago

Hi,

Thanks for asking! A quick question: the repo offers two versions of LOVE; did you use the 768-dimensional model (link)?

MingyuKim-2933 commented 11 months ago

Yes, I did the SST2 task using the 768-dimensional model you provided.

tigerchen52 commented 11 months ago

I will work on the reproduction; it may take some time.

MingyuKim-2933 commented 11 months ago

I really appreciate your support!!

tigerchen52 commented 11 months ago

Hi,

I added some files for reproducing Table 4. Have a look here: LOVE/extrinsic/bert_text_classification. The data directory contains all the data used in this experiment, including the samples with typos, and data/vocab.txt lists every word that appears, typos included. You need to download the prepared word embeddings generated by LOVE (love.emb) and put the file in the data directory.
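
If you want to sanity-check the downloaded vectors, here is a minimal loading sketch. It assumes love.emb uses the common word2vec-style text format (one word followed by 768 floats per line); the function name and the format are my assumptions, not the repo's actual loader.

```python
# Hedged sketch: load love.emb into a {word: vector} dict.
# Assumes one "word v1 v2 ... v768" entry per line (word2vec text format).
import numpy as np

def load_love_embeddings(path="data/love.emb", dim=768):
    """Read word vectors into a {word: np.ndarray} dictionary."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split()
            if len(parts) != dim + 1:  # skip a possible header or bad lines
                continue
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

love_vectors = load_love_embeddings()
print(len(love_vectors), "LOVE vectors loaded")
```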

To reproduce scores for the original BERT:

```
python bert_plus_love.py --use_love False
```

To reproduce scores for BERT + LOVE:

```
python bert_plus_love.py --use_love True
```
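
Conceptually, the --use_love path backs off to the imputed LOVE vectors for surface forms that BERT's vocabulary does not cover (typos in particular). The sketch below only illustrates that idea: `embed_sentence` and its signature are hypothetical, not the actual logic of bert_plus_love.py, and `love_vectors` is the dict from the loading sketch above.

```python
# Illustrative back-off only; not the real bert_plus_love.py code path.
import numpy as np

def embed_sentence(sentence, bert_vocab, bert_word_embeddings, love_vectors):
    """Return one 768-d vector per token, backing off to LOVE for OOV words."""
    rows = []
    for tok in sentence.split():
        if tok in bert_vocab:                       # known word: BERT's row
            rows.append(bert_word_embeddings[bert_vocab[tok]])
        elif tok in love_vectors:                   # OOV/typo: LOVE vector
            rows.append(love_vectors[tok])
        else:                                       # last resort: zeros
            rows.append(np.zeros(768, dtype=np.float32))
    return np.stack(rows)
```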

We train the model with five different learning rates and record the results on the corresponding test sets.

This is the average acc (%) of five runs:

| Model / typo rate (%) | 0 | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BERT | 91.3 | 90.4 | 87.3 | 84.3 | 81.5 | 77.0 | 73.7 | 69.8 | 64.8 | 58.7 |
| BERT+LOVE | 91.0 | 89.6 | 87.4 | 85.1 | 83.0 | 79.4 | 75.5 | 71.6 | 68.0 | 61.3 |

This is the max acc (%) among the five runs:

| Model / typo rate (%) | 0 | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BERT | 91.6 | 90.9 | 87.8 | 85.0 | 82.0 | 77.5 | 74.6 | 70.6 | 66.0 | 59.3 |
| BERT+LOVE | 92.1 | 90.5 | 87.7 | 86.1 | 84.0 | 80.7 | 76.9 | 73.2 | 70.7 | 63.1 |

We can observe that adding LOVE makes BERT more robust. The scores might differ slightly from those reported in our paper for the following reasons:

  1. I left my university and the relevant code was lost, so I had to rewrite everything from scratch based on my memory.
  2. There is randomness when adding typos (a minimal sketch of such injection follows below).
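
For illustration, here is what such random typo injection can look like; the exact edit operations and their mix used for Table 4 are assumptions here, not the actual corruption script.

```python
# Hedged sketch: corrupt roughly `typo_rate` percent of words with one
# random character edit (drop / swap / repeat). Seed it for repeatable runs.
import random

def add_typos(sentence, typo_rate, seed=None):
    rng = random.Random(seed)
    out = []
    for w in sentence.split():
        if len(w) > 1 and rng.random() < typo_rate / 100:
            i = rng.randrange(len(w))
            op = rng.choice(["drop", "swap", "repeat"])
            if op == "drop":                       # delete one character
                w = w[:i] + w[i + 1:]
            elif op == "swap" and i < len(w) - 1:  # transpose two neighbours
                w = w[:i] + w[i + 1] + w[i] + w[i + 2:]
            else:                                  # duplicate one character
                w = w[:i] + w[i] + w[i:]
        out.append(w)
    return " ".join(out)

print(add_typos("the movie was surprisingly good", typo_rate=50, seed=0))
```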

You can first run the code on the datasets I provided to reproduce the scores. If that works, you can then run it again on your own constructed dataset.

Details of BERT results = [[0.91044887039239, 0.9023669738406659, 0.8746655766944115, 0.8439543697978598, 0.8151753864447087, 0.7754161712247325, 0.745671076099881, 0.6986846016646848, 0.6482424197384067, 0.5930625743162901], [0.9097986028537455, 0.9006391200951248, 0.8775824910820451, 0.8497696195005946, 0.8200245243757431, 0.769266498216409, 0.7405060939357908, 0.7054659631391201, 0.6593527051129607, 0.5932669441141498], [0.916375594530321, 0.9023669738406659, 0.8710054994054697, 0.8481532401902497, 0.8114038347205708, 0.7716446195005946, 0.7386667657550535, 0.7058003864447087, 0.6533145065398335, 0.5892910225921522], [0.9136816290130796, 0.9077549048751486, 0.8742382580261593, 0.8373773781212842, 0.8163644470868014, 0.7677615933412604, 0.733817627824019, 0.6937239892984541, 0.6403834720570749, 0.5845533590963139], [0.9147592152199762, 0.9088324910820451, 0.86777274078478, 0.8336058263971463, 0.8114038347205708, 0.7661452140309156, 0.7273521105826397, 0.6872584720570749, 0.6382282996432818, 0.5764714625445898]]

Details of BERT + LOVE results = [[0.9213362068965517, 0.9050609393579072, 0.8692776456599287, 0.8562351367419738, 0.8256354042806183, 0.7909296967895363, 0.735118162901308, 0.7092560939357908, 0.6522369203329369, 0.5920964625445898], [0.9087210166468489, 0.8947123959571938, 0.8731606718192627, 0.8607684304399524, 0.8391052318668253, 0.8035448870392391, 0.7661452140309156, 0.7256242568370986, 0.6916802913198573, 0.6305737217598097], [0.9109876634958383, 0.8969790428061831, 0.8743497324613555, 0.8455707491082045, 0.8230529131985731, 0.7829592746730083, 0.7528983353151011, 0.7090331450653984, 0.6736771700356718, 0.6126820749108205], [0.9005276456599287, 0.8883583531510106, 0.8742382580261593, 0.8439543697978598, 0.8205633174791914, 0.7866193519619501, 0.7506316884661118, 0.7023446789536266, 0.6736771700356718, 0.6034111177170035], [0.9065658442330559, 0.8969790428061831, 0.8774710166468489, 0.8493423008323425, 0.8404057669441142, 0.8065546967895363, 0.768839179548157, 0.7319782996432818, 0.706766498216409, 0.6283070749108205]]
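
The average and max rows in the tables above can be recomputed from these per-run lists with a few lines of NumPy; `summarize` is just a convenience name used here:

```python
import numpy as np

def summarize(runs):
    """Per-typo-rate average and max accuracy over the five runs."""
    a = np.asarray(runs)  # shape: (5 learning rates, 10 typo rates)
    return a.mean(axis=0), a.max(axis=0)

# Usage: bind the "Details of BERT results" list above to `bert_runs`, then:
# avg, best = summarize(bert_runs)
# print((100 * avg).round(1))  # -> [91.3 90.4 87.3 ...], the table row
```
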
MingyuKim-2933 commented 11 months ago

Thanks for your help, I really appreciate it :)