nyu-mll / BBQ

Repository for the Bias Benchmark for QA dataset.
Creative Commons Attribution 4.0 International

BBQ RoBERTa Base Reproducibility Help #3

Open gsgoncalves opened 1 year ago

gsgoncalves commented 1 year ago

Hello,

Congratulations on this great work!

I am reaching out for pointers as I am unable to reproduce the accuracy results from the paper while using RoBERTa-Base.

I finetuned the RoBERTa-Base model on the RACE dataset using the LRQA codebase, then followed the instructions there to evaluate on BBQ. However, I obtained a 51.64% average accuracy across categories, which falls short of the 61.4% reported in the paper.
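For clarity on how I am aggregating the numbers, this is a minimal sketch of the macro-average computation over per-category results. The category names and counts below are purely illustrative placeholders, not values from the paper or my run:

```python
# Hypothetical sketch: unweighted (macro) average accuracy across BBQ categories.
# The category names and correct/total counts here are made up for illustration.
per_category = {
    "Age": {"correct": 1850, "total": 3680},
    "Gender_identity": {"correct": 2900, "total": 5672},
    "Race_ethnicity": {"correct": 3500, "total": 6880},
}

# Per-category accuracy, then an unweighted mean over categories
# (each category counts equally, regardless of its size).
accs = {cat: v["correct"] / v["total"] for cat, v in per_category.items()}
macro_avg = sum(accs.values()) / len(accs)
print(f"macro-average accuracy: {macro_avg:.2%}")
```

If the paper instead reports a micro average (pooling all examples before dividing), the two numbers can differ when category sizes are unbalanced, which might account for part of a gap.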

I used the same training hyperparameters reported in the paper.

I am also using the libraries and respective versions pinned in the requirements.txt file.

Do you have any idea why I am unable to reproduce the reported accuracy when following the LRQA instructions? Any pointers would be much appreciated!

Thank you! Gustavo

zphang commented 1 year ago

Hi, let me take a look into this.