princeton-nlp / MABEL

EMNLP 2022: "MABEL: Attenuating Gender Bias using Textual Entailment Data" https://arxiv.org/abs/2210.14975
MIT License

Asking for dataset help #2

Closed: yqw0710 closed this issue 1 year ago

yqw0710 commented 1 year ago

Hi, I read your article and found the experimental results very effective. I learned that your training data comes from MNLI and SNLI, but I couldn't find the specific preprocessing steps. Could you please provide the preprocessing code? Thank you very much!

aditya-anulekh commented 1 year ago

Hey! Did you manage to find the preprocessing steps for generating the dataset? If so, could you please point me to them? Thanks!

jacqueline-he commented 1 year ago

The dataset can be found via the link in the README.

I have since graduated and no longer have access to the original preprocessing script, but I'd imagine that recreating it should be pretty simple. You can download the SNLI or MNLI dataset from HuggingFace and filter for the entailment pairs (the premise and hypothesis become orig_sent0 and orig_sent1, respectively). Then apply CDA using the 10 or so gender word pairs listed in Appendix A of the paper; the gender-flipped premise and hypothesis become aug_sent0 and aug_sent1, respectively. Finally, there is a column that is 1 if both the premise and the hypothesis contain gendered words that get flipped, and 0 otherwise (this affects the computation of one of the losses).
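
For anyone reconstructing this, here is a minimal sketch of that recipe (not the original script). It assumes the SNLI column names and label encoding on the HuggingFace hub, uses only an illustrative subset of the Appendix A word pairs, a naive regex-based word swap, and a guessed name (`both`) for the indicator column.

```python
import re
from datasets import load_dataset

# Illustrative subset of the gender word pairs; the full list of ~10 pairs
# is in Appendix A of the paper.
GENDER_PAIRS = [
    ("he", "she"), ("man", "woman"), ("men", "women"),
    ("boy", "girl"), ("father", "mother"), ("son", "daughter"),
    ("brother", "sister"), ("husband", "wife"),
]
SWAP = {}
for a, b in GENDER_PAIRS:
    SWAP[a], SWAP[b] = b, a


def flip_gender(sentence):
    """Counterfactually augment a sentence by swapping gendered words.

    Returns the flipped sentence and whether any word was swapped.
    """
    changed = False

    def swap(match):
        nonlocal changed
        word = match.group(0)
        repl = SWAP.get(word.lower())
        if repl is None:
            return word
        changed = True
        # Preserve simple capitalization ("He" -> "She").
        return repl.capitalize() if word[0].isupper() else repl

    return re.sub(r"[A-Za-z]+", swap, sentence), changed


# SNLI from the HuggingFace hub; label 0 is "entailment".
snli = load_dataset("snli", split="train")
entailment = snli.filter(lambda ex: ex["label"] == 0)

rows = []
for ex in entailment:
    aug0, changed0 = flip_gender(ex["premise"])
    aug1, changed1 = flip_gender(ex["hypothesis"])
    rows.append({
        "orig_sent0": ex["premise"],
        "orig_sent1": ex["hypothesis"],
        "aug_sent0": aug0,
        "aug_sent1": aug1,
        # 1 only if both sentences contained gendered words that were flipped
        # (the column name here is a guess).
        "both": int(changed0 and changed1),
    })
```

The MNLI case should be analogous, e.g. swapping in `load_dataset("multi_nli")`, which uses the same premise/hypothesis/label columns.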

Let me know if you have any other questions.