shlomihod opened this issue 1 year ago
@shlomihod You're right; this looks like a bug. We should fix it in a future release. But I also need to think about backward compatibility (i.e., unfortunately we can't evaluate the fixed perturbation for old models that have been deprecated).
@shlomihod Thanks again for catching this subtle bug!
cc @dilarasoylu regarding the GenderPerturbation bug that we should probably fix
On a related topic: I think there is a principled issue with these perturbations: depending on the text, they may have no effect.

The dialect and robustness perturbations would probably transform many words, but the gender perturbation might have no impact on a text (e.g., reviews written in the first person). I think there should be an additional metric for the proportion of datapoints that were actually changed, something like a manipulation check in experiment design. Only if the manipulation is substantial (I'm not sure what the right threshold is, but here is a simple heuristic to start with: >50% of the examples changed) should we interpret the effect on the metric.
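A minimal sketch of what such a check could look like (the function names and the 50% threshold here are hypothetical, not anything that exists in HELM today):

```python
def manipulation_rate(originals: list[str], perturbed: list[str]) -> float:
    """Fraction of examples that the perturbation actually changed."""
    if not originals:
        return 0.0
    changed = sum(orig != pert for orig, pert in zip(originals, perturbed))
    return changed / len(originals)

# Hypothetical gate: only interpret the perturbation's effect on a metric
# if the perturbation touched a substantial share of the examples.
MIN_MANIPULATION_RATE = 0.5

def is_interpretable(originals: list[str], perturbed: list[str]) -> bool:
    return manipulation_rate(originals, perturbed) > MIN_MANIPULATION_RATE
```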
Is there a reason why GenderPerturbation applies the perturbation only to words surrounded by non-alphanumeric characters, and not, for example, at the beginning of a sentence?
This happens because of the regex: https://github.com/stanford-crfm/helm/blob/daa165aae07e575bdcb3b5cca699403c82c759dc/src/helm/benchmark/augmentations/gender_perturbation.py#L195-L198
Perhaps a better option would be to use the regex word boundary `\b`? Something like: `pattern = fr"\b({re.escape(word)})\b"`
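A small self-contained demo of the difference (the first pattern is only a stand-in for the surrounded-by-non-alphanumeric behavior described above, not the exact regex from the linked code):

```python
import re

word = "she"
text = "She went home. Then she left."

# In the spirit of the current implementation: the word must be surrounded
# by non-alphanumeric characters, so a match at the very start (or end) of
# the text is missed.
surrounded = re.findall(fr"[^a-zA-Z0-9]({re.escape(word)})[^a-zA-Z0-9]", text, re.IGNORECASE)
print(surrounded)  # ['she'] -- the sentence-initial "She" is not matched

# Word-boundary version: \b is zero-width, so it also matches at the
# beginning and end of the string.
bounded = re.findall(fr"\b({re.escape(word)})\b", text, re.IGNORECASE)
print(bounded)  # ['She', 'she'] -- both occurrences are matched
```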