Closed by rishibommasani 2 years ago
i. Those two seem good to me.
ii. Yes, this is what we discussed, but I didn't realize that the CausalLM scoring method is so much better for HellaSwag (roughly a 50-point difference!). I think we should move the CLM code into the adapter (which we discussed before) and use it for any LM-like multiple-choice task, which would be useful for BLiMP as well.
iii. Let's rename to "commonsense" then.
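For reference, here is a minimal sketch of what CLM-style scoring for a multiple-choice task could look like. It is not the existing CLM code or the adapter's interface: the names `score_choice` and `pick_choice` are hypothetical, it assumes we already have per-token log-probabilities for each candidate continuation given the shared context, and whether to length-normalize is left as an open flag rather than a statement about what we currently do.

```python
from typing import List, Sequence


def score_choice(token_logprobs: Sequence[float], length_normalize: bool = True) -> float:
    """Score one candidate continuation from its per-token log-probabilities.

    Hypothetical helper: sums the log-probabilities and, if requested,
    divides by the token count so longer answers are not penalized.
    """
    if not token_logprobs:
        raise ValueError("continuation must contain at least one token")
    total = sum(token_logprobs)
    return total / len(token_logprobs) if length_normalize else total


def pick_choice(logprobs_per_choice: List[Sequence[float]], length_normalize: bool = True) -> int:
    """Return the index of the highest-scoring candidate (first one wins ties)."""
    scores = [score_choice(lp, length_normalize) for lp in logprobs_per_choice]
    return max(range(len(scores)), key=scores.__getitem__)


# Made-up log-probabilities for two candidate endings of one question.
example = [[-1.0, -1.0], [-0.6, -0.6, -0.6, -0.6]]
best_normalized = pick_choice(example, length_normalize=True)
best_unnormalized = pick_choice(example, length_normalize=False)
```

With these made-up numbers the two settings pick different answers, which is why the normalization choice would need to be fixed per scenario when this moves into the adapter.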
For ii, tracking in #550
Closing, with remainder handled in #550
@michiyasunaga could you look at this sometime tomorrow (Wednesday)? We need to solidify this before the run.