Closed dorost1234 closed 2 years ago
Hi, I am looking into the task helper for COPA, and it seems to me that evaluation could potentially be done batchwise. I was wondering whether there is any reason to enforce a batch size of 1 for multi-token verbalizers?
Thanks for your help on this.
Hi @dorost1234, this only affects evaluation; training should work with larger batch sizes. While it would absolutely be possible to implement evaluation batchwise, the current implementation of `_get_choice_log_probability` requires a batch size of 1, as this makes the decoding strategy more straightforward to implement (we don't have to deal with different examples having different numbers of mask tokens).
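To make the "different numbers of mask tokens" point concrete, here is a rough, hypothetical sketch (not the repository's actual code) of a per-example scoring loop. The `toy_mask_logprobs` model and all names are invented for illustration; the point is that each example carries its own set of mask positions, which is why a fixed-shape batched implementation is less straightforward.

```python
import math

def toy_mask_logprobs(tokens):
    """Toy stand-in for a masked language model: for each masked
    position (None), return log-probabilities over a tiny vocabulary.
    Purely illustrative and deterministic."""
    return {
        i: {"great": math.log(0.6), "bad": math.log(0.4)}
        for i, t in enumerate(tokens)
        if t is None
    }

def choice_log_probability(tokens, choice):
    """Score a multi-token choice by filling one mask at a time,
    most confident mask first, and summing the target tokens'
    log-probabilities. Each example may have a different number of
    masks, so the loop runs a variable number of steps per example --
    the reason a batch size of 1 keeps the decoding simple."""
    tokens = list(tokens)
    targets = dict(choice)  # mask position -> target token
    total = 0.0
    while targets:
        logprobs = toy_mask_logprobs(tokens)
        # fill the mask whose target the model is most confident about
        pos = max(targets, key=lambda p: logprobs[p][targets[p]])
        total += logprobs[pos][targets[pos]]
        tokens[pos] = targets.pop(pos)
    return total

# Two examples with *different* numbers of mask tokens:
ex1 = ["movie", "was", None]        # one mask
ex2 = ["movie", "was", None, None]  # two masks
print(choice_log_probability(ex1, {2: "great"}))
print(choice_log_probability(ex2, {2: "great", 3: "bad"}))
```

A batched version would have to pad examples to the maximum mask count and track which masks remain unfilled in each row, which is exactly the bookkeeping the batch-size-1 loop avoids.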