When training models, the bulk of evaluation currently runs on the main worker only. If we train with 8 GPUs, we should be able to get roughly an 8x speedup on eval, which would make a real difference with large evaluation sets.
The main culprit is this method: https://github.com/jxmorris12/vec2text/blob/master/vec2text/trainers/base.py#L363C5-L365C27 and the subsequent call to _get_decoded_sequences in the Base trainer class. We explicitly enumerate over an eval dataloader of the first n samples, which (I think) happens redundantly in every worker. Instead, we should split that work across the GPUs.
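For illustration, here is a minimal sketch of what the sharded version could look like, assuming a standard torch.distributed setup (as under a multi-GPU Trainer launch). The function name `shard_eval_generation`, the round-robin sharding scheme, and the plain `input_ids`/`attention_mask` generation call are all hypothetical simplifications, not the repo's actual embedding-based generation path:

```python
import torch
import torch.distributed as dist


def shard_eval_generation(model, tokenizer, eval_dataloader, n, device):
    """Hypothetical sketch: each rank decodes its slice of the first n eval
    batches, then results are gathered so the main process can compute metrics.

    Assumes torch.distributed is already initialized and that each batch is a
    dict of tensors containing input_ids / attention_mask.
    """
    rank = dist.get_rank() if dist.is_initialized() else 0
    world_size = dist.get_world_size() if dist.is_initialized() else 1

    local_decoded = []
    for i, batch in enumerate(eval_dataloader):
        if i >= n:
            break
        # Round-robin shard: rank r handles batches r, r + world_size, ...
        if i % world_size != rank:
            continue
        inputs = {
            k: v.to(device)
            for k, v in batch.items()
            if k in ("input_ids", "attention_mask")
        }
        with torch.no_grad():
            generated = model.generate(**inputs, max_new_tokens=64)
        local_decoded.extend(
            tokenizer.batch_decode(generated, skip_special_tokens=True)
        )

    if not dist.is_initialized():
        return local_decoded

    # Gather every rank's decoded strings (cheap for text) and flatten on rank 0.
    gathered = [None] * world_size
    dist.all_gather_object(gathered, local_decoded)
    if rank == 0:
        return [s for shard in gathered for s in shard]
    return []
```

Round-robin sharding keeps the change local to the enumeration loop, and `all_gather_object` avoids having to pad or tensorize the decoded strings before collecting them on the main process.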