Right now the model will try to run `generate` on the entire input list at once, which could result in the device running out of memory. To avoid that, we could add a `batch_size` (or `chunk_size`) parameter to the `translate` function.
By default it'd be `None`, which would attempt to run `generate` on everything at once. If an integer > 1 is given, the inputs would be split into batches of that size and the translations generated iteratively.
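A rough sketch of what that could look like. The `model` and `tokenizer` objects here are assumed to be a Hugging Face-style seq2seq model and its tokenizer, and the exact signature is just illustrative, not a final design:

```python
from typing import List, Optional

def translate(texts: List[str], batch_size: Optional[int] = None) -> List[str]:
    # batch_size=None keeps the current behaviour: run generate on everything at once.
    if batch_size is None:
        batch_size = len(texts)

    translations = []
    for start in range(0, len(texts), batch_size):
        chunk = texts[start:start + batch_size]
        inputs = tokenizer(chunk, return_tensors="pt", padding=True, truncation=True)
        outputs = model.generate(**inputs)
        translations.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return translations
```

Keeping `None` as the default means existing callers see no behaviour change; only users who hit memory limits need to pass a `batch_size`.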