utterworks / fast-bert

Super easy library for BERT based NLP models

How to specify the inference batch size? #249

Open mfluegge opened 4 years ago

mfluegge commented 4 years ago

Hey, I'm doing predictions with a model I didn't train myself, so my experience with the fast-bert library is limited. I've got all the model files and use them to instantiate a BertClassificationPredictor.

Let's say I now have a list of 2k texts that I want to predict (on CPU). I can't predict them all at once because I would run out of memory. I am using the predict_batch method on the predictor object, simply passing in all of the texts. I know that, internally, it generates a data loader for this batch and therefore breaks it down into multiple sub-batches that are predicted one by one.
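
For reference, this is roughly what my setup looks like (the paths are placeholders, and the constructor arguments just mirror the README, so I may be missing options):

```python
from fast_bert.prediction import BertClassificationPredictor

# Placeholder paths: the real directory holds the fine-tuned model files
# I was given, plus the labels.csv file with the class labels.
MODEL_PATH = "path/to/model_out/"
LABEL_PATH = "path/to/labels/"

predictor = BertClassificationPredictor(
    model_path=MODEL_PATH,
    label_path=LABEL_PATH,   # directory containing labels.csv
    multi_label=False,
    model_type="bert",
    do_lower_case=True,
)

# ~2k texts in my real use case; currently passing them all to predict_batch at once
texts = ["first document", "second document", "..."]
predictions = predictor.predict_batch(texts)
```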

My question is: how can I define the batch size of these sub-batches? I've looked through the code, and predictor.learner.data.batch_size_per_gpu seems to be the relevant parameter, even though I'm predicting on CPU rather than GPU. So I am setting this parameter manually before calling predict_batch, and it seems to mostly work. It just feels kinda hacky.
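
Concretely, the workaround looks something like this (the attribute name is just what I found while reading the source, so it may not be the intended knob):

```python
# Hacky workaround: override the databunch's batch size before predicting.
# batch_size_per_gpu is the attribute I found in the source; I'm not sure
# it's the supported way to control inference batch size, especially on CPU.
predictor.learner.data.batch_size_per_gpu = 16

# predict_batch should now feed the texts through the model in sub-batches
# of (at most) 16, which keeps memory usage bounded on my machine.
predictions = predictor.predict_batch(texts)
```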

I was wondering if this is the right way to go about it or if there is a better way to set the prediction batch size.

Thanks in advance!