Closed astariul closed 5 years ago
As I understand it so far:
Each step, the Trainer
class runs accum_count
mini-batches.
Each mini-batch contains X samples, where X varies so that:
X * max(len(x) for x in X) <= batch_size
Please let me know if I understood correctly!
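If that reading is right, the mechanism would look roughly like the sketch below (written from scratch for illustration, assuming a plain list of tokenized examples; it is not code from this repo):

def dynamic_batches(examples, batch_size):
    # Illustrative sketch: group examples so that
    # (number of examples) * (longest example in the batch) <= batch_size.
    # An example longer than the whole budget still gets its own batch.
    minibatch, max_len = [], 0
    for ex in examples:
        new_max = max(max_len, len(ex))
        # Adding this example would exceed the budget: emit the current batch.
        if minibatch and (len(minibatch) + 1) * new_max > batch_size:
            yield minibatch
            minibatch, max_len = [], 0
            new_max = len(ex)
        minibatch.append(ex)
        max_len = new_max
    if minibatch:
        yield minibatch

With a scheme like this, a batch of short documents holds many samples and a batch of long documents holds only a few, so the per-step sample count is not a fixed number.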
I think some comments on this function might be helpful!
For extractive, batch_size is the maximum number of sentences in the source document. For abstractive, batch_size is the maximum number of tokens in the target summary.
It is designed to use memory more effectively.
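To make that concrete (the helper below is hypothetical, not code from this repo), the quantity that gets compared against batch_size is counted differently per task:

def batch_cost(example, task):
    # Hypothetical helper: what one example "costs" against batch_size.
    # Assumes `example` exposes `src_sents` (source sentences) and
    # `tgt_tokens` (target summary tokens).
    if task == "ext":
        return len(example.src_sents)   # extractive: budget in source sentences
    return len(example.tgt_tokens)      # abstractive: budget in target tokens

Either way, the loader packs examples against that budget, which is why memory use stays roughly flat while the number of documents per batch floats.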
Then, what exactly is the real (in the original sense) batch size?
Batch size (by its traditional definition) is not a fixed number here. This is designed to use memory much more efficiently than using a fixed number.
@jdh3577 As I said, this is dynamic during training.
@nlpyang Would decreasing the batch size to 512 in extractive summarization affect performance?
Hi, does the batch size here have something to do with the number of GPUs? Since it uses distributed training, how does the model update its parameters? Do all GPUs merge their gradients and then update?
Maybe the code here answers it:

if self.grad_accum_count > 1:
    if self.n_gpu > 1:
        # Collect every trainable parameter's gradient...
        grads = [p.grad.data for p in self.model.parameters()
                 if p.requires_grad and p.grad is not None]
        # ...and sum them across all GPUs (rescale denominator of 1 = no scaling).
        distributed.all_reduce_and_rescale_tensors(grads, float(1))
    # Single optimizer step after the gradients have been merged.
    for o in self.optims:
        o.step()
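In plain torch.distributed terms, a minimal sketch of what that block appears to do (illustrative names only, not the repo's code; all_reduce_and_rescale_tensors seems to wrap the same all-reduce over a list of gradient tensors):

import torch.distributed as dist

def accumulate_and_step(model, optims, minibatches, compute_loss, n_gpu):
    # Gradients from several mini-batches simply add up in p.grad
    # (zeroing of gradients between accumulation cycles is omitted here).
    for batch in minibatches:          # accum_count mini-batches
        loss = compute_loss(model, batch)
        loss.backward()
    if n_gpu > 1:
        # Each worker sums its gradients with every other worker's,
        # so all replicas end up applying the same update.
        for p in model.parameters():
            if p.requires_grad and p.grad is not None:
                dist.all_reduce(p.grad.data, op=dist.ReduceOp.SUM)
    for o in optims:
        o.step()                       # one update per accumulation cycle

So the effective update uses accum_count mini-batches per GPU, merged across all GPUs before the optimizer step.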
@nlpyang Could you please shed some light on the meaning of this parameter? It clearly isn't the number of documents in the batch, but something related to the number of word-pieces multiplied by a funny factor of 300. Is the latter a typo or a magic number inserted on purpose? Thanks
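Not the author, but one common reason for a factor like that in OpenNMT-style data loaders is a sort buffer: read roughly batch_size * 300 worth of examples, sort them by length so similarly sized documents land together (minimizing padding), then cut that buffer into budget-limited batches. Whether this repo uses exactly that scheme is an assumption on my part; a sketch of the pattern, reusing the dynamic_batches helper sketched above:

def bucketed_batches(example_stream, batch_size, sort_buffer_factor=300):
    # Illustrative only. Fill a large buffer, sort it by length, then split it
    # into budget-limited batches with the dynamic_batches sketch above.
    buffer, cost = [], 0
    for ex in example_stream:
        buffer.append(ex)
        cost += len(ex)
        if cost >= batch_size * sort_buffer_factor:
            yield from dynamic_batches(sorted(buffer, key=len), batch_size)
            buffer, cost = [], 0
    if buffer:
        yield from dynamic_batches(sorted(buffer, key=len), batch_size)

Under that reading, 300 would just be a heuristic buffer multiplier rather than anything tied to the model.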
I'm having difficulty wrapping my head around the batch_size parameter.
What exactly is the batch_size parameter? It's not the real batch size (i.e. how many samples can be processed at once).
So what is it exactly? And how can I choose the real batch size from this argument?