Batch size and optimizer

msr-fiddle / pipedream

MIT License


Batch size and optimizer #47

Closed: nirandaperera closed this issue 4 years ago

nirandaperera commented 4 years ago

Hi, I see that the profiler measures memory and execution times for a particular batch size, but the optimizer code does not take any batch size parameter. Does that mean the execution times and activation memory sizes are normalized in the optimizer logic?

Best

deepakn94 commented 4 years ago

Why do they need to be? Right now, the optimizer does not try to sweep over batch sizes. It assumes that the batch size is provided as an input.

nirandaperera commented 4 years ago

So the optimizer is equally dividing the work among all GPUs irrespective of the memory available?

deepakn94 commented 4 years ago

No, that's not true. But why do you need to normalize by batch size to decide whether something fits? I am saying that the batch size is not a knob we try to tune: given a batch size, we know the computation times and activation sizes, and we can use this information to make placement decisions for that particular batch size.
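To illustrate the point, here is a minimal sketch of a placement decision made from a profile taken at one fixed batch size. It is not PipeDream's optimizer: `LayerProfile`, `best_two_stage_split`, and the brute-force two-stage search are hypothetical names and a simplified technique chosen for illustration, and the memory model (activations plus parameters per stage) is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class LayerProfile:
    # Hypothetical per-layer record. Every value is assumed to have been
    # measured at ONE fixed batch size, so no normalization is needed.
    name: str
    compute_ms: float
    activation_bytes: int
    parameter_bytes: int


def stage_time_ms(layers: List[LayerProfile]) -> float:
    """Total compute time of a contiguous group of layers."""
    return sum(layer.compute_ms for layer in layers)


def stage_memory_bytes(layers: List[LayerProfile]) -> int:
    """Assumed memory footprint of a stage: activations plus parameters."""
    return sum(layer.activation_bytes + layer.parameter_bytes for layer in layers)


def best_two_stage_split(
    profile: List[LayerProfile], gpu_memory_bytes: int
) -> Optional[Tuple[int, float]]:
    """Try every split point and keep the one with the smallest bottleneck.

    Returns (split_index, bottleneck_ms), or None if no split fits in memory.
    Layers [0, split_index) go to the first GPU, the rest to the second.
    """
    best: Optional[Tuple[int, float]] = None
    for split in range(1, len(profile)):
        first, second = profile[:split], profile[split:]
        # Reject splits where either stage exceeds the per-GPU memory budget.
        if max(stage_memory_bytes(first), stage_memory_bytes(second)) > gpu_memory_bytes:
            continue
        bottleneck = max(stage_time_ms(first), stage_time_ms(second))
        if best is None or bottleneck < best[1]:
            best = (split, bottleneck)
    return best
```

Both the memory check and the time comparison use the profiled numbers directly. A different batch size would simply mean a different profile as input, not a rescaling inside the search.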

nirandaperera commented 4 years ago

I see your point. So if I want to run with a different batch size, I have to start again from the profiler, then run the optimizer, and so on? I just wanted to verify this.

I was under the impression that the profiler output could be reused for any batch size (of a particular model).

deepakn94 commented 4 years ago

Right, that's correct: you would need to start again from the profiler. The computation time doesn't scale linearly with the batch size (throughput itself is a function of batch size), so you would probably want to run the profiler for each batch size anyway to get an accurate timing estimate. You could reuse the activation size measurements, but we don't currently do this. Hope this clarifies!
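As a rough sketch of what "run the profiler for each batch size" amounts to, the helper below times a step function separately at each batch size instead of extrapolating from a single measurement. It is not PipeDream's profiler: `time_per_batch_size` and `toy_step` are hypothetical names, and `toy_step` is a CPU-only stand-in for a real forward/backward pass, with a fixed overhead plus per-sample work.

```python
import time
from typing import Callable, Dict, Iterable


def time_per_batch_size(
    step_fn: Callable[[int], None],
    batch_sizes: Iterable[int],
    warmup: int = 2,
    iters: int = 5,
) -> Dict[int, float]:
    """Return the mean time in milliseconds of step_fn at each batch size."""
    results: Dict[int, float] = {}
    for batch_size in batch_sizes:
        # Warm-up runs are excluded from the measurement.
        for _ in range(warmup):
            step_fn(batch_size)
        start = time.perf_counter()
        for _ in range(iters):
            step_fn(batch_size)
        elapsed_s = time.perf_counter() - start
        results[batch_size] = elapsed_s * 1000.0 / iters
    return results


def toy_step(batch_size: int) -> None:
    # Hypothetical workload: a fixed overhead plus work per sample.
    # The fixed part is why time does not scale linearly with batch size.
    sum(i * i for i in range(20000))
    for _ in range(batch_size):
        sum(i * i for i in range(2000))


if __name__ == "__main__":
    timings = time_per_batch_size(toy_step, [1, 8, 64])
    for batch_size, ms in timings.items():
        # Throughput is itself a function of batch size.
        print(f"batch={batch_size:3d}  {ms:8.3f} ms  {batch_size / ms * 1000.0:10.1f} samples/s")
```

When timing real GPU work, the device has to be synchronized before reading the clock (for example with `torch.cuda.synchronize()` in PyTorch), otherwise only the kernel launch time is measured.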

nirandaperera commented 4 years ago

Thanks for the clarification.