nirandaperera closed this issue 4 years ago
Why do they need to be? Right now, the optimizer does not try to sweep over batch sizes; it assumes the batch size is provided as an input.
So the optimizer is equally dividing the work among all GPUs irrespective of the memory available?
No, that's not true. But why would you need to normalize by batch size to decide whether something fits? My point is that the batch size is not a knob we try to tune: given a batch size, we know the computation times and activation sizes, and we can use this information to make placement decisions for that particular batch size.
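To make this concrete, here is a minimal sketch of a fit check that works directly on numbers profiled at one batch size. It is not the optimizer's actual logic: `LayerProfile`, `stage_fits`, and `stage_time_ms` are hypothetical names, the memory accounting is deliberately simplified, and the numbers in the usage lines are illustrative only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LayerProfile:
    # Hypothetical record of what a profiler run at ONE batch size might report.
    compute_time_ms: float
    activation_bytes: int
    parameter_bytes: int


def stage_fits(layers: List[LayerProfile], memory_budget_bytes: int) -> bool:
    """Return True if the summed footprint of `layers` fits within the budget.

    The sizes were measured at the batch size of interest, so they are
    compared against the budget as-is, with no per-sample normalization.
    """
    total = sum(l.activation_bytes + l.parameter_bytes for l in layers)
    return total <= memory_budget_bytes


def stage_time_ms(layers: List[LayerProfile]) -> float:
    """Sum of the profiled compute times for the layers in one stage."""
    return sum(l.compute_time_ms for l in layers)


# Illustrative numbers only, not real measurements.
layers = [
    LayerProfile(compute_time_ms=2.0, activation_bytes=400, parameter_bytes=100),
    LayerProfile(compute_time_ms=3.0, activation_bytes=300, parameter_bytes=200),
]
fits_large = stage_fits(layers, memory_budget_bytes=1000)
fits_small = stage_fits(layers, memory_budget_bytes=999)
total_time = stage_time_ms(layers)
```

Because every quantity already corresponds to the chosen batch size, the placement decision never needs the batch size as a separate parameter.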
I see your point. So if I want to run with a different batch size, I have to start again from the profiler, then run the optimizer, and so on? Just wanted to verify this.
I was under the impression that the profiler can be reused for any batch size (of a particular model).
Right, that's correct. The computation time doesn't scale linearly with the batch size (throughput itself is a function of batch size), so you would probably want to run the profiler for each batch size anyway to get an accurate timing estimate. You could reuse the activation size measurements, but we don't currently do this. Hope this clarifies!
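A toy illustration of why linear rescaling can mislead, assuming (purely for the example) that a layer's time is a fixed overhead plus a per-sample cost. The function names and constants are made up for this sketch; real timings have to come from profiling.

```python
def toy_batch_time_ms(batch_size: int,
                      fixed_overhead_ms: float = 5.0,
                      per_sample_ms: float = 0.5) -> float:
    """Assumed cost model: constant overhead plus a per-sample term."""
    return fixed_overhead_ms + per_sample_ms * batch_size


def linear_extrapolation_ms(profiled_time_ms: float,
                            profiled_batch: int,
                            target_batch: int) -> float:
    """Naively rescale a time measured at one batch size to another."""
    return profiled_time_ms * target_batch / profiled_batch


# "Profile" at batch size 8, then guess the time at batch size 64.
profiled = toy_batch_time_ms(8)
extrapolated = linear_extrapolation_ms(profiled, 8, 64)
actual = toy_batch_time_ms(64)

print(f"extrapolated: {extrapolated} ms, toy-model actual: {actual} ms")
```

Under this assumed model the rescaled estimate is roughly twice the modeled time, which is the kind of error that re-profiling at each batch size avoids.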
Thanks for the clarification.
Hi, I see that the profiler calculates memory and execution times based on a particular batch size, but the optimizer code does not take in any batch size parameter. Does that mean that, in the optimizer logic, the execution times and activation memories are normalized?
Best