microsoft / ADBench

Benchmarking various AD tools.
MIT License

Accelerating PyTorch GMM objective #171

Closed: mikhailnikolaev closed this 4 years ago

mikhailnikolaev commented 4 years ago

The GMM objective for PyTorch has been refactored to accelerate it.
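Roughly, the change amounts to replacing per-point / per-component Python loops with batched tensor operations. The sketch below is only an illustration of the idea (assumed names; a simplified diagonal-covariance objective, not the full ADBench objective with packed Cholesky factors and a Wishart prior):

```python
import torch

def gmm_objective_naive(alphas, means, log_sigmas, x):
    """Python loops over points and components: many tiny kernels per call."""
    n = x.shape[0]
    total = x.new_zeros(())
    for i in range(n):
        terms = []
        for j in range(means.shape[0]):
            diff = (x[i] - means[j]) / torch.exp(log_sigmas[j])
            log_pdf = -0.5 * torch.sum(diff * diff) - torch.sum(log_sigmas[j])
            terms.append(alphas[j] + log_pdf)
        total = total + torch.logsumexp(torch.stack(terms), dim=0)
    return total - n * torch.logsumexp(alphas, dim=0)

def gmm_objective_vectorized(alphas, means, log_sigmas, x):
    """Same value, but all point/component pairs in a few broadcasted ops."""
    n = x.shape[0]
    diff = (x[:, None, :] - means[None, :, :]) * torch.exp(-log_sigmas)[None, :, :]  # (n, k, d)
    log_pdf = -0.5 * (diff * diff).sum(-1) - log_sigmas.sum(-1)                       # (n, k)
    return torch.logsumexp(alphas + log_pdf, dim=1).sum() - n * torch.logsumexp(alphas, dim=0)
```

Because the broadcasted version contains no Python-level loops, autograd traces a small graph of large tensor ops instead of thousands of tiny ones, which helps both the objective and the Jacobian computation.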

Note: the plots for tools other than TensorFlow, Manual, and PyTorch may be incorrect.

Examples of the old graphs

[Plot: GMM (1k) Jacobian - Release Graph] [Plot: GMM (10k) Jacobian - Release Graph]

Examples of the new graphs

[Plot: GMM (1k) Jacobian - Release Graph] [Plot: GMM (10k) Jacobian - Release Graph]
tomjaguarpaw commented 4 years ago

Wow, that is very impressive.

awf commented 4 years ago

Folks, I wonder if we might imagine including a version of the older code with a title like "PyTorch (naive)". Just as we have a few C++ variants, we could imagine a couple of PyTorch variants.

tomjaguarpaw commented 4 years ago

Yes, I think that sounds like a good idea.

mikhailnikolaev commented 4 years ago

> Folks, I wonder if we might imagine including a version of the older code with a title like "PyTorch (naive)". Just as we have a few C++ variants, we could imagine a couple of PyTorch variants.

But it is still the PyTorch tool, just written in a more efficient way. Having both a "naive" and a regular PyTorch entry could cause confusion about what the "naive" version stands for, because it is simply less efficient code, nothing more. For example, if I wrote much less efficient code, would that be yet another PyTorch variant ("very naive")? Among the C++ entries we use (as far as I understand) different libraries (e.g. Eigen) or different ways of using them (e.g. Eigen and EigenVector), but here it is definitely the same tool and the same scenario. I don't think it should be shown as a separate tool.

awf commented 4 years ago

I think the analogy to "EigenVector" and "Eigen" is appropriate. In EigenVector, the author was trying to use the vectorization capabilities of Eigen, in a way that made sense in 2015, but today makes little difference. Similarly, we could imagine asking whether or not "einsum" is responsible for the speedups here.
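For concreteness, the sort of fragment being discussed might look like the following. This is a hypothetical sketch with assumed names and shapes, not necessarily the code in this PR: contracting per-component lower-triangular factors with centred points, either in a Python loop or in a single einsum.

```python
import torch

# Hypothetical shapes: n points, k components, d dimensions.
n, k, d = 1000, 25, 10
diff = torch.randn(n, k, d)          # centred points, one slice per component
L = torch.randn(k, d, d).tril()      # a lower-triangular factor per component

# Loop form: one small matmul per component.
out_loop = torch.stack([diff[:, j, :] @ L[j].T for j in range(k)], dim=1)

# einsum form: a single contraction over every component at once.
out_einsum = torch.einsum('nkd,kld->nkl', diff, L)

assert torch.allclose(out_loop, out_einsum, atol=1e-5)
```

Timing the two forms (forward and backward) in isolation would tell us whether the einsum rewrite alone accounts for the gap, or whether the broader removal of Python loops matters more.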

The specific case for "naive" is that we are not trying to arbitrarily make less efficient code -- this is the code we had when we first wrote the PyTorch versions. In good faith, we created something that was 100x slower than what we now know we can achieve. I believe this is an outcome that is reflected in other use cases, and which it is worth being able to document.

Of course, we could just show old and new graphs side by side, but that always leaves open the questions of calibration across runs and architectures.

awf commented 4 years ago

Just looping back on this -- is the einsum fragment responsible for much of the speedup?