Measure impact of JIT decompositions, reconsider the design

zou3519 commented 2 years ago

We landed https://github.com/pytorch/pytorch/pull/84976. What it does is:

We have some decompositions written in Python.
We want to use them from C++, in subsystems like forward-mode AD.
The design we have is to TorchScript the Python so that it is callable from C++.

Unfortunately this has been causing some problems:

TorchScript doesn't work with Python 3.11 so this blocks the Python 3.11 binaries (https://github.com/pytorch/pytorch/pull/85509#pullrequestreview-1117733698)
Flaky ONNX failures might be related: https://github.com/pytorch/pytorch/issues/85445
We are unsure of how much this adds to the startup time of import torch. Distributed folks may be sensitive to that.
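
One way to put a number on the startup cost is to time the import in a fresh interpreter, so that already-cached modules do not hide it. The following is a minimal sketch, not the measurement used in this issue; the helper name `time_import` and the `repeats` parameter are made up for illustration, and the result includes interpreter startup time.

```python
import subprocess
import sys
import time


def time_import(module: str, repeats: int = 3) -> float:
    """Return the best wall-clock time (seconds) to run `import <module>`
    in a fresh Python process. Includes interpreter startup overhead."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # A new process guarantees the module is not already in sys.modules.
        subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Pass the module to measure on the command line, e.g. `torch`.
    name = sys.argv[1] if len(sys.argv) > 1 else "json"
    print(f"import {name}: {time_import(name):.3f}s")
```

Running this once with the decompositions imported at startup and once without gives the two numbers to compare. For a per-module breakdown, `python -X importtime -c "import torch"` prints the cumulative import time of each module to stderr.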

Alternatively, it would not be too difficult to call the Python directly from C++ instead of relying on TorchScript to shepherd the code through. This may be a better design in the long term. The question is whether we should do anything about this issue in the short term.

cc @soulitzer @malfet

zou3519 commented 2 years ago

As it pertains to 1.13:

the flaky onnx failures appear to be gone
import torch time (on my machine) goes from 1.1s with importing the decomps to 1.0s without

Not sure if that's significant (roughly 10%?), but an easy fix is to lazily load the decomps for jvp the first time forward-mode AD is called.

soulitzer commented 2 years ago

That sounds pretty reasonable and seems easy to do indeed

zou3519 commented 2 years ago

https://github.com/pytorch/pytorch/pull/85989 resolved the immediate item for 1.13, though we should still reconsider the design because it relies on TorchScript.

Measure impact of JIT decompositions, reconsider the design #85513