liangs6212 opened this issue 1 year ago
Two possible solutions are shown below:
Chronos:

```python
def predict_with_jit():
    ...
    with torch.jit.optimized_execution(False):
        return _pytorch_fashion_inference(model, ...)
```
Nano: add optimized_execution to the context manager.
Adapted from: https://github.com/pytorch/pytorch/blob/master/torch/jit/_fuser.py#L7
```python
import torch

class BaseContextmanager:
    def __init__(self, should_optimize=False):
        # Save the current graph-executor flag so it can be restored on exit
        self.stored_flag = torch._C._get_graph_executor_optimize()
        self.should_optimize = should_optimize
        # no_grad handle used by __enter__/__exit__ below
        self.no_grad = torch.no_grad()
        ...

    def __enter__(self):
        self.no_grad.__enter__()
        torch._C._set_graph_executor_optimize(self.should_optimize)
        ...

    def __exit__(self, exc_type, exc_value, exc_tb):
        torch._C._set_graph_executor_optimize(self.stored_flag)
        self.no_grad.__exit__(exc_type, exc_value, exc_tb)
        ...
```
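For illustration, this is roughly how such a context manager would be used around JIT inference; the toy model and input below are placeholders, not Nano code:

```python
import torch
import torch.nn as nn

# Toy model traced to TorchScript, purely for demonstration
module = nn.Linear(10, 2)
example_input = torch.randn(1, 10)
traced = torch.jit.trace(module, example_input)

# Inside the block, autograd and graph-executor optimization are off;
# both flags are restored when the block exits.
with BaseContextmanager(should_optimize=False):
    output = traced(example_input)
```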
I think we could provide a "warmed-up" model to our users once they call InferenceOptimizer.trace with accelerator="jit". We already do this in InferenceOptimizer.optimize, but not in InferenceOptimizer.trace.
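To make the warm-up idea concrete, here is a minimal sketch of what trace could do before handing the model back; the helper name, dummy input, and two-pass count are assumptions, not the actual InferenceOptimizer code:

```python
import torch

def warm_up(traced_model, example_input, n_iters=2):
    # The first couple of calls trigger the JIT's profiling and
    # optimization passes; running them here hides that latency
    # from the user's first real inference call.
    with torch.no_grad():
        for _ in range(n_iters):
            traced_model(example_input)
    return traced_model
```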
I am not really sure whether we need to add torch.jit.optimized_execution(False) to our context manager, since it is very poorly documented.
By the way, when the user calls load, it will change back to the "unwarmed-up" model.
What about Chronos? I think the forecaster needs torch.jit.optimized_execution(False).
I think putting it in Chronos is very reasonable, since we focus on a very specific solution (model) for each Chronos forecaster.
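For example, a Chronos-side guard could live directly in the forecaster's JIT inference path. This is only a sketch; the function and argument names are illustrative rather than Chronos's actual internals:

```python
import torch

def _predict_with_jit(jit_model, input_batch):
    # Scoped to Chronos: each forecaster targets one specific model,
    # so the small throughput loss from disabling graph-executor
    # optimization is a fair trade for skipping the long warm-up.
    with torch.no_grad():
        with torch.jit.optimized_execution(False):
            return jit_model(input_batch)
```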
Freshly loaded TorchScript models require a long warm-up: the first and second calls in particular are slow.
To Reproduce
Server: i9-7900 (Ubuntu 22.04 LTS); Python: 3.7.13; PyTorch: 1.12.1; PyTorch Lightning: 1.6.4
Comparing inference times shows that the first two calls take longer than all subsequent calls. This happens whenever the user calls trace or load, and it is not described in PyTorch's documentation. Similar issues:
https://github.com/triton-inference-server/pytorch_backend/pull/24/files
https://github.com/pytorch/pytorch/issues/57894
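A minimal timing script along these lines reproduces the effect; the model shape and iteration counts are arbitrary choices for illustration:

```python
import time
import torch
import torch.nn as nn

module = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))
x = torch.randn(32, 64)
traced = torch.jit.trace(module, x)

# The first two calls are noticeably slower: the profiling graph
# executor records, then optimizes, the graph before settling down.
with torch.no_grad():
    for i in range(5):
        start = time.perf_counter()
        traced(x)
        print(f"call {i}: {(time.perf_counter() - start) * 1e3:.2f} ms")

# Workaround: trace a fresh copy and run it with graph-executor
# optimization disabled, so even the first call skips the slow passes.
traced2 = torch.jit.trace(module, x)
with torch.no_grad(), torch.jit.optimized_execution(False):
    for i in range(3):
        start = time.perf_counter()
        traced2(x)
        print(f"no-opt call {i}: {(time.perf_counter() - start) * 1e3:.2f} ms")
```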
Solution
torch.jit.optimized_execution(False) gets around this problem, but it is not mentioned in the documentation. In terms of results, optimized_execution also causes a small performance loss. Do we need to fix this? It seems like all of our forecasters have this problem, not just the autoformer. @TheaperDeng @rnwang04 @plusbang