Closed msaroufim closed 1 year ago
This code refactoring makes the backend handler elegant. The only issue I can see relates to the communication between the frontend and backend via environment variables.
From a security perspective, passing configuration through environment variables is not safe, because environment variables can be easily hacked.
It is also not necessary to set env vars when the frontend and backend are in the same process (e.g. TS cpp, torch::deploy). Would we need to rewrite the handler again?
Modularize `base_handler.py` into `handler.utils`
This is a follow-up note on #1440, where we initially noted a few problems that were making it difficult to add code to TorchServe.
Initially, the `base_handler` had a few simple responsibilities. But over time, contributors found it easy to extend functionality in TorchServe by extending the handler, because doing so involved Python-only changes. We've been reluctant to fully embrace this approach, since integrating more natively within `WorkloadManager.java` makes it possible to gate certain functionality behind a `config.properties`, similar to https://github.com/pytorch/serve/pull/1319. However, for more complex features the code review process becomes more of a challenge and grinds to a halt: https://github.com/pytorch/serve/pull/1401

At the same time, we've had `examples/` grow in size and complexity, like https://github.com/pytorch/serve/tree/master/examples/Huggingface_Transformers, without having its improvements be easily usable by other parts of the code base.

Describe the solution
Instead of designing things in the abstract, it's important to look at what contributors have actually added to handlers. Here's a non-exhaustive list:

- DALI or accimage https://github.com/pytorch/serve/pull/1545

Our proposal is simple: instead of encouraging one-off contributions in `examples/` or complex contributions to the Java frontend, we can create an interface for the existing `handler_utils/`, refactor `base_handler.py` and `Transformer_handler_generalized.py` to use them, and encourage new contributors in `CONTRIBUTING.md` and through code review to create their own.

Let's do a few examples to make this all clearer.
Environment variables now
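A minimal sketch of the current pattern; the flag names here (`ENABLE_TORCH_PROFILER`, `TS_IPEX_ENABLE`) are hypothetical stand-ins for whatever a handler actually checks. Every new feature adds another inline branch:

```python
import os

class BaseHandler:
    """Sketch of today's pattern: each feature check lives inline in the
    handler, and new features keep adding branches here."""

    def initialize(self, context):
        # Flag names below are hypothetical examples, not a fixed API.
        self.profiling_enabled = os.environ.get("ENABLE_TORCH_PROFILER", "false") == "true"
        self.ipex_enabled = os.environ.get("TS_IPEX_ENABLE", "false") == "true"
        # ... every new feature adds another check like the ones above
```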
Environment variables after
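A sketch of the proposed shape, assuming a hypothetical `env_flag` helper living in `handler_utils`: the parsing logic is written once, and handlers just query flags instead of accumulating `if` blocks.

```python
import os

def env_flag(name: str, default: str = "false") -> bool:
    """Read a boolean feature flag from an environment variable.
    Hypothetical handler_utils helper, written once and reused."""
    return os.environ.get(name, default).lower() in ("1", "true")

class BaseHandler:
    def initialize(self, context):
        # One line per feature; no branch-specific parsing in the handler.
        self.profiling_enabled = env_flag("ENABLE_TORCH_PROFILER")
```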
The difference is that now we can add as many environment variables as we like instead of having tons of `if` conditions inside the handler. Profiling code could then be pulled out of `base_handler.py`, and new profilers could be added with the only constraint on them being that they write something to disk.

Model optimization now
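A sketch of how this tends to look today (flag names and method names are hypothetical): each optimization is an ad-hoc branch interleaved with the handler's own loading logic.

```python
import os

class BaseHandler:
    """Sketch of today's pattern: optimizations are one-off branches
    inside the handler itself. Flag and method names are hypothetical."""

    def initialize(self, model):
        if os.environ.get("TS_IPEX_ENABLE", "false") == "true":
            model = self._ipex_optimize(model)      # one-off branch
        if os.environ.get("TS_TENSORRT_ENABLE", "false") == "true":
            model = self._tensorrt_compile(model)   # another one-off branch
        self.model = model

    def _ipex_optimize(self, model):
        # placeholder for e.g. an ipex.optimize(model) call
        return model

    def _tensorrt_compile(self, model):
        # placeholder for a TensorRT compile step
        return model
```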
Model optimization after
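A sketch of the proposed interface, with all names hypothetical: every optimization is a function from a model to a model, registered by name so providers compose and new ones can be added without touching the handler.

```python
from typing import Callable, Dict

# In practice the signature is nn.Module -> nn.Module; we use `object`
# here to keep the sketch dependency-free.
ModelTransform = Callable[[object], object]

OPTIMIZERS: Dict[str, ModelTransform] = {}

def register_optimizer(name: str):
    """Decorator that registers a model-to-model transform under a name."""
    def wrap(fn: ModelTransform) -> ModelTransform:
        OPTIMIZERS[name] = fn
        return fn
    return wrap

@register_optimizer("identity")
def no_op(model):
    # Trivial provider: returns the model unchanged.
    return model

def optimize(model, names):
    """Apply registered optimizations in order; they compose because each
    one takes a model and returns a model."""
    for n in names:
        model = OPTIMIZERS[n](model)
    return model
```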
The difference now is that we create a simpler interface for runtime providers and make it easy to add new kinds of optimizations. For example, pruning, distillation, and quantization would all follow a similar interface where, given an `nn.Module`, we produce another `nn.Module`. `experimental/torchprep` follows a similar design philosophy, which makes optimizations more composable.

The other application-specific code contributors have added to `base_handler.py` can be solved in a similar way: instead of favoring one-offs, we encourage code reuse with tiny wrapper classes. Things get particularly exciting once we start considering that TorchServe can execute an arbitrary Python script, so we could, for example, also provide a `pybind` wrapper to `torch::deploy` and create a simple wrapper in `torch_handler.utils.launch` to start serving PyTorch in a multithreaded manner without touching any Java code.

Our contributors like and understand our handlers; it's time to make them a first-class citizen.
EDIT: discussed this proposal with @min-jean-cho offline. For IPEX, a convenient workflow has been to:

- `ConfigManager.java` - for now this seems OK; if this file gets massive we can just split up the configs by comments and link to `ConfigManager.java` in our docs for people to learn about available features
- `http_address` string: they also need to support a use case of
- The `custom_code_snippet` can be anywhere we like

Followup on July 6
We've discussed this item more, and environment variables, while very convenient, have a few problems.
In what follows we will discuss the main issue, which is that environment variables cannot be applied to a single model. I'm reopening this discussion because we've had customers privately ask for more native ONNX support and TensorRT support, plus some more recent asks for MXNet support: https://github.com/pytorch/serve/issues/1725
There is one alternative, which @lxning pointed out, that would involve the backend and frontend communicating in-process.
Sketch of alternative solution
We should still leverage the `ConfigManager` for general configurations related to TorchServe, but use the `MODEL_CONFIG` for model-specific configurations. In particular, most model optimizations are indeed model-specific, but things like profiling could be a global environment variable instead. So let's focus on model optimizations in what follows.

In this case we have a flag `ipex_enable` that gates whether IPEX is on or off, plus some parameters specific to IPEX. Here we should favor a design with hierarchical configurations, since it doesn't make sense to specify an `ipex_dtype` if `ipex_enable` is not enabled.

Regardless, given some configurations set in `ConfigManager` in this manner, we can choose to either expose them as environment variables, like for example what we do with `LogLocation`
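A sketch of what such hierarchical validation could look like on the handler side; `parse_ipex_config` is a hypothetical helper name, and the keys come from the `ipex_enable`/`ipex_dtype` example above:

```python
def parse_ipex_config(model_config: dict) -> dict:
    """Validate a hierarchical IPEX block (hypothetical helper): child
    keys like ipex_dtype are only meaningful when ipex_enable is set."""
    enabled = bool(model_config.get("ipex_enable", 0))
    if not enabled and "ipex_dtype" in model_config:
        raise ValueError("ipex_dtype was set but ipex_enable is off")
    return {
        "enabled": enabled,
        # Default dtype is an illustrative assumption, not a real default.
        "dtype": model_config.get("ipex_dtype", "float32"),
    }
```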
Model config doesn't have a schema, so any value in the JSON should be usable by someone downstream, as it gets parsed into a hashmap. This makes it ideal to use, because no one needs to add any code to support a new parameter in the model `config.properties`. However, the current `modelConfig` only accepts integer values (which is fine, since we can pretend all non-zero values are a bool of 1).

So maybe, instead of environment variables, if we make `modelConfig` available as a dictionary for any handler in the `context` object (https://github.com/pytorch/serve/blob/master/ts/context.py#L8), then we can support new model optimizations by querying `modelConfig = {...}`
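A sketch of that flow, assuming a hypothetical `model_config` dict hung off the `Context` object (the real `Context` in `ts/context.py` does not have this attribute yet):

```python
class Context:
    """Minimal stand-in for ts/context.py's Context, extended with the
    proposed (hypothetical) model_config dictionary."""
    def __init__(self, model_name: str, model_config: dict = None):
        self.model_name = model_name
        self.model_config = model_config or {}

class MyHandler:
    def initialize(self, context: Context):
        # Handlers read optimization settings straight from the
        # registration payload; no new frontend code per parameter.
        cfg = context.model_config
        self.use_onnx = bool(cfg.get("onnx_enable", 0))
```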
Those same parameters need to be made available via the model registration API, so model optimization related registrations would go here: https://github.com/pytorch/serve/blob/master/frontend/server/src/main/java/org/pytorch/serve/openapi/OpenApiUtils.java#L241
and here: https://github.com/pytorch/serve/blob/master/frontend/server/src/main/java/org/pytorch/serve/http/messages/RegisterModelRequest.java#L10
In which case, can we make this request object available on the handler? https://github.com/pytorch/serve/blob/master/frontend/server/src/main/java/org/pytorch/serve/http/messages/RegisterModelRequest.java#L74-L81
OTF message handler
After many discussions, we feel the best solution would be to extend our existing protobuf files so that they have support for model optimization runtimes and their settings: https://github.com/pytorch/serve/blob/master/frontend/server/src/main/resources/proto/management.proto
Because settings vary by runtime provider and are hard to keep backwards compatible, we can leverage a protobuf map (https://developers.google.com/protocol-buffers/docs/proto3#maps) to keep track of each model optimization provider.
So the addition to our `.proto` files would be:

And then in the base handler we would unpack each of those maps and set up the correct configuration per the exact same design described in the first section of this doc.

One interesting aspect of using an updated protobuf format is that we can actually change model optimizations at runtime instead of setting them statically, so people could, for example, change their ONNX configuration depending on load.
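Once deserialized on the Python side, each protobuf `map<string, string>` arrives as a plain dict of strings. A sketch of the unpacking step in the base handler (provider and key names are hypothetical):

```python
def unpack_runtime_settings(runtime_maps: dict) -> dict:
    """Turn per-provider map<string, string> entries (already decoded into
    dicts of str -> str) into typed settings. Unknown keys are passed
    through untouched, which keeps the format backwards compatible."""
    typed = {}
    for provider, settings in runtime_maps.items():
        typed[provider] = {
            # Coerce "true"/"false" strings to bools; leave everything
            # else (opset versions, dtypes, ...) as raw strings.
            k: (v.lower() == "true") if v.lower() in ("true", "false") else v
            for k, v in settings.items()
        }
    return typed
```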