pytorch / serve

Serve, optimize and scale PyTorch models in production
https://pytorch.org/serve/
Apache License 2.0

Torchscripted model doesn't load if model file is also specified #2481

Open agunapal opened 1 year ago

agunapal commented 1 year ago

🐛 Describe the bug

This is a negative test case: package a TorchScripted model and also specify a model file with --model-file.

torch-model-archiver --model-name vgg16 --version 1.0  --serialized-file vgg16.pt --extra-files ./examples/image_classifier/index_to_name.json --handler ./examples/image_classifier/vgg_16/vgg_handler.py --model-file ./examples/image_classifier/vgg_16/model.py

On starting TorchServe, we get the following error:

2023-07-19T18:35:41,373 [INFO ] W-9000-vgg16_1.0-stdout MODEL_LOG -   File "/home/ubuntu/anaconda3/envs/torchserve/lib/python3.10/site-packages/ts/torch_handler/vision_handler.py", line 23, in initialize
2023-07-19T18:35:41,373 [INFO ] W-9000-vgg16_1.0-stdout MODEL_LOG -     super().initialize(context)
2023-07-19T18:35:41,373 [INFO ] W-9000-vgg16_1.0-stdout MODEL_LOG -   File "/home/ubuntu/anaconda3/envs/torchserve/lib/python3.10/site-packages/ts/torch_handler/base_handler.py", line 157, in initialize
2023-07-19T18:35:41,373 [INFO ] W-9000-vgg16_1.0-stdout MODEL_LOG -     self.model = self._load_pickled_model(
2023-07-19T18:35:41,374 [INFO ] W-9000-vgg16_1.0-stdout MODEL_LOG -   File "/tmp/models/2f1a20f2ebe541258c623be6a203a60b/vgg_handler.py", line 29, in _load_pickled_model
2023-07-19T18:35:41,374 [INFO ] W-9000-vgg16_1.0-stdout MODEL_LOG -     model.load_state_dict(state_dict)
2023-07-19T18:35:41,374 [INFO ] W-9000-vgg16_1.0-stdout MODEL_LOG -   File "/home/ubuntu/anaconda3/envs/torchserve/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1994, in load_state_dict
2023-07-19T18:35:41,374 [INFO ] W-9000-vgg16_1.0-stdout MODEL_LOG -     raise TypeError("Expected state_dict to be dict-like, got {}.".format(type(state_dict)))
2023-07-19T18:35:41,374 [INFO ] W-9000-vgg16_1.0-stdout MODEL_LOG - TypeError: Expected state_dict to be dict-like, got <class 'torch.jit._script.RecursiveScriptModule'>.
2023-07-19T18:35:41,411 [INFO ] epollEventLoopGroup-5-1 org.pytorch.serve.wlm.WorkerThread - 9000 Worker disconnected. WORKER_STARTED
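
The handler received a torch.jit._script.RecursiveScriptModule where it expected a state_dict, which suggests vgg16.pt is a TorchScript archive (presumably torch.load, when handed such an archive, passes it on to torch.jit.load). One way to check what the serialized file contains is to look inside the zip. The helper below is a hypothetical sketch, not a TorchServe or PyTorch API. It assumes, roughly mirroring a private check in torch.serialization, that TorchScript archives carry a constants.pkl record under the archive's top-level folder while plain torch.save checkpoints do not.

```python
import os
import zipfile
from typing import BinaryIO, Union


def is_torchscript_archive(source: Union[str, os.PathLike, BinaryIO]) -> bool:
    """Heuristic (hypothetical helper): does `source` look like a torch.jit.save archive?

    Assumption: both torch.jit.save and zip-based torch.save write a zip with a
    single top-level folder, but only TorchScript archives contain a
    '<folder>/constants.pkl' record (alongside a '<folder>/code/' directory).
    """
    if not zipfile.is_zipfile(source):
        # Not a zip at all (e.g. a legacy, non-zip torch.save file): treat as not TorchScript.
        return False
    with zipfile.ZipFile(source) as zf:
        for name in zf.namelist():
            parts = name.split("/")
            # Look for exactly '<top-level folder>/constants.pkl'.
            if len(parts) == 2 and parts[1] == "constants.pkl":
                return True
    return False
```

Under that assumption, is_torchscript_archive("vgg16.pt") should return True for a file written with torch.jit.save, which is the case this report exercises.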

Installation instructions

Latest

Model Packaging

torch-model-archiver --model-name vgg16 --version 1.0  --serialized-file vgg16.pt --extra-files ./examples/image_classifier/index_to_name.json --handler ./examples/image_classifier/vgg_16/vgg_handler.py --model-file ./examples/image_classifier/vgg_16/model.py

config.properties

No response

Versions

0.8.1

Repro instructions

Mentioned above

Possible Solution

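Change the order of processing in the base handler: check whether the serialized file is a TorchScript model before taking the eager-mode path. As the traceback shows, BaseHandler.initialize goes to _load_pickled_model when a model file is specified, so the TorchScript module ends up being passed to load_state_dict.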
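
The proposed reordering can be modeled as a choice between two load paths. This is a simplified, hypothetical sketch of the branch order only: LoadPath, current_order and proposed_order are illustrative names, real handlers also deal with other formats, and current_order reflects what the traceback implies rather than TorchServe's actual code.

```python
from enum import Enum


class LoadPath(Enum):
    EAGER = "eager"              # import model.py, instantiate the class, load_state_dict
    TORCHSCRIPT = "torchscript"  # torch.jit.load the serialized file as-is


def current_order(model_file: str, serialized_is_torchscript: bool) -> LoadPath:
    """Simplified model of the order the traceback implies (not TorchServe code)."""
    if model_file:
        # A model file always wins, even when the serialized file is TorchScript;
        # this is the branch that ends in load_state_dict(<ScriptModule>).
        return LoadPath.EAGER
    if serialized_is_torchscript:
        return LoadPath.TORCHSCRIPT
    raise ValueError("serialized file is not TorchScript and no model file was given")


def proposed_order(model_file: str, serialized_is_torchscript: bool) -> LoadPath:
    """Inspect the serialized file first; TorchScript needs no model definition."""
    if serialized_is_torchscript:
        # A handler could log a warning here that model_file is being ignored.
        return LoadPath.TORCHSCRIPT
    if model_file:
        return LoadPath.EAGER
    raise ValueError("serialized file is not TorchScript and no model file was given")
```

With the proposed order, the archive from this report would load; the design question left open is whether ignoring the extra model file silently is acceptable or whether it should at least be logged.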

msaroufim commented 1 year ago

This seems like the correct behavior to me. Maybe a better error message would help, though?