pytorch / serve

Serve, optimize and scale PyTorch models in production
https://pytorch.org/serve/
Apache License 2.0

Failing to run resnet_18 example #2348

Closed (HakkyuKim closed this 1 year ago)

HakkyuKim commented 1 year ago

🐛 Describe the bug

I'm trying to run the TorchServe resnet_18 example by following the README.

Environment

Local PC, no Docker, no GPU
Ubuntu 22.04.2 LTS
python 3.10.6
pip 23.1.2

Steps

After cloning the serve repository,

# Install dependencies.
python3 -m venv venv
python3 ts_scripts/install_dependencies.py

# From the resnet_18 example README...
wget https://download.pytorch.org/models/resnet18-f37072fd.pth
torch-model-archiver --model-name resnet-18 --version 1.0 --model-file ./examples/image_classifier/resnet_18/model.py --serialized-file resnet18-f37072fd.pth --handler image_classifier --extra-files ./examples/image_classifier/index_to_name.json
mkdir model_store
mv resnet-18.mar model_store/
torchserve --start --model-store model_store --models resnet-18=resnet-18.mar
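One thing worth checking at this point is which interpreter these commands actually use: `python3 -m venv venv` only creates the environment and does not activate it. The following is a minimal sketch, not part of the README, that reports whether the running interpreter belongs to a virtual environment and which modules it cannot find. The module names in the list are an illustrative assumption and should be adjusted to whatever the chosen handler imports.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside venv:", in_virtualenv())
    # Illustrative list (assumption); edit to match the handler's imports.
    print("missing:", missing_modules(["torch", "torchvision", "captum", "ts"]))
```

Running it with the same `python3` that launches `torchserve` shows whether the packages were installed where the workers look for them.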

Result

Calling prediction from a different terminal produces the following error.

{
  "code": 500,
  "type": "InternalServerException",
  "message": "Worker died."
}

Error log

2023-05-16T19:51:14,424 [INFO ] W-9008-resnet-18_1.0-stdout MODEL_LOG - s_name_part0=/tmp/.ts.sock, s_name_part1=9008, pid=574853
2023-05-16T19:51:14,425 [INFO ] W-9004-resnet-18_1.0-stdout MODEL_LOG - s_name_part0=/tmp/.ts.sock, s_name_part1=9004, pid=574852
2023-05-16T19:51:14,425 [INFO ] W-9008-resnet-18_1.0-stdout MODEL_LOG - Listening on port: /tmp/.ts.sock.9008
2023-05-16T19:51:14,425 [INFO ] W-9006-resnet-18_1.0-stdout MODEL_LOG - s_name_part0=/tmp/.ts.sock, s_name_part1=9006, pid=574854
2023-05-16T19:51:14,425 [INFO ] W-9005-resnet-18_1.0-stdout MODEL_LOG - s_name_part0=/tmp/.ts.sock, s_name_part1=9005, pid=574855
2023-05-16T19:51:14,425 [INFO ] W-9010-resnet-18_1.0-stdout MODEL_LOG - s_name_part0=/tmp/.ts.sock, s_name_part1=9010, pid=574859
2023-05-16T19:51:14,426 [INFO ] W-9004-resnet-18_1.0-stdout MODEL_LOG - Listening on port: /tmp/.ts.sock.9004
2023-05-16T19:51:14,426 [INFO ] W-9006-resnet-18_1.0-stdout MODEL_LOG - Listening on port: /tmp/.ts.sock.9006
2023-05-16T19:51:14,426 [INFO ] W-9010-resnet-18_1.0-stdout MODEL_LOG - Listening on port: /tmp/.ts.sock.9010
2023-05-16T19:51:14,426 [INFO ] W-9007-resnet-18_1.0-stdout MODEL_LOG - s_name_part0=/tmp/.ts.sock, s_name_part1=9007, pid=574856
2023-05-16T19:51:14,426 [INFO ] W-9000-resnet-18_1.0-stdout MODEL_LOG - s_name_part0=/tmp/.ts.sock, s_name_part1=9000, pid=574847
2023-05-16T19:51:14,426 [INFO ] W-9007-resnet-18_1.0-stdout MODEL_LOG - Listening on port: /tmp/.ts.sock.9007
2023-05-16T19:51:14,426 [INFO ] W-9005-resnet-18_1.0-stdout MODEL_LOG - Listening on port: /tmp/.ts.sock.9005
2023-05-16T19:51:14,426 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG - s_name_part0=/tmp/.ts.sock, s_name_part1=9011, pid=574860
2023-05-16T19:51:14,426 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG - s_name_part0=/tmp/.ts.sock, s_name_part1=9009, pid=574857
2023-05-16T19:51:14,427 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG - Listening on port: /tmp/.ts.sock.9009
2023-05-16T19:51:14,428 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG - Listening on port: /tmp/.ts.sock.9011
2023-05-16T19:51:14,430 [INFO ] W-9003-resnet-18_1.0-stdout MODEL_LOG - s_name_part0=/tmp/.ts.sock, s_name_part1=9003, pid=574851
2023-05-16T19:51:14,432 [INFO ] W-9002-resnet-18_1.0-stdout MODEL_LOG - s_name_part0=/tmp/.ts.sock, s_name_part1=9002, pid=574849
2023-05-16T19:51:14,432 [INFO ] W-9003-resnet-18_1.0-stdout MODEL_LOG - Listening on port: /tmp/.ts.sock.9003
2023-05-16T19:51:14,432 [INFO ] W-9002-resnet-18_1.0-stdout MODEL_LOG - Listening on port: /tmp/.ts.sock.9002
2023-05-16T19:51:14,433 [INFO ] W-9000-resnet-18_1.0-stdout MODEL_LOG - Listening on port: /tmp/.ts.sock.9000
2023-05-16T19:51:14,438 [INFO ] W-9007-resnet-18_1.0-stdout MODEL_LOG - Successfully loaded /home/hakkyu/.local/lib/python3.10/site-packages/ts/configs/metrics.yaml.
2023-05-16T19:51:14,438 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG - Successfully loaded /home/hakkyu/.local/lib/python3.10/site-packages/ts/configs/metrics.yaml.
2023-05-16T19:51:14,438 [INFO ] W-9006-resnet-18_1.0-stdout MODEL_LOG - Successfully loaded /home/hakkyu/.local/lib/python3.10/site-packages/ts/configs/metrics.yaml.
2023-05-16T19:51:14,439 [INFO ] W-9007-resnet-18_1.0-stdout MODEL_LOG - [PID]574856
2023-05-16T19:51:14,439 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG - [PID]574857
2023-05-16T19:51:14,439 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG - Torch worker started.
2023-05-16T19:51:14,439 [INFO ] W-9007-resnet-18_1.0-stdout MODEL_LOG - Torch worker started.
2023-05-16T19:51:14,439 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG - Python runtime: 3.10.6
2023-05-16T19:51:14,440 [INFO ] W-9007-resnet-18_1.0-stdout MODEL_LOG - Python runtime: 3.10.6
2023-05-16T19:51:14,440 [INFO ] W-9006-resnet-18_1.0-stdout MODEL_LOG - [PID]574854
2023-05-16T19:51:14,440 [INFO ] W-9008-resnet-18_1.0-stdout MODEL_LOG - Successfully loaded /home/hakkyu/.local/lib/python3.10/site-packages/ts/configs/metrics.yaml.
2023-05-16T19:51:14,441 [INFO ] W-9006-resnet-18_1.0-stdout MODEL_LOG - Torch worker started.
2023-05-16T19:51:14,441 [INFO ] W-9006-resnet-18_1.0-stdout MODEL_LOG - Python runtime: 3.10.6
2023-05-16T19:51:14,441 [INFO ] W-9008-resnet-18_1.0-stdout MODEL_LOG - [PID]574853
2023-05-16T19:51:14,441 [INFO ] W-9008-resnet-18_1.0-stdout MODEL_LOG - Torch worker started.
2023-05-16T19:51:14,442 [INFO ] W-9008-resnet-18_1.0-stdout MODEL_LOG - Python runtime: 3.10.6
2023-05-16T19:51:14,442 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG - Successfully loaded /home/hakkyu/.local/lib/python3.10/site-packages/ts/configs/metrics.yaml.
2023-05-16T19:51:14,442 [INFO ] W-9010-resnet-18_1.0-stdout MODEL_LOG - Successfully loaded /home/hakkyu/.local/lib/python3.10/site-packages/ts/configs/metrics.yaml.
2023-05-16T19:51:14,442 [INFO ] W-9002-resnet-18_1.0-stdout MODEL_LOG - Successfully loaded /home/hakkyu/.local/lib/python3.10/site-packages/ts/configs/metrics.yaml.
2023-05-16T19:51:14,442 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG - [PID]574860
2023-05-16T19:51:14,442 [INFO ] W-9010-resnet-18_1.0-stdout MODEL_LOG - [PID]574859
2023-05-16T19:51:14,443 [INFO ] W-9002-resnet-18_1.0-stdout MODEL_LOG - [PID]574849
2023-05-16T19:51:14,443 [INFO ] W-9010-resnet-18_1.0-stdout MODEL_LOG - Torch worker started.
2023-05-16T19:51:14,443 [INFO ] W-9002-resnet-18_1.0-stdout MODEL_LOG - Torch worker started.
2023-05-16T19:51:14,443 [INFO ] W-9002-resnet-18_1.0-stdout MODEL_LOG - Python runtime: 3.10.6
2023-05-16T19:51:14,443 [INFO ] W-9010-resnet-18_1.0-stdout MODEL_LOG - Python runtime: 3.10.6
2023-05-16T19:51:14,443 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG - Torch worker started.
2023-05-16T19:51:14,444 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG - Python runtime: 3.10.6
2023-05-16T19:51:14,447 [INFO ] W-9005-resnet-18_1.0-stdout MODEL_LOG - Successfully loaded /home/hakkyu/.local/lib/python3.10/site-packages/ts/configs/metrics.yaml.
2023-05-16T19:51:14,448 [INFO ] W-9005-resnet-18_1.0-stdout MODEL_LOG - [PID]574855
2023-05-16T19:51:14,448 [INFO ] W-9005-resnet-18_1.0-stdout MODEL_LOG - Torch worker started.
2023-05-16T19:51:14,450 [INFO ] W-9005-resnet-18_1.0-stdout MODEL_LOG - Python runtime: 3.10.6
2023-05-16T19:51:14,451 [INFO ] W-9003-resnet-18_1.0-stdout MODEL_LOG - Successfully loaded /home/hakkyu/.local/lib/python3.10/site-packages/ts/configs/metrics.yaml.
2023-05-16T19:51:14,451 [INFO ] W-9000-resnet-18_1.0-stdout MODEL_LOG - Successfully loaded /home/hakkyu/.local/lib/python3.10/site-packages/ts/configs/metrics.yaml.
2023-05-16T19:51:14,452 [INFO ] W-9000-resnet-18_1.0-stdout MODEL_LOG - [PID]574847
2023-05-16T19:51:14,452 [INFO ] W-9003-resnet-18_1.0-stdout MODEL_LOG - [PID]574851
2023-05-16T19:51:14,452 [INFO ] W-9000-resnet-18_1.0-stdout MODEL_LOG - Torch worker started.
2023-05-16T19:51:14,452 [INFO ] W-9003-resnet-18_1.0-stdout MODEL_LOG - Torch worker started.
2023-05-16T19:51:14,452 [INFO ] W-9000-resnet-18_1.0-stdout MODEL_LOG - Python runtime: 3.10.6
2023-05-16T19:51:14,453 [INFO ] W-9003-resnet-18_1.0-stdout MODEL_LOG - Python runtime: 3.10.6
2023-05-16T19:51:14,455 [INFO ] W-9006-resnet-18_1.0-stdout MODEL_LOG - Connection accepted: /tmp/.ts.sock.9006.
2023-05-16T19:51:14,455 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG - Connection accepted: /tmp/.ts.sock.9009.
2023-05-16T19:51:14,455 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG - Connection accepted: /tmp/.ts.sock.9011.
2023-05-16T19:51:14,455 [INFO ] W-9007-resnet-18_1.0-stdout MODEL_LOG - Connection accepted: /tmp/.ts.sock.9007.
2023-05-16T19:51:14,455 [INFO ] W-9008-resnet-18_1.0-stdout MODEL_LOG - Connection accepted: /tmp/.ts.sock.9008.
2023-05-16T19:51:14,455 [INFO ] W-9010-resnet-18_1.0-stdout MODEL_LOG - Connection accepted: /tmp/.ts.sock.9010.
2023-05-16T19:51:14,455 [INFO ] W-9002-resnet-18_1.0-stdout MODEL_LOG - Connection accepted: /tmp/.ts.sock.9002.
2023-05-16T19:51:14,456 [INFO ] W-9004-resnet-18_1.0-stdout MODEL_LOG - Successfully loaded /home/hakkyu/.local/lib/python3.10/site-packages/ts/configs/metrics.yaml.
2023-05-16T19:51:14,456 [INFO ] W-9005-resnet-18_1.0-stdout MODEL_LOG - Connection accepted: /tmp/.ts.sock.9005.
2023-05-16T19:51:14,457 [INFO ] W-9004-resnet-18_1.0-stdout MODEL_LOG - [PID]574852
2023-05-16T19:51:14,457 [INFO ] W-9004-resnet-18_1.0-stdout MODEL_LOG - Torch worker started.
2023-05-16T19:51:14,457 [INFO ] W-9003-resnet-18_1.0-stdout MODEL_LOG - Connection accepted: /tmp/.ts.sock.9003.
2023-05-16T19:51:14,457 [INFO ] W-9004-resnet-18_1.0-stdout MODEL_LOG - Python runtime: 3.10.6
2023-05-16T19:51:14,457 [INFO ] W-9000-resnet-18_1.0-stdout MODEL_LOG - Connection accepted: /tmp/.ts.sock.9000.
2023-05-16T19:51:14,459 [INFO ] W-9004-resnet-18_1.0-stdout MODEL_LOG - Connection accepted: /tmp/.ts.sock.9004.
2023-05-16T19:51:14,471 [INFO ] W-9001-resnet-18_1.0-stdout MODEL_LOG - s_name_part0=/tmp/.ts.sock, s_name_part1=9001, pid=574848
2023-05-16T19:51:14,472 [INFO ] W-9001-resnet-18_1.0-stdout MODEL_LOG - Listening on port: /tmp/.ts.sock.9001
2023-05-16T19:51:14,490 [INFO ] W-9001-resnet-18_1.0-stdout MODEL_LOG - Successfully loaded /home/hakkyu/.local/lib/python3.10/site-packages/ts/configs/metrics.yaml.
2023-05-16T19:51:14,490 [INFO ] W-9001-resnet-18_1.0-stdout MODEL_LOG - [PID]574848
2023-05-16T19:51:14,490 [INFO ] W-9001-resnet-18_1.0-stdout MODEL_LOG - Torch worker started.
2023-05-16T19:51:14,491 [INFO ] W-9001-resnet-18_1.0-stdout MODEL_LOG - Python runtime: 3.10.6
2023-05-16T19:51:14,517 [INFO ] W-9001-resnet-18_1.0-stdout MODEL_LOG - Connection accepted: /tmp/.ts.sock.9001.
2023-05-16T19:51:14,583 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG - model_name: resnet-18, batchSize: 1
2023-05-16T19:51:14,584 [INFO ] W-9006-resnet-18_1.0-stdout MODEL_LOG - model_name: resnet-18, batchSize: 1
2023-05-16T19:51:14,584 [INFO ] W-9002-resnet-18_1.0-stdout MODEL_LOG - model_name: resnet-18, batchSize: 1
2023-05-16T19:51:14,584 [INFO ] W-9004-resnet-18_1.0-stdout MODEL_LOG - model_name: resnet-18, batchSize: 1
2023-05-16T19:51:14,584 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG - model_name: resnet-18, batchSize: 1
2023-05-16T19:51:14,586 [INFO ] W-9010-resnet-18_1.0-stdout MODEL_LOG - model_name: resnet-18, batchSize: 1
2023-05-16T19:51:14,587 [INFO ] W-9008-resnet-18_1.0-stdout MODEL_LOG - model_name: resnet-18, batchSize: 1
2023-05-16T19:51:14,587 [INFO ] W-9003-resnet-18_1.0-stdout MODEL_LOG - model_name: resnet-18, batchSize: 1
2023-05-16T19:51:14,587 [INFO ] W-9000-resnet-18_1.0-stdout MODEL_LOG - model_name: resnet-18, batchSize: 1
2023-05-16T19:51:14,587 [INFO ] W-9005-resnet-18_1.0-stdout MODEL_LOG - model_name: resnet-18, batchSize: 1
2023-05-16T19:51:14,587 [INFO ] W-9007-resnet-18_1.0-stdout MODEL_LOG - model_name: resnet-18, batchSize: 1
2023-05-16T19:51:14,593 [INFO ] W-9001-resnet-18_1.0-stdout MODEL_LOG - model_name: resnet-18, batchSize: 1
2023-05-16T19:51:14,991 [INFO ] W-9000-resnet-18_1.0-stdout MODEL_LOG - Backend worker process died.
2023-05-16T19:51:14,991 [INFO ] W-9010-resnet-18_1.0-stdout MODEL_LOG - Backend worker process died.
2023-05-16T19:51:14,991 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG - Backend worker process died.
2023-05-16T19:51:14,991 [INFO ] W-9007-resnet-18_1.0-stdout MODEL_LOG - Backend worker process died.
2023-05-16T19:51:14,992 [INFO ] W-9010-resnet-18_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2023-05-16T19:51:14,992 [INFO ] W-9008-resnet-18_1.0-stdout MODEL_LOG - Backend worker process died.
2023-05-16T19:51:14,992 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2023-05-16T19:51:14,992 [INFO ] W-9008-resnet-18_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2023-05-16T19:51:14,992 [INFO ] W-9007-resnet-18_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2023-05-16T19:51:14,992 [INFO ] W-9005-resnet-18_1.0-stdout MODEL_LOG - Backend worker process died.
2023-05-16T19:51:14,992 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG -   File "/home/hakkyu/.local/lib/python3.10/site-packages/ts/model_loader.py", line 100, in load
2023-05-16T19:51:14,992 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG -     module, function_name = self._load_handler_file(handler)
2023-05-16T19:51:14,993 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG -   File "/home/hakkyu/.local/lib/python3.10/site-packages/ts/model_loader.py", line 145, in _load_handler_file
2023-05-16T19:51:14,993 [INFO ] W-9002-resnet-18_1.0-stdout MODEL_LOG - Backend worker process died.
2023-05-16T19:51:14,993 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG -     module = importlib.import_module(module_name)
2023-05-16T19:51:14,993 [INFO ] W-9002-resnet-18_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2023-05-16T19:51:14,993 [INFO ] W-9003-resnet-18_1.0-stdout MODEL_LOG - Backend worker process died.
2023-05-16T19:51:14,991 [INFO ] W-9001-resnet-18_1.0-stdout MODEL_LOG - Backend worker process died.
2023-05-16T19:51:14,992 [INFO ] W-9010-resnet-18_1.0-stdout MODEL_LOG -   File "/home/hakkyu/.local/lib/python3.10/site-packages/ts/model_loader.py", line 100, in load
2023-05-16T19:51:14,992 [INFO ] W-9008-resnet-18_1.0-stdout MODEL_LOG -   File "/home/hakkyu/.local/lib/python3.10/site-packages/ts/model_loader.py", line 100, in load
2023-05-16T19:51:14,993 [INFO ] W-9003-resnet-18_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2023-05-16T19:51:14,993 [INFO ] W-9001-resnet-18_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2023-05-16T19:51:14,993 [INFO ] W-9002-resnet-18_1.0-stdout MODEL_LOG -   File "/home/hakkyu/.local/lib/python3.10/site-packages/ts/model_loader.py", line 100, in load
2023-05-16T19:51:14,993 [INFO ] W-9008-resnet-18_1.0-stdout MODEL_LOG -     module, function_name = self._load_handler_file(handler)
2023-05-16T19:51:14,993 [INFO ] W-9003-resnet-18_1.0-stdout MODEL_LOG -   File "/home/hakkyu/.local/lib/python3.10/site-packages/ts/model_loader.py", line 100, in load
2023-05-16T19:51:14,993 [INFO ] W-9001-resnet-18_1.0-stdout MODEL_LOG -   File "/home/hakkyu/.local/lib/python3.10/site-packages/ts/model_loader.py", line 100, in load
2023-05-16T19:51:14,993 [INFO ] W-9008-resnet-18_1.0-stdout MODEL_LOG -   File "/home/hakkyu/.local/lib/python3.10/site-packages/ts/model_loader.py", line 145, in _load_handler_file
2023-05-16T19:51:14,993 [INFO ] W-9003-resnet-18_1.0-stdout MODEL_LOG -     module, function_name = self._load_handler_file(handler)
2023-05-16T19:51:14,992 [INFO ] W-9000-resnet-18_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2023-05-16T19:51:14,993 [INFO ] W-9003-resnet-18_1.0-stdout MODEL_LOG -   File "/home/hakkyu/.local/lib/python3.10/site-packages/ts/model_loader.py", line 145, in _load_handler_file
2023-05-16T19:51:14,994 [INFO ] W-9005-resnet-18_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2023-05-16T19:51:14,994 [INFO ] W-9003-resnet-18_1.0-stdout MODEL_LOG -     module = importlib.import_module(module_name)
2023-05-16T19:51:14,992 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG - Backend worker process died.
2023-05-16T19:51:14,994 [INFO ] W-9001-resnet-18_1.0-stdout MODEL_LOG -     module, function_name = self._load_handler_file(handler)
2023-05-16T19:51:14,994 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2023-05-16T19:51:14,993 [INFO ] W-9008-resnet-18_1.0-stdout MODEL_LOG -     module = importlib.import_module(module_name)
2023-05-16T19:51:14,994 [INFO ] W-9001-resnet-18_1.0-stdout MODEL_LOG -   File "/home/hakkyu/.local/lib/python3.10/site-packages/ts/model_loader.py", line 145, in _load_handler_file
2023-05-16T19:51:14,994 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG -   File "/home/hakkyu/.local/lib/python3.10/site-packages/ts/model_loader.py", line 100, in load
2023-05-16T19:51:14,994 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG -     module, function_name = self._load_handler_file(handler)
2023-05-16T19:51:14,994 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG -   File "/home/hakkyu/.local/lib/python3.10/site-packages/ts/model_loader.py", line 145, in _load_handler_file
2023-05-16T19:51:14,994 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG -     module = importlib.import_module(module_name)
2023-05-16T19:51:14,995 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG -   File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
2023-05-16T19:51:14,995 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG -     return _bootstrap._gcd_import(name[level:], package, level)
2023-05-16T19:51:14,995 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
2023-05-16T19:51:14,995 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
2023-05-16T19:51:14,995 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1004, in _find_and_load_unlocked
2023-05-16T19:51:14,995 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG - ModuleNotFoundError: No module named 'image_classifier'
2023-05-16T19:51:14,995 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG - 
2023-05-16T19:51:14,996 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG - During handling of the above exception, another exception occurred:
2023-05-16T19:51:14,996 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG - 
2023-05-16T19:51:14,996 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2023-05-16T19:51:14,996 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG -   File "/home/hakkyu/.local/lib/python3.10/site-packages/ts/model_service_worker.py", line 253, in <module>
2023-05-16T19:51:14,996 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG -     worker.run_server()
2023-05-16T19:51:14,996 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG -   File "/home/hakkyu/.local/lib/python3.10/site-packages/ts/model_service_worker.py", line 221, in run_server
2023-05-16T19:51:14,996 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG -     self.handle_connection(cl_socket)
2023-05-16T19:51:14,996 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG -   File "/home/hakkyu/.local/lib/python3.10/site-packages/ts/model_service_worker.py", line 184, in handle_connection
2023-05-16T19:51:14,997 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG -     service, result, code = self.load_model(msg)
2023-05-16T19:51:14,997 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG -   File "/home/hakkyu/.local/lib/python3.10/site-packages/ts/model_service_worker.py", line 131, in load_model
2023-05-16T19:51:14,997 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG -     service = model_loader.load(
2023-05-16T19:51:14,997 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG -   File "/home/hakkyu/.local/lib/python3.10/site-packages/ts/model_loader.py", line 102, in load
2023-05-16T19:51:14,997 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG -     module = self._load_default_handler(handler)
2023-05-16T19:51:14,997 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG -   File "/home/hakkyu/.local/lib/python3.10/site-packages/ts/model_loader.py", line 151, in _load_default_handler
2023-05-16T19:51:14,997 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG -     module = importlib.import_module(module_name, "ts.torch_handler")
2023-05-16T19:51:14,997 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG -   File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
2023-05-16T19:51:14,998 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG -     return _bootstrap._gcd_import(name[level:], package, level)
2023-05-16T19:51:14,998 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
2023-05-16T19:51:14,998 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
2023-05-16T19:51:14,998 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
2023-05-16T19:51:14,998 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
2023-05-16T19:51:14,998 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap_external>", line 883, in exec_module
2023-05-16T19:51:14,998 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
2023-05-16T19:51:14,998 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG -   File "/home/hakkyu/.local/lib/python3.10/site-packages/ts/torch_handler/image_classifier.py", line 8, in <module>
2023-05-16T19:51:14,999 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG -     from .vision_handler import VisionHandler
2023-05-16T19:51:14,999 [INFO ] W-9011-resnet-18_1.0-stdout MODEL_LOG -   File "/home/hakkyu/.local/lib/python3.10/site-packages/ts/torch_handler/vision_handler.py", line 11, in <module>
2023-05-16T19:51:14,999 [INFO ] W-9007-resnet-18_1.0-stdout MODEL_LOG -   File "/home/hakkyu/.local/lib/python3.10/site-packages/ts/model_loader.py", line 100, in load
2023-05-16T19:51:14,999 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG -   File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
2023-05-16T19:51:14,994 [INFO ] W-9008-resnet-18_1.0-stdout MODEL_LOG -   File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
2023-05-16T19:51:14,999 [INFO ] W-9007-resnet-18_1.0-stdout MODEL_LOG -     module, function_name = self._load_handler_file(handler)
2023-05-16T19:51:14,999 [INFO ] W-9008-resnet-18_1.0-stdout MODEL_LOG -     return _bootstrap._gcd_import(name[level:], package, level)
2023-05-16T19:51:14,999 [INFO ] W-9007-resnet-18_1.0-stdout MODEL_LOG -   File "/home/hakkyu/.local/lib/python3.10/site-packages/ts/model_loader.py", line 145, in _load_handler_file
2023-05-16T19:51:14,999 [INFO ] W-9008-resnet-18_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
2023-05-16T19:51:14,999 [INFO ] W-9007-resnet-18_1.0-stdout MODEL_LOG -     module = importlib.import_module(module_name)
2023-05-16T19:51:15,000 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG -     return _bootstrap._gcd_import(name[level:], package, level)
2023-05-16T19:51:15,000 [INFO ] W-9007-resnet-18_1.0-stdout MODEL_LOG -   File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
2023-05-16T19:51:15,000 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
2023-05-16T19:51:15,000 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
2023-05-16T19:51:15,000 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1004, in _find_and_load_unlocked
2023-05-16T19:51:15,000 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG - ModuleNotFoundError: No module named 'image_classifier'
2023-05-16T19:51:15,000 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG - 
2023-05-16T19:51:15,000 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG - During handling of the above exception, another exception occurred:
2023-05-16T19:51:15,001 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG - 
2023-05-16T19:51:15,001 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2023-05-16T19:51:15,001 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG -   File "/home/hakkyu/.local/lib/python3.10/site-packages/ts/model_service_worker.py", line 253, in <module>
2023-05-16T19:51:15,001 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG -     worker.run_server()
2023-05-16T19:51:15,001 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG -   File "/home/hakkyu/.local/lib/python3.10/site-packages/ts/model_service_worker.py", line 221, in run_server
2023-05-16T19:51:15,001 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG -     self.handle_connection(cl_socket)
2023-05-16T19:51:15,001 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG -   File "/home/hakkyu/.local/lib/python3.10/site-packages/ts/model_service_worker.py", line 184, in handle_connection
2023-05-16T19:51:15,001 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG -     service, result, code = self.load_model(msg)
2023-05-16T19:51:15,002 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG -   File "/home/hakkyu/.local/lib/python3.10/site-packages/ts/model_service_worker.py", line 131, in load_model
2023-05-16T19:51:15,002 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG -     service = model_loader.load(
2023-05-16T19:51:15,002 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG -   File "/home/hakkyu/.local/lib/python3.10/site-packages/ts/model_loader.py", line 102, in load
2023-05-16T19:51:15,002 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG -     module = self._load_default_handler(handler)
2023-05-16T19:51:15,002 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG -   File "/home/hakkyu/.local/lib/python3.10/site-packages/ts/model_loader.py", line 151, in _load_default_handler
2023-05-16T19:51:15,002 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG -     module = importlib.import_module(module_name, "ts.torch_handler")
2023-05-16T19:51:15,002 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG -   File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
2023-05-16T19:51:15,003 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG -     return _bootstrap._gcd_import(name[level:], package, level)
2023-05-16T19:51:15,003 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
2023-05-16T19:51:15,003 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
2023-05-16T19:51:15,003 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
2023-05-16T19:51:15,003 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
2023-05-16T19:51:15,003 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap_external>", line 883, in exec_module
2023-05-16T19:51:15,003 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
2023-05-16T19:51:15,003 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG -   File "/home/hakkyu/.local/lib/python3.10/site-packages/ts/torch_handler/image_classifier.py", line 8, in <module>
2023-05-16T19:51:15,004 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG -     from .vision_handler import VisionHandler
2023-05-16T19:51:15,004 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG -   File "/home/hakkyu/.local/lib/python3.10/site-packages/ts/torch_handler/vision_handler.py", line 11, in <module>
2023-05-16T19:51:15,004 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG -     from captum.attr import IntegratedGradients
2023-05-16T19:51:15,004 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG - ModuleNotFoundError: No module named 'captum'
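Because all twelve workers write to the same stream, their tracebacks are interleaved and the final exception is easy to miss. The sketch below groups the exception summary lines by worker, assuming the line layout seen in this log (`timestamp [LEVEL] W-<port>-<model>-stdout MODEL_LOG - message`). The function name `errors_by_worker` and the regular expressions are hypothetical helpers, not part of TorchServe.

```python
import re
from collections import defaultdict

# Matches lines shaped like the ones in this log, for example:
# 2023-05-16T19:51:15,004 [INFO ] W-9009-resnet-18_1.0-stdout MODEL_LOG - <message>
LINE_RE = re.compile(
    r"^\S+ \[\w+\s*\] (?P<worker>W-\d+)-\S+ MODEL_LOG - (?P<msg>.*)$"
)
# An exception summary starts with a (possibly dotted) name ending in
# "Error" or "Exception", followed by a colon.
ERROR_RE = re.compile(r"^[A-Za-z_][\w.]*(Error|Exception): ")


def errors_by_worker(lines):
    """Collect exception summary lines per worker, in log order."""
    found = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match and ERROR_RE.match(match["msg"]):
            found[match["worker"]].append(match["msg"])
    return dict(found)
```

The last entry for each worker is the exception that actually ended it. For worker W-9009 in the log above, that is `ModuleNotFoundError: No module named 'captum'`. The earlier `image_classifier` error is only the first lookup failing before the default handler is tried.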

Attempts to resolve the issue

It seems that the .mar file requires some dependencies, so I tried the following:

pip freeze > requirements.txt
torch-model-archiver --model-name resnet-18 --version 1.0 --model-file ./examples/image_classifier/resnet_18/model.py --serialized-file resnet18-f37072fd.pth --handler image_classifier --extra-files ./examples/image_classifier/index_to_name.json -r requirements.txt
mv resnet-18.mar model_store/
echo "install_py_dep_per_model=true" > config.properties
mv config.properties model_store/
torchserve --start --model-store model_store --models resnet-18=resnet-18.mar --ts-config config.properties

Calling prediction from a different terminal produces a different error.

{
  "code": 503,
  "type": "ServiceUnavailableException",
  "message": "Model \"resnet-18\" has no worker to serve inference request. Please use scale workers API to add workers."
}

Judging from the terminal output, there seems to be a problem downloading torch==2.0.0+cpu.

2023-05-16T20:38:26,067 [INFO ] main org.pytorch.serve.wlm.ModelManager - Dependency installation stdout:
Collecting captum==0.6.0 (from -r /tmp/models/3dcfea7706ed4bdabadd41022a549f89/requirements.txt (line 1))
  Using cached captum-0.6.0-py3-none-any.whl (1.3 MB)
Collecting certifi==2023.5.7 (from -r /tmp/models/3dcfea7706ed4bdabadd41022a549f89/requirements.txt (line 2))
  Using cached certifi-2023.5.7-py3-none-any.whl (156 kB)
Collecting charset-normalizer==3.1.0 (from -r /tmp/models/3dcfea7706ed4bdabadd41022a549f89/requirements.txt (line 3))
  Using cached charset_normalizer-3.1.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (199 kB)
Collecting cmake==3.26.3 (from -r /tmp/models/3dcfea7706ed4bdabadd41022a549f89/requirements.txt (line 4))
  Using cached cmake-3.26.3-py2.py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (24.0 MB)
Collecting contourpy==1.0.7 (from -r /tmp/models/3dcfea7706ed4bdabadd41022a549f89/requirements.txt (line 5))
  Using cached contourpy-1.0.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (300 kB)
Collecting cycler==0.11.0 (from -r /tmp/models/3dcfea7706ed4bdabadd41022a549f89/requirements.txt (line 6))
  Using cached cycler-0.11.0-py3-none-any.whl (6.4 kB)
Collecting Cython==0.29.34 (from -r /tmp/models/3dcfea7706ed4bdabadd41022a549f89/requirements.txt (line 7))
  Using cached Cython-0.29.34-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (1.9 MB)
Collecting filelock==3.12.0 (from -r /tmp/models/3dcfea7706ed4bdabadd41022a549f89/requirements.txt (line 8))
  Using cached filelock-3.12.0-py3-none-any.whl (10 kB)
Collecting fonttools==4.39.4 (from -r /tmp/models/3dcfea7706ed4bdabadd41022a549f89/requirements.txt (line 9))
  Using cached fonttools-4.39.4-py3-none-any.whl (1.0 MB)
Collecting idna==3.4 (from -r /tmp/models/3dcfea7706ed4bdabadd41022a549f89/requirements.txt (line 10))
  Using cached idna-3.4-py3-none-any.whl (61 kB)
Collecting Jinja2==3.1.2 (from -r /tmp/models/3dcfea7706ed4bdabadd41022a549f89/requirements.txt (line 11))
  Using cached Jinja2-3.1.2-py3-none-any.whl (133 kB)
Collecting kiwisolver==1.4.4 (from -r /tmp/models/3dcfea7706ed4bdabadd41022a549f89/requirements.txt (line 12))
  Using cached kiwisolver-1.4.4-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.6 MB)
Collecting lit==16.0.3 (from -r /tmp/models/3dcfea7706ed4bdabadd41022a549f89/requirements.txt (line 13))
  Using cached lit-16.0.3-py3-none-any.whl
Collecting MarkupSafe==2.1.2 (from -r /tmp/models/3dcfea7706ed4bdabadd41022a549f89/requirements.txt (line 14))
  Using cached MarkupSafe-2.1.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (25 kB)
Collecting matplotlib==3.7.1 (from -r /tmp/models/3dcfea7706ed4bdabadd41022a549f89/requirements.txt (line 15))
  Using cached matplotlib-3.7.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.6 MB)
Collecting mpmath==1.3.0 (from -r /tmp/models/3dcfea7706ed4bdabadd41022a549f89/requirements.txt (line 16))
  Using cached mpmath-1.3.0-py3-none-any.whl (536 kB)
Collecting networkx==3.1 (from -r /tmp/models/3dcfea7706ed4bdabadd41022a549f89/requirements.txt (line 17))
  Using cached networkx-3.1-py3-none-any.whl (2.1 MB)
Collecting numpy==1.24.3 (from -r /tmp/models/3dcfea7706ed4bdabadd41022a549f89/requirements.txt (line 18))
  Using cached numpy-1.24.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)
Collecting nvidia-cublas-cu11==11.10.3.66 (from -r /tmp/models/3dcfea7706ed4bdabadd41022a549f89/requirements.txt (line 19))
  Using cached nvidia_cublas_cu11-11.10.3.66-py3-none-manylinux1_x86_64.whl (317.1 MB)
Collecting nvidia-cuda-cupti-cu11==11.7.101 (from -r /tmp/models/3dcfea7706ed4bdabadd41022a549f89/requirements.txt (line 20))
  Using cached nvidia_cuda_cupti_cu11-11.7.101-py3-none-manylinux1_x86_64.whl (11.8 MB)
Collecting nvidia-cuda-nvrtc-cu11==11.7.99 (from -r /tmp/models/3dcfea7706ed4bdabadd41022a549f89/requirements.txt (line 21))
  Using cached nvidia_cuda_nvrtc_cu11-11.7.99-2-py3-none-manylinux1_x86_64.whl (21.0 MB)
Collecting nvidia-cuda-runtime-cu11==11.7.99 (from -r /tmp/models/3dcfea7706ed4bdabadd41022a549f89/requirements.txt (line 22))
  Using cached nvidia_cuda_runtime_cu11-11.7.99-py3-none-manylinux1_x86_64.whl (849 kB)
Collecting nvidia-cudnn-cu11==8.5.0.96 (from -r /tmp/models/3dcfea7706ed4bdabadd41022a549f89/requirements.txt (line 23))
  Using cached nvidia_cudnn_cu11-8.5.0.96-2-py3-none-manylinux1_x86_64.whl (557.1 MB)
Collecting nvidia-cufft-cu11==10.9.0.58 (from -r /tmp/models/3dcfea7706ed4bdabadd41022a549f89/requirements.txt (line 24))
  Using cached nvidia_cufft_cu11-10.9.0.58-py3-none-manylinux1_x86_64.whl (168.4 MB)
Collecting nvidia-curand-cu11==10.2.10.91 (from -r /tmp/models/3dcfea7706ed4bdabadd41022a549f89/requirements.txt (line 25))
  Using cached nvidia_curand_cu11-10.2.10.91-py3-none-manylinux1_x86_64.whl (54.6 MB)
Collecting nvidia-cusolver-cu11==11.4.0.1 (from -r /tmp/models/3dcfea7706ed4bdabadd41022a549f89/requirements.txt (line 26))
  Using cached nvidia_cusolver_cu11-11.4.0.1-2-py3-none-manylinux1_x86_64.whl (102.6 MB)
Collecting nvidia-cusparse-cu11==11.7.4.91 (from -r /tmp/models/3dcfea7706ed4bdabadd41022a549f89/requirements.txt (line 27))
  Using cached nvidia_cusparse_cu11-11.7.4.91-py3-none-manylinux1_x86_64.whl (173.2 MB)
Collecting nvidia-nccl-cu11==2.14.3 (from -r /tmp/models/3dcfea7706ed4bdabadd41022a549f89/requirements.txt (line 28))
  Using cached nvidia_nccl_cu11-2.14.3-py3-none-manylinux1_x86_64.whl (177.1 MB)
Collecting nvidia-nvtx-cu11==11.7.91 (from -r /tmp/models/3dcfea7706ed4bdabadd41022a549f89/requirements.txt (line 29))
  Using cached nvidia_nvtx_cu11-11.7.91-py3-none-manylinux1_x86_64.whl (98 kB)
Collecting packaging==23.1 (from -r /tmp/models/3dcfea7706ed4bdabadd41022a549f89/requirements.txt (line 30))
  Using cached packaging-23.1-py3-none-any.whl (48 kB)
Collecting Pillow==9.3.0 (from -r /tmp/models/3dcfea7706ed4bdabadd41022a549f89/requirements.txt (line 31))
  Using cached Pillow-9.3.0-cp310-cp310-manylinux_2_28_x86_64.whl (3.3 MB)
Collecting psutil==5.9.5 (from -r /tmp/models/3dcfea7706ed4bdabadd41022a549f89/requirements.txt (line 32))
  Using cached psutil-5.9.5-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (282 kB)
Collecting pynvml==11.4.1 (from -r /tmp/models/3dcfea7706ed4bdabadd41022a549f89/requirements.txt (line 33))
  Using cached pynvml-11.4.1-py3-none-any.whl (46 kB)
Collecting pyparsing==3.0.9 (from -r /tmp/models/3dcfea7706ed4bdabadd41022a549f89/requirements.txt (line 34))
  Using cached pyparsing-3.0.9-py3-none-any.whl (98 kB)
Collecting python-dateutil==2.8.2 (from -r /tmp/models/3dcfea7706ed4bdabadd41022a549f89/requirements.txt (line 35))
  Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
Collecting PyYAML==6.0 (from -r /tmp/models/3dcfea7706ed4bdabadd41022a549f89/requirements.txt (line 36))
  Using cached PyYAML-6.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (682 kB)
Collecting requests==2.30.0 (from -r /tmp/models/3dcfea7706ed4bdabadd41022a549f89/requirements.txt (line 37))
  Using cached requests-2.30.0-py3-none-any.whl (62 kB)
Collecting six==1.16.0 (from -r /tmp/models/3dcfea7706ed4bdabadd41022a549f89/requirements.txt (line 38))
  Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Collecting sympy==1.12 (from -r /tmp/models/3dcfea7706ed4bdabadd41022a549f89/requirements.txt (line 39))
  Using cached sympy-1.12-py3-none-any.whl (5.7 MB)
2023-05-16T20:38:26,068 [ERROR] main org.pytorch.serve.wlm.ModelManager - Dependency installation stderr:
ERROR: Could not find a version that satisfies the requirement torch==2.0.0+cpu (from versions: 1.11.0, 1.12.0, 1.12.1, 1.13.0, 1.13.1, 2.0.0, 2.0.1)
ERROR: No matching distribution found for torch==2.0.0+cpu
2023-05-16T20:38:26,068 [WARN ] main org.pytorch.serve.ModelServer - Failed to load model: resnet-18.mar
org.pytorch.serve.archive.model.ModelException: Custom pip package installation failed for resnet-18
        at org.pytorch.serve.wlm.ModelManager.setupModelDependencies(ModelManager.java:258) ~[model-server.jar:?]
        at org.pytorch.serve.wlm.ModelManager.registerModel(ModelManager.java:152) ~[model-server.jar:?]
        at org.pytorch.serve.ModelServer.initModelStore(ModelServer.java:264) [model-server.jar:?]
        at org.pytorch.serve.ModelServer.startRESTserver(ModelServer.java:396) [model-server.jar:?]
        at org.pytorch.serve.ModelServer.startAndWait(ModelServer.java:118) [model-server.jar:?]
        at org.pytorch.serve.ModelServer.main(ModelServer.java:99) [model-server.jar:?]
2023-05-16T20:38:26,076 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2023-05-16T20:38:26,114 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://127.0.0.1:8080
2023-05-16T20:38:26,114 [INFO ] main org.pytorch.serve.ModelServer - Initialize Management server with: EpollServerSocketChannel.
2023-05-16T20:38:26,115 [INFO ] main org.pytorch.serve.ModelServer - Management API bind to: http://127.0.0.1:8081
2023-05-16T20:38:26,115 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
2023-05-16T20:38:26,116 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://127.0.0.1:8082
Model server started.
2023-05-16T20:38:26,310 [INFO ] pool-3-thread-1 TS_METRICS - CPUUtilization.Percent:0.0|#Level:Host|#hostname:hakkyu-desktop,timestamp:1684237106
2023-05-16T20:38:26,311 [INFO ] pool-3-thread-1 TS_METRICS - DiskAvailable.Gigabytes:3.3284988403320312|#Level:Host|#hostname:hakkyu-desktop,timestamp:1684237106
2023-05-16T20:38:26,312 [INFO ] pool-3-thread-1 TS_METRICS - DiskUsage.Gigabytes:217.93718338012695|#Level:Host|#hostname:hakkyu-desktop,timestamp:1684237106
2023-05-16T20:38:26,312 [INFO ] pool-3-thread-1 TS_METRICS - DiskUtilization.Percent:98.5|#Level:Host|#hostname:hakkyu-desktop,timestamp:1684237106
2023-05-16T20:38:26,313 [INFO ] pool-3-thread-1 TS_METRICS - MemoryAvailable.Megabytes:5547.75390625|#Level:Host|#hostname:hakkyu-desktop,timestamp:1684237106
2023-05-16T20:38:26,313 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUsed.Megabytes:8914.88671875|#Level:Host|#hostname:hakkyu-desktop,timestamp:1684237106
2023-05-16T20:38:26,313 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUtilization.Percent:65.1|#Level:Host|#hostname:hakkyu-desktop,timestamp:1684237106

I'm unable to proceed from here. Does the example need to be fixed, or am I doing something wrong? (I've also tried the above steps without venv.)

Error logs

Error log has been added above.

Installation instructions

Did you install torchserve from source? Are you using Docker? NO and NO.

Model Packaging

torch-model-archiver --model-name resnet-18 --version 1.0 --model-file ./examples/image_classifier/resnet_18/model.py --serialized-file resnet18-f37072fd.pth --handler image_classifier --extra-files ./examples/image_classifier/index_to_name.json
torch-model-archiver --model-name resnet-18 --version 1.0 --model-file ./examples/image_classifier/resnet_18/model.py --serialized-file resnet18-f37072fd.pth --handler image_classifier --extra-files ./examples/image_classifier/index_to_name.json -r requirements.txt

config.properties

install_py_dep_per_model=true

Versions

Not sure why the Java version is not included. This is my Java environment:

java --version
openjdk 17.0.6 2023-01-17
OpenJDK Runtime Environment (build 17.0.6+10-Ubuntu-0ubuntu122.04)
OpenJDK 64-Bit Server VM (build 17.0.6+10-Ubuntu-0ubuntu122.04, mixed mode, sharing)
------------------------------------------------------------------------------------------
Environment headers
------------------------------------------------------------------------------------------
Torchserve branch: 

torchserve==0.8.0
torch-model-archiver==0.8.0

Python version: 3.10 (64-bit runtime)
Python executable: /usr/bin/python3

Versions of relevant python libraries:
numpy==1.24.2
psutil==5.9.4
requests==2.28.2
requests-oauthlib==1.3.1
torch==2.0.1+cpu
torch-model-archiver==0.8.0
torchaudio==2.0.2+cpu
torchserve==0.8.0
torchvision==0.15.2+cpu
transformers==4.26.1
wheel==0.37.1
torch==2.0.1+cpu
**Warning: torchtext not present ..
torchvision==0.15.2+cpu
torchaudio==2.0.2+cpu

Java Version:

OS: Ubuntu 22.04.2 LTS
GCC version: (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
Clang version: 14.0.0-1ubuntu1
CMake version: version 3.22.1

Repro instructions

Written above.

Possible Solution

No response

msaroufim commented 1 year ago

In your config.properties you have install_py_dep_per_model=true so make sure you're also passing in dependencies like captum in your requirements.txt

captum strikes again cc @agunapal
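The advice above can be turned into a quick pre-flight check. The following is a minimal sketch, not part of TorchServe: `parse_properties` and `missing_packages` are hypothetical helper names, and the sketch assumes that per-model dependency installation is off unless `install_py_dep_per_model=true` appears in `config.properties`.

```python
import re


def parse_properties(text):
    """Parse simple key=value lines, ignoring blank lines and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def missing_packages(config_text, requirements_text, needed=("captum",)):
    """Return the packages from `needed` that requirements.txt does not list.

    Returns an empty list when install_py_dep_per_model is not "true"
    (assumption: the flag is off when it is absent).
    """
    props = parse_properties(config_text)
    if props.get("install_py_dep_per_model", "false").lower() != "true":
        return []
    listed = set()
    for raw in requirements_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and pip options such as --extra-index-url.
        if not line or line.startswith(("#", "-")):
            continue
        # Keep only the package name in front of any version specifier.
        name = re.split(r"[=<>!~\[; ]", line, maxsplit=1)[0]
        listed.add(name.lower())
    return [pkg for pkg in needed if pkg.lower() not in listed]
```

Calling `missing_packages("install_py_dep_per_model=true", "numpy==1.24.3")` reports `captum` as missing, while a requirements file that pins `captum==0.6.0` passes the check.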

HakkyuKim commented 1 year ago

@msaroufim Hi, thanks for replying. I've already passed captum in the dependencies. This is the requirements.txt file that I passed when creating the .mar archive.

captum==0.6.0
certifi==2023.5.7
charset-normalizer==3.1.0
cmake==3.26.3
contourpy==1.0.7
cycler==0.11.0
Cython==0.29.34
filelock==3.12.0
fonttools==4.39.4
idna==3.4
Jinja2==3.1.2
kiwisolver==1.4.4
lit==16.0.3
MarkupSafe==2.1.2
matplotlib==3.7.1
mpmath==1.3.0
networkx==3.1
numpy==1.24.3
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.2.10.91
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusparse-cu11==11.7.4.91
nvidia-nccl-cu11==2.14.3
nvidia-nvtx-cu11==11.7.91
packaging==23.1
Pillow==9.3.0
psutil==5.9.5
pynvml==11.4.1
pyparsing==3.0.9
python-dateutil==2.8.2
PyYAML==6.0
requests==2.30.0
six==1.16.0
sympy==1.12
torch==2.0.0+cpu
torchaudio==2.0.1+cpu
torchdata==0.6.0
torchtext==0.15.1+cpu
torchvision==0.15.1+cpu
tqdm==4.65.0
triton==2.0.0
typing_extensions==4.5.0
urllib3==2.0.2

Is the current issue that the worker is unable to install torch==2.0.0+cpu?

2023-05-16T20:38:26,068 [ERROR] main org.pytorch.serve.wlm.ModelManager - Dependency installation stderr:
ERROR: Could not find a version that satisfies the requirement torch==2.0.0+cpu (from versions: 1.11.0, 1.12.0, 1.12.1, 1.13.0, 1.13.1, 2.0.0, 2.0.1)
ERROR: No matching distribution found for torch==2.0.0+cpu
2023-05-16T20:38:26,068 [WARN ] main org.pytorch.serve.ModelServer - Failed to load model: resnet-18.mar
org.pytorch.serve.archive.model.ModelException: Custom pip package installation failed for resnet-18
        at org.pytorch.serve.wlm.ModelManager.setupModelDependencies(ModelManager.java:258) ~[model-server.jar:?]
        at org.pytorch.serve.wlm.ModelManager.registerModel(ModelManager.java:152) ~[model-server.jar:?]
        at org.pytorch.serve.ModelServer.initModelStore(ModelServer.java:264) [model-server.jar:?]
        at org.pytorch.serve.ModelServer.startRESTserver(ModelServer.java:396) [model-server.jar:?]
        at org.pytorch.serve.ModelServer.startAndWait(ModelServer.java:118) [model-server.jar:?]
        at org.pytorch.serve.ModelServer.main(ModelServer.java:99) [model-server.jar:?]
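
The pip error quoted above carries a useful hint: the "from versions" list contains 2.0.0 but not 2.0.0+cpu, which suggests that the index pip queried does not carry the +cpu build. A minimal sketch that pulls this out of the message is shown below; `explain_pip_miss` is a hypothetical helper name, and the regular expression assumes the exact wording of the error line in this log.

```python
import re


def explain_pip_miss(stderr):
    """Compare the requested pin with the versions pip says it found.

    Returns None if the text does not contain the expected error line.
    """
    match = re.search(
        r"satisfies the requirement (\S+?)==(\S+) \(from versions: ([^)]*)\)",
        stderr,
    )
    if match is None:
        return None
    name, requested, listed = match.groups()
    available = [v.strip() for v in listed.split(",")]
    # Drop the local version label ("+cpu") to get the public version.
    public = requested.split("+", 1)[0]
    return {
        "package": name,
        "requested": requested,
        "exact_match": requested in available,
        "public_match": public in available,
    }
```

For the error line in this log, `exact_match` is false and `public_match` is true: the plain 2.0.0 release is visible to pip, but the variant with the `+cpu` label is not.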

HakkyuKim commented 1 year ago

torch==2.0.0+cpu had to be downloaded from a different index, https://download.pytorch.org/whl/cpu.

Adding the --extra-index-url line in requirements.txt solved the problem. I'll close this.

# requirements.txt
--extra-index-url https://download.pytorch.org/whl/cpu
...
...
torch==2.0.1+cpu
torch-model-archiver==0.8.0
torch-workflow-archiver==0.2.8
torchaudio==2.0.2+cpu
torchserve==0.8.0
torchvision==0.15.2+cpu
...
...
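
To catch this class of problem before archiving, a requirements file can be scanned for pins that carry a local version label such as `+cpu` while declaring no additional index. The following is a minimal sketch using only the standard library; `check_requirements` is a hypothetical helper name, and the check is a heuristic rather than a reimplementation of pip's resolution logic.

```python
import re

# Matches pins such as "torch==2.0.0+cpu"; the "+cpu" part is a local version label.
LOCAL_PIN = re.compile(r"^([A-Za-z0-9_.\-]+)==([^+\s]+)\+([A-Za-z0-9.]+)$")

# pip options that point at an additional or alternative package source.
INDEX_OPTIONS = ("--extra-index-url", "--index-url", "--find-links")


def check_requirements(text):
    """Return (local_pins, has_index_option) for requirements.txt content."""
    local_pins = []
    has_index_option = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith(INDEX_OPTIONS):
            has_index_option = True
            continue
        if LOCAL_PIN.match(line):
            local_pins.append(line)
    return local_pins, has_index_option
```

If `local_pins` is non-empty and `has_index_option` is false, the file pins a build that the default index may not serve, which matches the failure reported in this issue.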