pytorch / serve

Serve, optimize and scale PyTorch models in production
https://pytorch.org/serve/
Apache License 2.0

example is out of date, running end to end results in model not found: alexnet #2018

Open rbavery opened 1 year ago

rbavery commented 1 year ago

📚 The doc issue

https://github.com/pytorch/serve/tree/master/examples/image_classifier/alexnet is out of date, or there is a bug in torchserve.

When I follow these steps:

  1. clone the repo
  2. run commands from ./serve
  3. run the commands referenced at the link:
wget https://download.pytorch.org/models/alexnet-owt-7be5be79.pth
torch-model-archiver --model-name alexnet --version 1.0 --model-file ./serve/examples/image_classifier/alexnet/model.py --serialized-file alexnet-owt-7be5be79.pth --handler image_classifier --extra-files ./serve/examples/image_classifier/index_to_name.json
mkdir model_store
mv alexnet.mar model_store/
torchserve --start --model-store model_store --models alexnet=alexnet.mar
curl http://127.0.0.1:8080/predictions/alexnet -T ./serve/examples/image_classifier/kitten.jpg

I get this error:

# root at rave in ~/serve on git:master ✖︎ [13:26:56]
→ curl http://127.0.0.1:8080/predictions/alexnet -T ./examples/image_classifier/kitten.jpg
2022-12-01T13:27:16,651 [INFO ] epollEventLoopGroup-3-1 ACCESS_LOG - /127.0.0.1:40832 "PUT /predictions/alexnet HTTP/1.1" 404 10
2022-12-01T13:27:16,651 [INFO ] epollEventLoopGroup-3-1 TS_METRICS - Requests4XX.Count:1|#Level:Host|#hostname:rave,timestamp:1669930036
{
  "code": 404,
  "type": "ModelNotFoundException",
  "message": "Model not found: alexnet"
}
# root at rave in ~/serve on git:master ✖︎ [13:31:53]
→ torchserve --version
TorchServe Version is 0.6.1

torch model archiver version is 0.6.1

Suggest a potential alternative/fix

Fix this and the other examples so that they run end to end with recent versions of torchserve and torch model archiver.
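
One thing that could help narrow this down, as a rough sketch: check that every file passed to torch-model-archiver actually exists relative to the current directory before archiving. The helper name `check_archiver_inputs` is hypothetical and not part of TorchServe; it only tests for the files and prints any that are missing.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of TorchServe): print every argument that is
# not an existing regular file, and return non-zero if any was missing.
check_archiver_inputs() {
  local missing=0
  local path
  for path in "$@"; do
    if [ ! -f "$path" ]; then
      echo "missing: $path"
      missing=1
    fi
  done
  return "$missing"
}
```

It could be called with the same paths used in the archiver command above, for example `check_archiver_inputs ./serve/examples/image_classifier/alexnet/model.py alexnet-owt-7be5be79.pth ./serve/examples/image_classifier/index_to_name.json`, and the archiver run only if it succeeds.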

lxning commented 1 year ago

@rbavery I cannot reproduce your error.

 torch-model-archiver --model-name alexnet --version 1.0 --model-file ./serve/examples/image_classifier/alexnet/model.py --serialized-file alexnet-owt-7be5be79.pth --handler image_classifier --extra-files ./serve/examples/image_classifier/index_to_name.json

mv alexnet.mar serve/model_store

torchserve --start --model-store model_store --models alexnet=alexnet.mar
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
2022-12-01T17:47:27,388 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager -  Loading snapshot serializer plugin...
2022-12-01T17:47:27,433 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...
2022-12-01T17:47:27,707 [INFO ] main org.pytorch.serve.ModelServer -
......

 python_env % curl http://127.0.0.1:8080/predictions/alexnet -T ./serve/examples/image_classifier/kitten.jpg
{
  "tabby": 0.3184734284877777,
  "tiger_cat": 0.2579401135444641,
  "Egyptian_cat": 0.2425481379032135,
  "lynx": 0.16879351437091827,
  "tiger": 0.0064879353158175945
}%
rbavery commented 1 year ago

@lxning I'll test again. Both @weiji14 and I ran into the same issue with this demo last week, following the same steps.

weiji14 commented 1 year ago

Yeah, it's still not working for me either. These are the steps I tried:

git clone https://github.com/pytorch/serve.git  # commit 8d877b0f75f4c9cd899ed02669b9278f019013a1
mamba create --name torchserve -c pytorch python=3.9 torchserve=0.6.1 torch-model-archiver=0.6.1 torch-workflow-archiver=0.2.5
mamba activate torchserve

wget https://download.pytorch.org/models/alexnet-owt-7be5be79.pth
torch-model-archiver --model-name alexnet --version 1.0 --model-file ./serve/examples/image_classifier/alexnet/model.py --serialized-file alexnet-owt-7be5be79.pth --handler image_classifier --extra-files ./serve/examples/image_classifier/index_to_name.json
mkdir model_store
mv alexnet.mar model_store/
torchserve --start --model-store model_store --models alexnet=alexnet.mar
curl http://127.0.0.1:8080/predictions/alexnet -T ./serve/examples/image_classifier/kitten.jpg

Visiting https://127.0.0.1:8080 shows this output. @rbavery got a 404, but I got a 405 :shrug: The error seems similar to #1110.

{
  "code": 405,
  "type": "MethodNotAllowedException",
  "message": "Requested method is not allowed, please refer to API document."
}
lxning commented 1 year ago

@rbavery @weiji14 The error codes you got are different.

"404: ModelNotFoundException": indicates that the model was not loaded successfully. @rbavery, please check ts_log.log to find the model-loading error.
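
Besides the log, a quick way to see whether the model registered is to list the models through the management API. The sketch below assumes the management API is on its default port 8081 and that the response contains entries of the form `"modelName": "<name>"`; the helper name `model_is_registered` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the JSON text on stdin lists the given model.
# Assumes entries look like "modelName": "<name>" (whitespace may vary).
model_is_registered() {
  local name="$1"
  grep -Eq "\"modelName\"[[:space:]]*:[[:space:]]*\"${name}\""
}

# Usage against a running server (assumes the default management port 8081):
#   curl -s http://127.0.0.1:8081/models | model_is_registered alexnet \
#     && echo "registered" || echo "not registered"
```

If the model is not listed, the 404 from /predictions/alexnet is expected, and the load failure should appear in the logs.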

"405: MethodNotAllowedException": indicates that the requested REST API call (i.e. https://127.0.0.1:8080) is not supported. @weiji14 I use conda directly, not mamba. Could you please try the following?

conda create  -y -n py38 python=3.8
conda activate py38