Aprilmhx closed this issue 2 years ago.
@Aprilmhx I am not able to reproduce the issue. Can you please try the following on your end?
(image_seg) ubuntu@ip-172-31-49-104:~$ torch-model-archiver --model-name fcn_resnet_101 --version 1.0 --model-file examples/image_segmenter/fcn/model.py --serialized-file fcn_resnet101_coco-7ecb50ca.pth --handler image_segmenter --extra-files examples/image_segmenter/fcn/fcn.py,examples/image_segmenter/fcn/intermediate_layer_getter.py
(image_seg) ubuntu@ip-172-31-49-104:~$ mv fcn_resnet_101.mar ~/model_store
(image_seg) ubuntu@ip-172-31-49-104:~$ sudo docker run --rm -it -p 7000:8080 -p 7001:8081 -v $(pwd)/model_store:/home/model-server/model-store pytorch/torchserve:latest
(image_seg) ubuntu@ip-172-31-49-104:~$ curl -X POST "http://127.0.0.1:7001/models?initial_workers=2&url=fcn_resnet_101.mar"
{
"status": "Model \"fcn_resnet_101\" Version: 1.0 registered with 2 initial workers"
}
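For anyone scripting this step, here is a minimal Python sketch of the same registration call, using only the standard library. It assumes the management API is published on host port 7001, as in the `docker run` above. The helper names (`build_register_url`, `register_model`, `registration_failed`) are hypothetical. Error responses such as the InternalServerException reported in this thread are returned as parsed JSON instead of being raised, so the caller can read the message.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Assumption: the management API (container port 8081) is published on host port 7001.
MANAGEMENT_URL = "http://127.0.0.1:7001"


def build_register_url(base_url: str, mar_file: str, initial_workers: int) -> str:
    """Build the same POST /models URL as the curl command above."""
    query = urllib.parse.urlencode({"initial_workers": initial_workers, "url": mar_file})
    return f"{base_url}/models?{query}"


def register_model(base_url: str, mar_file: str, initial_workers: int = 2,
                   timeout: float = 120.0) -> dict:
    """POST the registration request and return the JSON body, success or not."""
    request = urllib.request.Request(
        build_register_url(base_url, mar_file, initial_workers), method="POST"
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # Error responses (e.g. code 500, InternalServerException) also carry a body.
        raw = err.read().decode("utf-8", errors="replace")
        try:
            return json.loads(raw)
        except ValueError:
            return {"code": err.code, "message": raw}


def registration_failed(body: dict) -> bool:
    """Treat a numeric "code" >= 400 as failure; the success body here has only "status"."""
    return body.get("code", 200) >= 400
```

With the container running, `body = register_model(MANAGEMENT_URL, "fcn_resnet_101.mar")` followed by `registration_failed(body)` tells the two outcomes apart.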
@HamidShojanazeri Thank you very much for your reply!
I couldn't run the torch-model-archiver command locally because I had not managed to install TorchServe in my local environment, so in step 3 I pulled the TorchServe Docker image directly. I then ran torch-model-archiver inside the Docker container to produce the .mar file, which is also saved to the local folder through the bind mount. From the local path I ran the command from step 5 and got "Failed to start workers for model fcn_resnet_101 version: 1.0".
Should this approach work? Thanks in advance.
This problem has been solved by running on the GPU
@Aprilmhx Hello, I have run into exactly the same problem as you.
This problem has been solved by running on the GPU
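From your description, you solved the problem by running on the GPU, which I don't quite understand. If it's convenient, could you describe in detail what you did? Thanks.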
For others with the same issue: if for any reason you are running on CPU (e.g. because your CUDA driver is incompatible with your OS, in my case Ubuntu 18.04), this error is raised.
The solution for me was to move the API to a machine with a GPU and drivers that match the OS.
I was able to get the workers to start with
curl -vX PUT "http://localhost:8081/models/{model-name}?min_worker=1"
You can also set max_worker in the same way. I did not verify this on the example model, but it seemed to work on my own project. See the management API docs for more info.
I noticed when I inspected my model that both min and max workers were set to 0. After this command I was able to see them set to what I wanted. You can inspect your model and parameters like these with
curl localhost:8081/models/{model-name}
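The two calls above can also be scripted. This is a minimal standard-library Python sketch, assuming the management API listens on localhost:8081 as in the commands above. The helper names are hypothetical, and the field names read from the describe response (`modelVersion`, `minWorkers`, `maxWorkers`, `workers`, and each worker's `status` and `gpu`) are assumptions about its shape; check them against your own `curl localhost:8081/models/{model-name}` output. If the per-worker `gpu` value is false, the workers are running on CPU, which earlier comments in this thread link to the failure.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: default management port, as in the curl commands above.
MANAGEMENT_URL = "http://localhost:8081"


def build_scale_url(base_url: str, model_name: str, min_worker: int,
                    max_worker: Optional[int] = None) -> str:
    """Build the PUT /models/{model-name} URL with min_worker and optional max_worker."""
    params = {"min_worker": min_worker}
    if max_worker is not None:
        params["max_worker"] = max_worker
    return f"{base_url}/models/{urllib.parse.quote(model_name)}?{urllib.parse.urlencode(params)}"


def scale_workers(base_url: str, model_name: str, min_worker: int,
                  max_worker: Optional[int] = None, timeout: float = 120.0) -> dict:
    """Python equivalent of the curl -vX PUT command above."""
    request = urllib.request.Request(
        build_scale_url(base_url, model_name, min_worker, max_worker), method="PUT"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def describe_model(base_url: str, model_name: str, timeout: float = 30.0) -> list:
    """Python equivalent of: curl localhost:8081/models/{model-name}"""
    url = f"{base_url}/models/{urllib.parse.quote(model_name)}"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize_workers(description: list) -> list:
    """Extract worker settings and per-worker state from a describe-model response.

    Assumption: the response is a list with one entry per model version.
    """
    return [
        {
            "version": entry.get("modelVersion"),
            "min": entry.get("minWorkers"),
            "max": entry.get("maxWorkers"),
            "statuses": [w.get("status") for w in entry.get("workers", [])],
            "on_gpu": [w.get("gpu") for w in entry.get("workers", [])],
        }
        for entry in description
    ]
```

For example, `scale_workers(MANAGEMENT_URL, "fcn_resnet_101", 1, 2)` followed by `summarize_workers(describe_model(MANAGEMENT_URL, "fcn_resnet_101"))` should show nonzero min and max values if the change took effect.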
Edit: formatting and more context.
Hello!
After successfully setting up TorchServe with Docker, I tried the examples/image_segmenter example that ships with TorchServe, but I was never able to register the model.
My local environment libraries are:
Your Environment
Expected Behavior
Using the Docker environment, the examples/image_segmenter model in TorchServe registers successfully and returns the prediction result for persons.jpg.
Current Behavior
When I register the model with this command:
sudo curl -X POST "http://127.0.0.1:7001/models?initial_workers=2&url=fcn_resnet_101.mar"
I get the error:
{ "code": 500, "type": "InternalServerException", "message": "Failed to start workers for model fcn_resnet_101 version: 1.0" }
Possible Solution
No idea
Steps to Reproduce
Failure Logs [if any]