Open Mak-24 opened 3 years ago
I am getting the exact same error. Basically, if the model container does not respond within 60 seconds, you get this error. From the AWS docs (https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html#your-algorithms-inference-code-container-response):
A customer's model containers must respond to requests within 60 seconds. The model itself can have a maximum processing time of 60 seconds before responding to the /invocations. If your model is going to take 50-60 seconds of processing time, the SDK socket timeout should be set to be 70 seconds.
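For anyone who wants to try the socket timeout part of that guidance, here is a minimal sketch of raising the client-side read timeout with boto3. It assumes you invoke the endpoint through the `sagemaker-runtime` client and that your region and credentials are already configured; the helper name `make_runtime_client` is made up for illustration. Note this only changes how long the client waits. It does not lift the 60-second limit on the container itself.

```python
# Sketch only: raise the SDK socket (read) timeout to 70 seconds, the value
# the quoted AWS guidance suggests for models that take 50-60 s to respond.

SOCKET_TIMEOUT_SECONDS = 70  # value taken from the quoted AWS docs


def make_runtime_client():
    """Build a sagemaker-runtime client with a longer read timeout.

    Hypothetical helper; assumes boto3 is installed and AWS region and
    credentials are configured in the environment.
    """
    # Imported here so the module can be loaded even where boto3 is absent.
    import boto3
    from botocore.config import Config

    config = Config(
        read_timeout=SOCKET_TIMEOUT_SECONDS,
        # Assumption: disable automatic retries so a slow request is not
        # silently re-sent while the first one is still being processed.
        retries={"max_attempts": 0},
    )
    return boto3.client("sagemaker-runtime", config=config)
```

You would then call `invoke_endpoint` on the returned client as usual. If the container still takes more than 60 seconds, the error will persist regardless of this setting.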
Not sure how to solve it yet though
In case you didn't solve it, have you tried pip installing captum in the Dockerfile? It seems that dependency is missing. In the Dockerfile, change:
RUN pip install --no-cache-dir psutil \
--no-cache-dir torch \
--no-cache-dir torchvision
To:
RUN pip install --no-cache-dir psutil \
--no-cache-dir torch \
--no-cache-dir torchvision \
--no-cache-dir captum
You can also add other libraries there if you need to import them in your model.py or custom_handler.py functions.
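If you want to confirm the rebuilt image really has everything, a quick check like the one below can be run inside the container. This is just a sketch: the `REQUIRED` list mirrors the packages from the pip install line above, and `missing_modules` is a hypothetical helper name, so adjust both to whatever your model.py or custom_handler.py imports.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Packages from the Dockerfile pip install line above (edit to match your handler).
REQUIRED = ["psutil", "torch", "torchvision", "captum"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

`find_spec` only looks the module up without importing it, so the check stays fast even for heavy packages like torch.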
While running the deploy_torchserve.ipynb without editing anything, I encountered an error at block 11.
Error:
ModelError Traceback (most recent call last)