Closed: rbavery closed this issue 1 year ago
I'm trying to find the compatibility matrix for this GPU. The latest torchserve uses PyTorch 2.0, which uses CUDA 11.7/11.8. I'm not sure if it's supported on this old GPU. Will confirm.
Looks like the K80 is a Kepler card (sm_37 arch), and that arch should be deprecated in CUDA 11.7 and 11.8.
I'll try out a newer arch.
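For anyone hitting the same question, here is a rough way to check for this kind of arch mismatch from inside the container. It is a minimal sketch: it compares the card's compute capability (`torch.cuda.get_device_capability()`) against the architectures the installed PyTorch build was compiled for (`torch.cuda.get_arch_list()`). The helper names `parse_arch`, `device_is_covered`, and `check_local_gpu` are made up for illustration, and the arch list mentioned in the comments is hypothetical, not the actual list from the torchserve image.

```python
import re


def parse_arch(entry):
    """Split an entry such as 'sm_37' or 'compute_90' into (kind, major, minor)."""
    m = re.fullmatch(r"(sm|compute)_(\d+)(\d)[a-z]?", entry)
    if m is None:
        return None  # unrecognized entry, ignored by the caller
    return m.group(1), int(m.group(2)), int(m.group(3))


def device_is_covered(capability, arch_list):
    """Return True if a build with `arch_list` should be able to run on `capability`.

    Assumes the usual CUDA rules: an sm_XY binary runs on devices with the same
    major version and an equal or higher minor version, and compute_XY (PTX)
    can be JIT-compiled for any equal or newer device.
    """
    major, minor = capability
    for entry in arch_list:
        parsed = parse_arch(entry)
        if parsed is None:
            continue
        kind, a, b = parsed
        if kind == "sm" and a == major and b <= minor:
            return True
        if kind == "compute" and (a, b) <= (major, minor):
            return True
    return False


def check_local_gpu():
    """Run the check against the first visible GPU; returns None if CUDA is unavailable."""
    import torch  # imported lazily so the helpers above work without PyTorch installed

    if not torch.cuda.is_available():
        return None
    capability = torch.cuda.get_device_capability(0)
    arch_list = torch.cuda.get_arch_list()
    print(torch.cuda.get_device_name(0), capability, arch_list)
    return device_is_covered(capability, arch_list)
```

Calling `check_local_gpu()` in the deployed container prints the device name, its capability, and the build's arch list. A K80 reports capability `(3, 7)`, so a build whose list starts at a hypothetical `sm_50` would come back `False`. A `True` result only rules out the arch mismatch; it says nothing about the driver version.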
Hi @rbavery, wondering if your issue is resolved.
Yes, I think this was because of a mismatch between the K80 and the NVIDIA driver I was using.
🐛 Describe the bug
I'm running https://github.com/developmentseed/segment-anything-services and have gotten to the point where I can run both the encoder model and the decoder model locally without issues. However, when I deploy these on AWS Elastic Container Service (ECS), I get inexplicable Worker Died errors. I'm using a p2.xlarge.
The first thing I'd like to check: is there a limitation in torchserve on which GPU types we can use? Is p2.xlarge too old at this point?
Error logs
Installation instructions
using docker https://github.com/developmentseed/segment-anything-services/blob/main/Dockerfile-gpu
Model Packaging
https://github.com/developmentseed/segment-anything-services/blob/main/Dockerfile-build
see readme for instructions on running the .mar build step and the torchserve services: https://github.com/developmentseed/segment-anything-services
config.properties
https://github.com/developmentseed/segment-anything-services/blob/main/deployment/config_encode.properties
Versions
we're using
FROM pytorch/torchserve:latest-gpu
in https://github.com/developmentseed/segment-anything-services/blob/main/Dockerfile-gpu
Repro instructions
https://github.com/developmentseed/segment-anything-services
Unfortunately this only happens on AWS ECS, so the repro is a bit tricky. You can see the CloudFormation stack and run this on your own by hooking it up to an AWS account. Or, any suggestions on what might need to be specified in the config to get this to work on ECS are appreciated.
Possible Solution
I'll try different GPU instances, but I haven't seen any logs indicating that the K80 GPUs of the p2.xlarge are not supported by torchserve:latest. Suggestions appreciated!