cd serve/kubernetes/GKE
gcloud config set compute/region us-west1
gcloud config set compute/zone us-west1-a
gcloud compute disks create --size=200GB --zone=us-west1-a nfs-disk
gcloud container clusters create torchserve --machine-type c3-standard-4 --num-nodes 2
gcloud container clusters get-credentials openmind
cd GKE
helm install mynfs ./nfs-provisioner/
kubectl get svc -n default mynfs-nfs-provisioner -o jsonpath='{.spec.clusterIP}'
# Copy the IP address over, and then...
kubectl apply -f templates/pv_pvc.yaml -n default
kubectl apply -f templates/pod.yaml
kubectl exec --tty pod/model-store-pod -- mkdir /pv/model-store/
kubectl cp ./trocr-handwritten.mar model-store-pod:/pv/model-store/trocr-handwritten.mar
kubectl exec --tty pod/model-store-pod -- mkdir /pv/config/
kubectl cp ./config.properties model-store-pod:/pv/config/config.properties
kubectl exec --tty pod/model-store-pod -- ls -lR /pv/
kubectl delete po model-store-pod
cd ../Helm
helm install ts .
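After `helm install ts .` returns, it may help to confirm what the release actually created before probing it from outside. The following is only a sketch: the function names are hypothetical, and the default Service name `torchserve` is an assumption (substitute whatever `kubectl get svc` lists for the release). It relies only on standard `kubectl get` calls.

```shell
#!/bin/sh
# Hypothetical helper: print the external IP that GKE assigned to a
# LoadBalancer Service. The default name "torchserve" is an assumption;
# use the name shown by `kubectl get svc`.
lb_external_ip() {
  svc="${1:-torchserve}"
  ns="${2:-default}"
  kubectl get svc "$svc" -n "$ns" \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Hypothetical helper: list pods and services together so that a pod
# that is not Running, or a Service still waiting for an external IP,
# is easy to spot.
show_release_state() {
  ns="${1:-default}"
  kubectl get pods,svc -n "$ns" -o wide
}
```

Calling `lb_external_ip <service-name>` should print the same address used in the curl and nmap probes; an empty result would mean the load balancer has not been provisioned yet.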
Possible Solution
nmap output:
$ nmap -Pn 35.192.6.11
Starting Nmap 7.80 ( https://nmap.org ) at 2023-03-20 17:38 EDT
Nmap scan report for 11.6.192.35.bc.googleusercontent.com (35.192.6.11)
Host is up (0.031s latency).
Not shown: 997 filtered ports
PORT STATE SERVICE
8080/tcp closed http-proxy
8081/tcp closed blackice-icecap
8082/tcp closed blackice-alerts
Nmap done: 1 IP address (1 host up) scanned in 8.87 seconds
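In nmap's terminology, `closed` means the probe reached the target and was actively refused (nothing is accepting connections on that port), while `filtered` means the probe got no usable reply, typically because a firewall dropped it. Three `closed` ports next to 997 `filtered` ones therefore suggests, though it does not prove, that the firewall rule for 8080-8082 is passing traffic and that the refusal happens further in. A small sketch for pulling the port states out of a saved scan (function names are hypothetical):

```shell
#!/bin/sh
# Print "port state" pairs from nmap's normal output read on stdin.
# Port lines look like "8080/tcp closed http-proxy".
nmap_port_states() {
  awk '$1 ~ /^[0-9]+\/(tcp|udp)$/ { split($1, p, "/"); print p[1], $2 }'
}

# Succeed only if at least one port is listed and every listed port
# is in the "closed" state.
all_ports_closed() {
  awk '$1 ~ /^[0-9]+\/(tcp|udp)$/ { n++; if ($2 != "closed") bad++ }
       END { exit (n > 0 && bad == 0) ? 0 : 1 }'
}
```

For example, `nmap -Pn 35.192.6.11 | nmap_port_states` would reduce the scan above to one line per port.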
- VPN: curl to port 8080 returns an empty response when I'm connected to a VPN, but connection refused when I'm on my home network.
- VPC firewall changes: allowing connections to `http-server` and `https-server` on ports 8080-8082 with priority 0.
- Configuration changes: reinstalling, changing the region, changing the compute resources.
I believe the fault might lie in one of the .yaml files (perhaps the LoadBalancer Service?), since everything else seems to be working fine, but I haven't been able to track it down.
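One way to test the Service hypothesis is to check whether the Service has any backing pod IPs and which container port each exposed port forwards to. This is a sketch under assumptions: the function names are hypothetical and the Service name has to be taken from `kubectl get svc`. An empty Endpoints object would point at a selector that does not match the pod labels; a populated one with the expected `targetPort` values would point away from the Service and toward the pod itself.

```shell
#!/bin/sh
# Succeed if the Service has at least one ready backing pod IP.
# An empty result usually means the Service selector matches no pods.
service_has_endpoints() {
  svc="$1"
  ns="${2:-default}"
  ips=$(kubectl get endpoints "$svc" -n "$ns" \
        -o jsonpath='{.subsets[*].addresses[*].ip}')
  [ -n "$ips" ]
}

# Print one "port -> targetPort" line per Service port, to compare
# against the ports the container actually listens on.
service_port_map() {
  svc="$1"
  ns="${2:-default}"
  kubectl get svc "$svc" -n "$ns" \
    -o jsonpath='{range .spec.ports[*]}{.port}{" -> "}{.targetPort}{"\n"}{end}'
}
```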
🐛 Describe the bug
I'm attempting to set up a k8s cluster on GKE using the tutorial provided here. However, I found that a lot of the YAML files needed to be reworked, so I made some modifications and deployed it. I've also confirmed that torchserve is running inside the pod by curl'ing localhost. Frustratingly, the ports exposed by the LoadBalancer refuse every connection attempt. I don't believe this is a fault of any GCP config on my end, because I've been able to get the GKE quickstart deployment serving on the same cluster through a different LoadBalancer.
The external IP is 35.192.6.11; it'll be up for a while as I try to figure this out. Luckily, Google has discounted the c3 instances to near nothing for the public preview.
I've uploaded my torchserve handler and config files and my GKE config files with a series of steps to reproduce.
Thank you very much for taking the time to look over this!!!!
Error logs
curl: (7) Failed to connect to port 8080 after 33 ms: Connection refused
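The number in parentheses is curl's exit status, and it distinguishes the two behaviours reported here: 7 means the TCP connection could not be established (for example, it was refused), whereas 52 means the connection succeeded but the server sent nothing back. A sketch for probing each port and labelling the outcome follows; the function names are hypothetical, and `/ping` is TorchServe's health-check path on the inference port, so on the other ports only the connection result is meaningful.

```shell
#!/bin/sh
# Map a curl exit status to a short description. Only a few statuses
# relevant to connectivity checks are covered.
explain_curl_exit() {
  case "$1" in
    0)  echo "ok" ;;
    7)  echo "could not connect (refused or unreachable)" ;;
    28) echo "timed out" ;;
    52) echo "connected, but the server sent an empty reply" ;;
    *)  echo "other curl error ($1)" ;;
  esac
}

# Probe one host:port with a 5 second limit and describe the outcome.
probe_port() {
  host="$1"
  port="$2"
  rc=0
  curl -sS -o /dev/null -m 5 "http://$host:$port/ping" 2>/dev/null || rc=$?
  explain_curl_exit "$rc"
}
```

Running `probe_port 35.192.6.11 8080` from both the home network and the VPN would make the difference between the two paths explicit.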
Installation instructions
Install torchserve from source: no
Using docker: no
Model Packaging
https://github.com/samuelzxu/trocr-serving
config.properties
default_workers_per_model=1
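Only one line of `config.properties` is shown here, so the following is a guess rather than a diagnosis. When `inference_address`, `management_address` and `metrics_address` are not set, TorchServe binds them to 127.0.0.1 (ports 8080, 8081 and 8082). That would be consistent with curl succeeding on localhost inside the pod while connections arriving through the Service are refused. A sketch for checking a config file for explicit 0.0.0.0 bindings (function names are hypothetical):

```shell
#!/bin/sh
# Succeed if the given key in a TorchServe config.properties file is
# explicitly bound to 0.0.0.0 (all interfaces).
binds_all_interfaces() {
  file="$1"
  key="$2"
  grep -Eq "^[[:space:]]*${key}[[:space:]]*=[[:space:]]*https?://0\.0\.0\.0:" "$file"
}

# Report the binding status of the three TorchServe listener addresses.
check_torchserve_bindings() {
  file="$1"
  for key in inference_address management_address metrics_address; do
    if binds_all_interfaces "$file" "$key"; then
      echo "$key: all interfaces"
    else
      echo "$key: not set to 0.0.0.0 (TorchServe defaults to 127.0.0.1)"
    fi
  done
}
```

If the check reports loopback defaults, adding lines such as `inference_address=http://0.0.0.0:8080` (and the equivalents for 8081 and 8082) to the file copied into `/pv/config/` would be the change to try.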
Versions
Repro instructions