istio / old_issues_repo

Deprecated issue-tracking repo, please post new issues or feature requests to istio/istio instead.

Service API shows "upstream connect error" until you restart istio-pilot. #366

Closed johnzheng1975 closed 5 years ago

johnzheng1975 commented 6 years ago

Bug: Y (It seems that the new pod IP is sometimes not updated after a new service deployment until you restart istio-pilot.)

Environment: istioctl version 0.7.1 (Istio Auth not enabled), kubectl version 1.9.5, Calico network

Occurrence rate: about once per day of cluster running (roughly 1 occurrence per 1000 new service deployments).

Steps:

1. Keep deploying and running new services.

2. Access a service URL: https://ing.xxxx.xxx.xxxx.com/store/api/health
   Expected result: 200 with message {"status":"UP"}
   Actual result: 503 with message (upstream connect error or disconnect/reset before headers)

3. From the istio-ingress log, the pod IP appears to be 10.233.125.40:3000:
   ... ...
   [2018-05-30T06:24:52.741Z] "GET /store/api/health HTTP/1.1" 503 UF 0 57 1001 - "10.233.90.192" "curl/7.47.0" "e05ccecb-63b6-9eba-af20-46423538f61e" "ing.xxxx.xxx.xxxx.com" "10.233.125.40:3000"
   ... ...

4. From the app service, the pod IP appears to be 10.233.82.29:3000, which differs from step 3:
   kubectl describe svc hp-store-service -n hp
   Name:              hp-store-service
   Namespace:         hp
   Labels:            app=hp-store-service
   Annotations:       prometheus.io/path=/prometheus
                      prometheus.io/port=9090
                      prometheus.io/probe=true
                      prometheus.io/scrape=true
   Selector:          app=hp-store
   Type:              ClusterIP
   IP:                10.233.47.221
   Port:              http 80/TCP
   TargetPort:        3000/TCP
   Endpoints:         10.233.82.29:3000
   Session Affinity:  None
   Events:

5. The issue persisted for the next hour; it did not recover on its own.

6. I restarted istio-pilot:
   test1b@ip-172-31-17-153:~$ kubectl delete pod istio-pilot-67d6ddbdf6-c6xb6 -n istio-system
   pod "istio-pilot-67d6ddbdf6-c6xb6" deleted

7. After waiting one minute, https://ing.xxxx.xxx.xxxx.com/store/api/health returned 200 {"status":"UP"}.

8. From the new istio-pilot log, the pod IP now appears correct:
   ... ...
   [2018-05-30T06:25:35.991Z] "GET /store/api/health HTTP/1.1" 200 - 0 16 37 36 "10.233.125.0" "curl/7.47.0" "84823019-f6b0-9021-b501-963376ec3516" "ing.xxxx.xxx.xxxx.com" "10.233.82.29:3000"
   ... ...
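The comparison done by hand in steps 3 and 4 (the upstream address in the access log versus the Endpoints listed for the service) can be sketched in a few lines of Python. This is only an illustration, not part of the original report: it assumes the upstream host is the last double-quoted field of the access log line, as in the two log lines above, and the function names `upstream_host` and `is_stale` are hypothetical. The `UF` flag in the failing line is Envoy's response flag for an upstream connection failure.

```python
import re

# Access log lines copied from steps 3 and 8 of the report.
FAILING = ('[2018-05-30T06:24:52.741Z] "GET /store/api/health HTTP/1.1" 503 UF 0 57 1001 - '
           '"10.233.90.192" "curl/7.47.0" "e05ccecb-63b6-9eba-af20-46423538f61e" '
           '"ing.xxxx.xxx.xxxx.com" "10.233.125.40:3000"')
HEALTHY = ('[2018-05-30T06:25:35.991Z] "GET /store/api/health HTTP/1.1" 200 - 0 16 37 36 '
           '"10.233.125.0" "curl/7.47.0" "84823019-f6b0-9021-b501-963376ec3516" '
           '"ing.xxxx.xxx.xxxx.com" "10.233.82.29:3000"')

# Endpoints value from `kubectl describe svc hp-store-service -n hp` in step 4.
CURRENT_ENDPOINTS = ["10.233.82.29:3000"]


def upstream_host(access_log_line: str) -> str:
    """Return the last double-quoted field of the line.

    Assumption: the log format places the upstream host last, as in the
    lines quoted in this issue. Other log formats may differ.
    """
    quoted = re.findall(r'"([^"]*)"', access_log_line)
    if not quoted:
        raise ValueError("no quoted fields found in access log line")
    return quoted[-1]


def is_stale(access_log_line: str, current_endpoints: list[str]) -> bool:
    """True when the proxy sent the request to an address that is not a current endpoint."""
    return upstream_host(access_log_line) not in set(current_endpoints)


for label, line in (("before restart", FAILING), ("after restart", HEALTHY)):
    print(label, upstream_host(line), "stale" if is_stale(line, CURRENT_ENDPOINTS) else "current")
```

Run against the two log lines from the report, the first is flagged as stale and the second as current, which matches the manual diagnosis.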

louiscryan commented 6 years ago

@mandarjog this sounds similar to

https://github.com/istio/istio/issues/5391

costinm commented 6 years ago

Any logs from pilot? I don't think it's the same problem: here it looks like some (bad) endpoint is sent to envoy, whereas in the other bug envoy wouldn't get any endpoint assignment. While investigating that, we found a few other cases that could be affected by the same problem, so I would say different behavior but same root cause, and likely fixed in 0.8.

There are a few additional debug endpoints in 0.8, including "/debug/endpointz?brief=true", which lists pilot's view of endpoints and can be used to find out whether the problem was on the ingestion side or on the pushing side.
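A minimal sketch of how that check might be scripted, in Python. Only the path "/debug/endpointz?brief=true" comes from this thread; the host and port (`localhost:8080`, for example through a `kubectl port-forward` to the pilot pod) and the helper names are assumptions, and because the response format is not described here, the dump is treated as opaque text and simply searched for the expected address.

```python
import re
from urllib.parse import urlencode, urlunsplit
from urllib.request import urlopen


def endpointz_url(host: str = "localhost", port: int = 8080, brief: bool = True) -> str:
    """Build the debug URL. Host and port are assumptions, not taken from the thread."""
    query = urlencode({"brief": "true"}) if brief else ""
    return urlunsplit(("http", f"{host}:{port}", "/debug/endpointz", query, ""))


def fetch_endpointz(url: str, timeout: float = 5.0) -> str:
    """Fetch the dump as text; requires network access to pilot."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def pilot_knows_endpoint(dump_text: str, address: str) -> bool:
    """True if `address` (ip:port) appears in the dump as a whole token.

    The lookarounds stop 10.233.82.29:3000 from matching inside
    110.233.82.29:3000 or 10.233.82.29:30001.
    """
    pattern = r"(?<![\d.])" + re.escape(address) + r"(?!\d)"
    return re.search(pattern, dump_text) is not None


# Example use (not executed here because it needs a reachable pilot):
#   dump = fetch_endpointz(endpointz_url())
#   pilot_knows_endpoint(dump, "10.233.82.29:3000")
```

If pilot's view already contains the current pod address while the proxy still routes to the old one, that would point to the pushing side; if pilot's view is itself stale, that would point to ingestion.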

mandarjog commented 6 years ago

Certainly a stale endpoint, which we had seen with missed updates. Symptoms are similar. Like costin says, very likely fixed in 0.8.

louiscryan commented 6 years ago

As @costinm suggests, can you upgrade to 0.8 and see if this resolves your issue?

johnzheng1975 commented 5 years ago

Closing this since it concerns an old Istio version. Actually, both 0.8 and 1.0 still raise the 503 issue. However, I will track it elsewhere.