xiangtianyu closed this issue 2 years ago.
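When this happens, does the corresponding global pilot log anything abnormal? I can see a CDS reject, and the pilot side should record that as well.
Also, which versions (images) of global sidecar and global pilot are you using?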
The earlier logs are too old to find, so I just reproduced it. The global sidecar logs are as follows:
2021-11-12T09:08:41.250010Z info ads Push debounce stable[2] 373: 100.582673ms since last change, 200.551406ms since last push, full=true
2021-11-12T09:08:41.252162Z info ads XDS: Pushing:2021-11-12T09:08:41Z/1 Services:93 ConnectedEndpoints:0
2021-11-12T09:08:41.536428Z info ads Push debounce stable[3] 2: 100.764849ms since last change, 187.251158ms since last push, full=true
2021-11-12T09:08:41.538054Z info ads XDS: Pushing:2021-11-12T09:08:41Z/2 Services:93 ConnectedEndpoints:0
2021-11-12T09:08:42.995208Z info ads Full push, new service lazyload.mesh-operator.svc.cluster.local
2021-11-12T09:08:43.095877Z info ads Push debounce stable[4] 1: 100.606332ms since last change, 100.606249ms since last push, full=true
2021-11-12T09:08:43.097997Z info ads XDS: Pushing:2021-11-12T09:08:43Z/3 Services:93 ConnectedEndpoints:0
gc 10 @10.232s 1%: 0.26+72+0.005 ms clock, 8.3+0.62/154/207+0.17 ms cpu, 46->47->24 MB, 51 MB goal, 32 P
2021-11-12T09:08:50.052387Z info ads Push Status: {}
2021-11-12T09:09:05.818578Z info ads Full push, new service istio-pilot.mesh-operator.svc.cluster.local
2021-11-12T09:09:05.918895Z info ads Push debounce stable[5] 1: 100.22323ms since last change, 100.223072ms since last push, full=true
2021-11-12T09:09:05.920843Z info ads XDS: Pushing:2021-11-12T09:09:05Z/4 Services:93 ConnectedEndpoints:0
2021-11-12T09:09:10.052619Z info ads Push Status: {}
2021-11-12T09:09:17.336865Z info ads ADS:CDS: REQ sidecar~172.160.0.99~global-sidecar-9bc5bf795-8psv6.istio-public~istio-public.svc.cluster.local-1 version:
2021-11-12T09:09:17.339615Z info ads CDS: PUSH for node:global-sidecar-9bc5bf795-8psv6.istio-public clusters:138 services:93 version:2021-11-12T09:09:05Z/4
2021-11-12T09:09:17.345863Z info ads LDS: PUSH for node:global-sidecar-9bc5bf795-8psv6.istio-public listeners:35
2021-11-12T09:09:36.959672Z info ads EDS: PUSH for node:global-sidecar-9bc5bf795-8psv6.istio-public clusters:117 endpoints:81 empty:48
gc 11 @62.013s 0%: 0.38+94+0.006 ms clock, 12+0.24/540/85+0.22 ms cpu, 46->46->25 MB, 48 MB goal, 32 P
2021-11-12T09:09:40.157462Z info ads RDS: PUSH for node:global-sidecar-9bc5bf795-8psv6.istio-public routes:25
2021-11-12T09:09:40.157561Z warn ads ADS:LDS: ACK ERROR sidecar~172.160.0.99~global-sidecar-9bc5bf795-8psv6.istio-public~istio-public.svc.cluster.local-1 Internal:Error adding/updating listener(s) 0.0.0.0_15021: cannot bind '0.0.0.0:15021': Address already in use
2021-11-12T09:09:41.337129Z info ads Push debounce stable[6] 1: 100.228588ms since last change, 100.228505ms since last push, full=true
2021-11-12T09:09:41.338343Z info ads XDS: Pushing:2021-11-12T09:09:41Z/5 Services:93 ConnectedEndpoints:1
2021-11-12T09:09:50.052350Z info ads Push Status: {}
gc 12 @130.682s 0%: 0.76+121+0.009 ms clock, 24+0.55/723/0.65+0.28 ms cpu, 49->50->26 MB, 51 MB goal, 32 P
2021-11-12T09:12:11.341747Z info ads Push debounce stable[7] 1: 100.567735ms since last change, 100.567662ms since last push, full=true
2021-11-12T09:12:11.343380Z info ads XDS: Pushing:2021-11-12T09:12:11Z/6 Services:93 ConnectedEndpoints:1
gc 13 @214.623s 0%: 0.46+184+0.006 ms clock, 14+0.50/479/154+0.22 ms cpu, 50->51->25 MB, 52 MB goal, 32 P
2021-11-12T09:12:20.052805Z info ads Push Status: {}
2021-11-12T09:12:41.334600Z info ads Push debounce stable[8] 1: 100.233482ms since last change, 100.233388ms since last push, full=true
2021-11-12T09:12:41.336088Z info ads XDS: Pushing:2021-11-12T09:12:41Z/7 Services:93 ConnectedEndpoints:1
2021-11-12T09:12:50.052739Z info ads Push Status: {}
slime-boot logs:
{"level":"info","ts":1636708117.1667693,"logger":"helm.controller","msg":"Reconciled release","namespace":"mesh-operator","name":"lazyload","apiVersion":"config.netease.com/v1alpha1","kind":"SlimeBoot","release":"lazyload"}
{"level":"error","ts":1636708117.516414,"logger":"controller-runtime.manager.controller.slimeboot-controller","msg":"Reconciler error","name":"lazyload","namespace":"mesh-operator","error":"Operation cannot be fulfilled on slimeboots.config.netease.com \"lazyload\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.2/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.2/pkg/internal/controller/controller.go:214"}
{"level":"info","ts":1636708118.6415234,"logger":"helm.controller","msg":"Reconciled release","namespace":"mesh-operator","name":"lazyload","apiVersion":"config.netease.com/v1alpha1","kind":"SlimeBoot","release":"lazyload"}
{"level":"info","ts":1636708119.7781372,"logger":"helm.controller","msg":"Reconciled release","namespace":"mesh-operator","name":"lazyload","apiVersion":"config.netease.com/v1alpha1","kind":"SlimeBoot","release":"lazyload"}
{"level":"info","ts":1636708179.7753153,"logger":"helm.controller","msg":"Reconciled release","namespace":"mesh-operator","name":"lazyload","apiVersion":"config.netease.com/v1alpha1","kind":"SlimeBoot","release":"lazyload"}
{"level":"info","ts":1636708240.9074373,"logger":"helm.controller","msg":"Reconciled release","namespace":"mesh-operator","name":"lazyload","apiVersion":"config.netease.com/v1alpha1","kind":"SlimeBoot","release":"lazyload"}
{"level":"info","ts":1636708302.0466037,"logger":"helm.controller","msg":"Reconciled release","namespace":"mesh-operator","name":"lazyload","apiVersion":"config.netease.com/v1alpha1","kind":"SlimeBoot","release":"lazyload"}
lazyload logs:
time="2021-11-12T09:08:41Z" level=info msg="get virtualService, bookinfo" virtualService=istio-public/bookinfo
time="2021-11-12T09:08:41Z" level=info msg="get destination after parse, map[bookinfo.test.za.net:[productpage]]" virtualService=istio-public/bookinfo
time="2021-11-12T09:08:41Z" level=info msg="get serviceFence, reviews" serviceFence=istio-public/reviews
time="2021-11-12T09:08:41Z" level=info msg="get virtualService, grafana-vs" virtualService=istio-system/grafana-vs
time="2021-11-12T09:08:41Z" level=info msg="get destination after parse, map[grafana.test.za.net:[grafana]]" virtualService=istio-system/grafana-vs
time="2021-11-12T09:08:41Z" level=info msg="get virtualService, kiali-vs" virtualService=istio-system/kiali-vs
time="2021-11-12T09:08:41Z" level=info msg="get destination after parse, map[kiali.test.za.net:[kiali]]" virtualService=istio-system/kiali-vs
time="2021-11-12T09:08:41Z" level=info msg="get virtualService, prometheus-vs" virtualService=istio-system/prometheus-vs
time="2021-11-12T09:08:41Z" level=info msg="get destination after parse, map[prometheus.test.za.net:[prometheus]]" virtualService=istio-system/prometheus-vs
time="2021-11-12T09:08:41Z" level=info msg="get virtualService, tracing-vs" virtualService=istio-system/tracing-vs
time="2021-11-12T09:08:41Z" level=info msg="get destination after parse, map[tracing.test.za.net:[tracing]]" virtualService=istio-system/tracing-vs
time="2021-11-12T09:08:41Z" level=info msg="get serviceFence, productpage" serviceFence=istio-public/productpage
I1112 09:08:41.329437 1 reflector.go:150] Starting reflector *v1alpha3.Sidecar (9h11m38.180536184s) from pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105
I1112 09:08:41.329456 1 reflector.go:185] Listing and watching *v1alpha3.Sidecar from pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105
time="2021-11-12T09:08:41Z" level=info msg="Update a Sidecarin istio-public:productpage"
I1112 09:08:41.429535 1 shared_informer.go:227] caches populated
time="2021-11-12T09:08:41Z" level=info msg="Update a Sidecarin istio-public:productpage"
time="2021-11-12T09:08:41Z" level=info msg="get serviceFence, details" serviceFence=istio-public/details
time="2021-11-12T09:09:41Z" level=info msg="Update a Sidecarin istio-public:productpage"
time="2021-11-12T09:12:11Z" level=info msg="Update a Sidecarin istio-public:productpage"
time="2021-11-12T09:12:41Z" level=info msg="Update a Sidecarin istio-public:productpage"
time="2021-11-12T09:13:41Z" level=info msg="Update a Sidecarin istio-public:productpage"
I1112 09:14:00.226741 1 reflector.go:418] pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105: Watch close - *v1.Namespace total 0 items received
time="2021-11-12T09:14:11Z" level=info msg="Update a Sidecarin istio-public:productpage"
I1112 09:14:27.233785 1 reflector.go:418] pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105: Watch close - *v1.Service total 0 items received
The version configuration is as follows:
---
apiVersion: config.netease.com/v1alpha1
kind: SlimeBoot
metadata:
  name: lazyload
  namespace: mesh-operator
spec:
  image:
    pullPolicy: Always
    repository: docker.io/slimeio/slime-lazyload
    tag: v0.2.6-d808438
  module:
    - name: lazyload
      enable: true
      fence:
        wormholePort: # replace with your application's service ports
          - "9080"
      metric:
        prometheus:
          address: http://prometheus.istio-system:9090
          handlers:
            destination:
              query: |
                sum(istio_requests_total{source_app="$source_app",reporter="destination"})by(destination_service)
              type: Group
  component:
    globalSidecar:
      enable: true
      type: namespaced
      namespace:
        - istio-public # replace with the namespace where bookinfo is installed
      resources:
        requests:
          cpu: 200m
          memory: 200Mi
        limits:
          cpu: 200m
          memory: 200Mi
      image:
        repository: istio/proxyv2
        tag: 1.7.0
    pilot:
      enable: true
      resources:
        requests:
          cpu: 200m
          memory: 200Mi
        limits:
          cpu: 200m
          memory: 200Mi
      image:
        repository: docker.io/slimeio/pilot
        tag: globalPilot-7.0-v0.0.3-713c611962
ADS:LDS: ACK ERROR sidecar~172.160.0.99~global-sidecar-9bc5bf795-8psv6.istio-public~istio-public.svc.cluster.local-1 Internal:Error adding/updating listener(s) 0.0.0.0_15021: cannot bind '0.0.0.0:15021': Address already in use
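In my environment, this error occurs because the istio-ingressgateway Service in istio-system uses port 15021, which makes Envoy's LDS update fail. Changing the ingress gateway's 15021 port to some other value resolves it; you can give that a try.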
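A minimal sketch of that change, assuming the default istio-ingressgateway Service layout of Istio 1.7, where the health-check entry is named `status-port` and both `port` and `targetPort` are 15021. Only the Service `port` moves, because that is the value the global-sidecar turns into the conflicting `0.0.0.0_15021` listener; `targetPort` stays 15021, since that is where the gateway pod's Envoy serves its health check. The replacement value 15121 is a hypothetical choice: pick any free port that is not one of the ports Istio reserves for its own use, and note that an external load balancer health check pointing at Service port 15021 would have to follow the change.

```yaml
# Fragment of `kubectl -n istio-system edit svc istio-ingressgateway`
# (assumed default layout; other port entries such as http2/https stay unchanged).
spec:
  ports:
    - name: status-port
      port: 15121        # was 15021; hypothetical replacement value
      targetPort: 15021  # unchanged: the gateway pod still serves health checks here
      protocol: TCP
```

If the gateway is installed through an IstioOperator, the same override would presumably belong under the ingress gateway's `k8s.service.ports`, so that a reinstall does not revert the manual edit.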
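This error doesn't prevent startup; in fact, it is reported even when the sidecar starts normally. The problem is that it occasionally fails to start.
Our module doesn't currently use port 15021 on the ingress gateway Service. That dynamically pushed port conflicts with the port in global-sidecar's static configuration, so you can change it to another value: ` ports: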
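I found an article describing this problem: https://imroc.cc/post/202105/using-istio-reserved-port-causes-pod-start-failed/ As a temporary fix, we recommend changing port 15021 on the ingress Service; we will consider releasing a version that resolves this later.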
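So is the startup failure actually related to the port conflict? Most of the time the same port-conflict error is reported, yet the sidecar still comes up quickly.
If you don't change it, the two configurations compete for the port and the error is always reported; changing the port solves the problem completely. As for the cases where global-sidecar starts successfully without the change, we suspect the full push it received did not include the ingress 15021 port, so startup succeeded, and a later push that included the conflicting ingress port produced the subsequent error. Whether startup succeeds is therefore a matter of chance, so change the port to eliminate the problem.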
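OK, I'll try it.
After changing the port, this has indeed not happened again.
For now this can be worked around through port planning; a later release will remove this limitation.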
When lazyload is enabled in a business namespace, the global-sidecar in that namespace sometimes fails to start. The global-sidecar logs are as follows:
[X] Configuration Lazy Loading [ ] Http Plugin Management [ ] Adaptive Ratelimit [ ] Slime Boot