higress-group / higress-standalone


Standalone deployment: gateway reports an error on startup #11

Closed: lilin1996 closed this issue 11 months ago

lilin1996 commented 1 year ago

Deployment method: (screenshot)

Process status after running the script: (screenshot)

Error log:

2023-07-22T03:12:44.703396Z error Epoch 0 exited with error: signal: aborted
2023-07-22T03:12:44.703535Z info No more active epochs, terminating
2023-07-22T03:13:47.287394Z info FLAG: --concurrency="0"
2023-07-22T03:13:47.287493Z info FLAG: --domain="higress-system.svc.cluster.local"
2023-07-22T03:13:47.287536Z info FLAG: --help="false"
2023-07-22T03:13:47.287572Z info FLAG: --localTime="false"
2023-07-22T03:13:47.287608Z info FLAG: --log_as_json="false"
2023-07-22T03:13:47.287643Z info FLAG: --log_caller=""
2023-07-22T03:13:47.287681Z info FLAG: --log_output_level="all:info"
2023-07-22T03:13:47.287714Z info FLAG: --log_rotate=""
2023-07-22T03:13:47.288593Z info FLAG: --log_rotate_max_age="30"
2023-07-22T03:13:47.288628Z info FLAG: --log_rotate_max_backups="1000"
2023-07-22T03:13:47.288667Z info FLAG: --log_rotate_max_size="104857600"
2023-07-22T03:13:47.288734Z info FLAG: --log_stacktrace_level="default:none"
2023-07-22T03:13:47.288883Z info FLAG: --log_target="[stdout]"
2023-07-22T03:13:47.288923Z info FLAG: --meshConfig="./etc/istio/config/mesh"
2023-07-22T03:13:47.288955Z info FLAG: --outlierLogPath=""
2023-07-22T03:13:47.289055Z info FLAG: --proxyComponentLogLevel="misc:error"
2023-07-22T03:13:47.289082Z info FLAG: --proxyLogLevel="warning"
2023-07-22T03:13:47.289508Z info FLAG: --serviceCluster="higress-gateway"
2023-07-22T03:13:47.289595Z info FLAG: --stsPort="0"
2023-07-22T03:13:47.289619Z info FLAG: --templateFile=""
2023-07-22T03:13:47.289643Z info FLAG: --tokenManagerPlugin="GoogleTokenExchange"
2023-07-22T03:13:47.289687Z info FLAG: --vklog="0"
2023-07-22T03:13:47.289723Z info Version 1.12-dev-d4dbaba760bd3869d87560be4988cbd99baf09bd-Clean
2023-07-22T03:13:47.290785Z info Proxy role ips=[172.22.0.6] type=router id=. domain=higress-system.svc.cluster.local
2023-07-22T03:13:47.295008Z info Apply mesh config from file accessLogEncoding: TEXT
accessLogFile: /dev/stdout
accessLogFormat: |
  {"authority":"%REQ(:AUTHORITY)%","bytes_received":"%BYTES_RECEIVED%","bytes_sent":"%BYTES_SENT%","downstream_local_address":"%DOWNSTREAM_LOCAL_ADDRESS%","downstream_remote_address":"%DOWNSTREAM_REMOTE_ADDRESS%","duration":"%DURATION%","istio_policy_status":"%DYNAMIC_METADATA(istio.mixer:status)%","method":"%REQ(:METHOD)%","path":"%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%","protocol":"%PROTOCOL%","request_id":"%REQ(X-REQUEST-ID)%","requested_server_name":"%REQUESTED_SERVER_NAME%","response_code":"%RESPONSE_CODE%","response_flags":"%RESPONSE_FLAGS%","route_name":"%ROUTE_NAME%","start_time":"%START_TIME%","trace_id":"%REQ(X-B3-TRACEID)%","upstream_cluster":"%UPSTREAM_CLUSTER%","upstream_host":"%UPSTREAM_HOST%","upstream_local_address":"%UPSTREAM_LOCAL_ADDRESS%","upstream_service_time":"%RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)%","upstream_transport_failure_reason":"%UPSTREAM_TRANSPORT_FAILURE_REASON%","user_agent":"%REQ(USER-AGENT)%","x_forwarded_for":"%REQ(X-FORWARDED-FOR)%"}
configSources:

2023-07-22T03:13:47.318563Z info Effective config:
binaryPath: /usr/local/bin/envoy
configPath: ./etc/istio/proxy
controlPlaneAuthPolicy: MUTUAL_TLS
disableAlpnH2: true
discoveryAddress: pilot:15012
drainDuration: 45s
parentShutdownDuration: 60s
proxyAdminPort: 15000
proxyStatsMatcher:
  inclusionRegexps:

2023-07-22T03:13:47.318644Z info Using existing certs
2023-07-22T03:13:47.397824Z info CA Endpoint pilot:15012, provider Citadel
2023-07-22T03:13:47.398048Z info Opening status port 15020
2023-07-22T03:13:47.434911Z info Using CA pilot:15012 cert with certs: /etc/certs/root-cert.pem
2023-07-22T03:13:47.437567Z info citadelclient Citadel client using custom root cert: pilot:15012
2023-07-22T03:13:48.439579Z info ads All caches have been synced up in 1.1740555s, marking server ready
2023-07-22T03:13:48.454201Z info sds SDS server for workload certificates started, listening on "etc/istio/proxy/SDS"
2023-07-22T03:13:48.461782Z info sds Starting SDS grpc server
2023-07-22T03:13:48.462071Z info xdsproxy Initializing with upstream address "pilot:15012" and cluster ""
2023-07-22T03:13:48.519394Z info Pilot SAN: [pilot]
2023-07-22T03:13:48.520673Z info starting Http service at 127.0.0.1:15004
2023-07-22T03:13:48.589056Z info Pilot SAN: [pilot]
2023-07-22T03:13:48.938948Z info Starting proxy agent
2023-07-22T03:13:48.939053Z info Epoch 0 starting
2023-07-22T03:13:48.955414Z info Envoy command: [-c etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 45 --drain-strategy immediate --parent-shutdown-time-s 60 --local-address-ip-version v4 --file-flush-interval-msec 1000 --disable-hot-restart --log-format [Envoy (Epoch 0)] [%Y-%m-%d %T.%e][%t][%l][%n] %v -l warning --component-log-level misc:error]
2023-07-22T03:13:48.987769Z info cache adding watcher for file certificate etc/certs/cert-chain.pem
2023-07-22T03:13:48.987850Z info cache read certificate from file resource=default
2023-07-22T03:13:49.017855Z info cache adding watcher for file certificate etc/certs/root-cert.pem
2023-07-22T03:13:49.017953Z info cache read certificate from file resource=ROOTCA
[Envoy (Epoch 0)] [2023-07-22 03:13:49.521][15][critical][assert] assert failure: rc == 0.
[Envoy (Epoch 0)] [2023-07-22 03:13:49.521][15][critical][backtrace] Caught Aborted, suspect faulting address 0xf
[Envoy (Epoch 0)] [2023-07-22 03:13:49.521][15][critical][backtrace] Backtrace (use tools/stack_decode.py to get line numbers):
[Envoy (Epoch 0)] [2023-07-22 03:13:49.521][15][critical][backtrace] Envoy version: 4ad0eba4dd5f63b10260495f263ab3971326b4f5/1.20.0/Clean/RELEASE/BoringSSL
[Envoy (Epoch 0)] [2023-07-22 03:13:49.522][15][critical][backtrace] #0: [0x7f0856c10520]
2023-07-22T03:13:49.534776Z error Epoch 0 exited with error: signal: aborted
2023-07-22T03:13:49.534846Z info No more active epochs, terminating

Environment: (screenshot)

CH3CHO commented 1 year ago

We'll see if we can find a similar machine to test this on.

lilin1996 commented 1 year ago

(screenshot) This standalone-deployment issue doesn't look like a one-off; the same problem also shows up on other Linux versions.

johnlanni commented 1 year ago

@lilin1996 Was a core dump file generated? You could take a look at it with gdb. I suspect the cause is a kernel version incompatibility; I'll build a CentOS 7 image later.

lilin1996 commented 1 year ago

(screenshots) Do these two provide the information you need?

johnlanni commented 1 year ago

(screenshots) Do these two provide the information you need?

You need to run gdb inside the container. You can first start the container with docker run -d --entrypoint bash, copy the core file into it, and then run gdb -c core.14 /usr/local/bin/envoy.

lilin1996 commented 1 year ago

The container exits on its own. Do I need to debug with the source code instead?

johnlanni commented 1 year ago

@lilin1996 docker run -d --entrypoint /bin/bash

Overriding the entrypoint this way keeps the container from exiting.
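Putting the two suggestions above together, a rough sketch of the debugging workflow looks like this (the image tag and container name are placeholders, core.14 should be replaced with your actual core file, and gdb may need to be installed in the container first):

# Keep the gateway container alive by overriding the entrypoint with an interactive shell
docker run -itd --name gateway-debug --entrypoint /bin/bash <higress-gateway-image>

# Copy the core dump from the host into the container
docker cp core.14 gateway-debug:/tmp/core.14

# Load the core dump against the envoy binary shipped in the image
docker exec -it gateway-debug gdb -c /tmp/core.14 /usr/local/bin/envoy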

lilin1996 commented 1 year ago

(screenshot) Does this information help you track down the problem?

lilin1996 commented 1 year ago

(screenshot)

johnlanni commented 1 year ago

(screenshot) pthread_create failed. I suspect this is a Docker permission issue on CentOS 7; for reference: https://github.com/containers/skopeo/issues/1501

johnlanni commented 11 months ago

According to community feedback, upgrading Docker to version 20.10.21 or later resolves this.
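A quick way to check whether the running daemon is below that version (assuming the docker CLI is available on the host):

docker version --format '{{.Server.Version}}'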

lcfang commented 11 months ago

According to community feedback, upgrading Docker to version 20.10.21 or later resolves this.

Does a Helm installation run into the same problem? Has the community looked into that?

johnlanni commented 11 months ago

@lcfang Deployments on K8s don't hit this problem; K8s wraps and handles these low-level issues.

lcfang commented 11 months ago

@lcfang Deployments on K8s don't hit this problem; K8s wraps and handles these low-level issues.

Then I don't quite understand: if the K8s cluster's default runtime is Docker, it should hit the same problem, unless the default is containerd or something else?

johnlanni commented 11 months ago

@lcfang Deployments on K8s don't hit this problem; K8s wraps and handles these low-level issues.

Then I don't quite understand: if the K8s cluster's default runtime is Docker, it should hit the same problem, unless the default is containerd or something else?

K8s does not make use of the full set of Docker's features.

johnlanni commented 11 months ago

See whether you can set Docker's --privileged=true to enable privileged mode, or grant sudo rights to the user group that runs the docker commands. I haven't looked into this issue in depth yet, so start by trying the permission-related settings.

lcfang commented 11 months ago

See whether you can set Docker's --privileged=true to enable privileged mode, or grant sudo rights to the user group that runs the docker commands. I haven't looked into this issue in depth yet, so start by trying the permission-related settings.

Tried it: adding privileged: true to the gateway section of compose/docker-compose.yml does the trick. (screenshot)
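For reference, a minimal sketch of what that change in compose/docker-compose.yml might look like (the service name gateway follows the comment above; the image value is a placeholder for whatever the file already contains):

services:
  gateway:
    image: <existing-gateway-image>   # keep the image the file already specifies
    privileged: true                  # run the gateway container in privileged mode
    # ...leave the existing ports, volumes, and environment settings unchanged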