yunionio / cloudpods

A cloud-native open-source unified multi-cloud and hybrid-cloud platform.
https://www.cloudpods.org
Apache License 2.0

[Help] Load balancer usage problems on 3.10.11 #19362

Closed chenjacken closed 4 months ago

chenjacken commented 8 months ago

1. Version: v3.10.11, a new high-availability cluster deployed following https://www.cloudpods.org/docs/getting-started/full/ha-ce

Three load balancer nodes are deployed:

[root@master1 ~]# kubectl get nodes
NAME       STATUS     ROLES    AGE   VERSION
lbagent1   Ready      <none>   20d   v1.15.12
lbagent2   Ready      <none>   20d   v1.15.12
lbagent3   Ready      <none>   20d   v1.15.12
master1    Ready      master   26d   v1.15.12
master2    Ready      master   26d   v1.15.12
master3    Ready      master   26d   v1.15.12

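2. Problems encountered

1) Forwarding rule problem:

[screenshot]

For example, after filling in a forwarding rule as shown above and clicking Confirm, this message appears:

[screenshot]

2) The monitoring feature reports an error: VictoriaMetrics invalid response

[screenshot]

Clicking "Monitoring" returns:

Error 1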

UnclassifiedError
doQuery with input {"from":"now-1h","interval":"72s","metric_query":[{"from":"now-1h","model":{"database":"telegraf","group_by":[{"params":["$interval"],"type":"time"},{"params":["none"],"type":"fill"}],"interval":"72s","measurement":"haproxy","select":[[{"params":["bin"],"type":"field"},{"type":"mean"},{"type":"non_negative_derivative"},{"params":["*8"],"type":"math"},{"params":["in_bps"],"type":"alias"}],[{"params":["bout"],"type":"field"},{"type":"mean"},{"type":"non_negative_derivative"},{"params":["*8"],"type":"math"},{"params":["out_bps"],"type":"alias"}]],"tags":[{"condition":"and","key":"pxname","operator":"=","value":"37afabe2-96ec-432d-83d9-95ca3a3b318f"},{"condition":"and","key":"svname","operator":"=","value":"FRONTEND"}]},"to":"now"}],"scope":"system","show_meta":false,"signature":"ba9cb25dd0915a5b40587ac4d7e832516032efaa79e741567ca62b8eafbea720","skip_check_series":false,"to":"now","unit":true}: ExecuteQuery: queryTSDB: query.executeQuery from victoria-metrics: metricQuery HandleRequest: convert influxQL to promQL: TranslateWithTimeRange: SELECT non_negative_derivative(mean("bin")) *8 AS "in_bps", non_negative_derivative(mean("bout")) *8 AS "out_bps" FROM "haproxy" WHERE ("pxname" = '37afabe2-96ec-432d-83d9-95ca3a3b318f' and "svname" = 'FRONTEND') AND time > now() - 1h GROUP BY time(1m) fill(none): Translate: translate field non_negative_derivative(mean(bin)) * 8 AS in_bps: getMetricName: field.Expr &influxql.BinaryExpr{Op:21, LHS:(*influxql.Call)(0xc001cfefc0), RHS:(*influxql.IntegerLiteral)(0xc001de7588)} is not supported
{
  "class": "UnclassifiedError",
  "code": 500,
  "details": "doQuery with input {\"from\":\"now-1h\",\"interval\":\"72s\",\"metric_query\":[{\"from\":\"now-1h\",\"model\":{\"database\":\"telegraf\",\"group_by\":[{\"params\":[\"$interval\"],\"type\":\"time\"},{\"params\":[\"none\"],\"type\":\"fill\"}],\"interval\":\"72s\",\"measurement\":\"haproxy\",\"select\":[[{\"params\":[\"bin\"],\"type\":\"field\"},{\"type\":\"mean\"},{\"type\":\"non_negative_derivative\"},{\"params\":[\"*8\"],\"type\":\"math\"},{\"params\":[\"in_bps\"],\"type\":\"alias\"}],[{\"params\":[\"bout\"],\"type\":\"field\"},{\"type\":\"mean\"},{\"type\":\"non_negative_derivative\"},{\"params\":[\"*8\"],\"type\":\"math\"},{\"params\":[\"out_bps\"],\"type\":\"alias\"}]],\"tags\":[{\"condition\":\"and\",\"key\":\"pxname\",\"operator\":\"=\",\"value\":\"37afabe2-96ec-432d-83d9-95ca3a3b318f\"},{\"condition\":\"and\",\"key\":\"svname\",\"operator\":\"=\",\"value\":\"FRONTEND\"}]},\"to\":\"now\"}],\"scope\":\"system\",\"show_meta\":false,\"signature\":\"ba9cb25dd0915a5b40587ac4d7e832516032efaa79e741567ca62b8eafbea720\",\"skip_check_series\":false,\"to\":\"now\",\"unit\":true}: ExecuteQuery: queryTSDB: query.executeQuery from victoria-metrics: metricQuery HandleRequest: convert influxQL to promQL: TranslateWithTimeRange: SELECT non_negative_derivative(mean(\"bin\")) *8 AS \"in_bps\", non_negative_derivative(mean(\"bout\")) *8 AS \"out_bps\" FROM \"haproxy\" WHERE (\"pxname\" = '37afabe2-96ec-432d-83d9-95ca3a3b318f' and \"svname\" = 'FRONTEND') AND time > now() - 1h GROUP BY time(1m) fill(none): Translate: translate field non_negative_derivative(mean(bin)) * 8 AS in_bps: getMetricName: field.Expr &influxql.BinaryExpr{Op:21, LHS:(*influxql.Call)(0xc001cfefc0), RHS:(*influxql.IntegerLiteral)(0xc001de7588)} is not supported",
  "time": "2024-01-27T13:36:22+08:00"
}

Error 2

VictoriaMetrics invalid response
doQuery with input {"from":"now-1h","interval":"72s","metric_query":[{"from":"now-1h","model":{"database":"telegraf","group_by":[{"params":["$interval"],"type":"time"},{"params":["none"],"type":"fill"}],"interval":"72s","measurement":"haproxy","select":[[{"params":["/d(req|con)/"],"type":"field"},{"type":"mean"},{"type":"non_negative_derivative"}],[{"params":["/hrsp_.+/"],"type":"field"},{"type":"mean"},{"type":"non_negative_derivative"}],[{"params":["hrsp_1xx"],"type":"field"},{"type":"mean"},{"type":"non_negative_derivative"},{"params":["hrsp_1xx"],"type":"alias"}],[{"params":["hrsp_2xx"],"type":"field"},{"type":"mean"},{"type":"non_negative_derivative"},{"params":["hrsp_2xx"],"type":"alias"}],[{"params":["hrsp_3xx"],"type":"field"},{"type":"mean"},{"type":"non_negative_derivative"},{"params":["hrsp_3xx"],"type":"alias"}],[{"params":["hrsp_4xx"],"type":"field"},{"type":"mean"},{"type":"non_negative_derivative"},{"params":["hrsp_4xx"],"type":"alias"}],[{"params":["hrsp_5xx"],"type":"field"},{"type":"mean"},{"type":"non_negative_derivative"},{"params":["hrsp_5xx"],"type":"alias"}],[{"params":["hrsp_other"],"type":"field"},{"type":"mean"},{"type":"non_negative_derivative"},{"params":["hrsp_other"],"type":"alias"}]],"tags":[{"condition":"and","key":"pxname","operator":"=","value":"37afabe2-96ec-432d-83d9-95ca3a3b318f"},{"condition":"and","key":"svname","operator":"=","value":"FRONTEND"}]},"to":"now"}],"scope":"system","show_meta":false,"signature":"eaf6709dea9c10c617e622f136c3c281a73acf24c19d3ce185669acc3645ffdf","skip_check_series":false,"to":"now","unit":true}: ExecuteQuery: queryTSDB: query.executeQuery from victoria-metrics: metricQuery HandleRequest: query VM range by: union(label_set(haproxy_/d(req|con)/{pxname="37afabe2-96ec-432d-83d9-95ca3a3b318f",svname="FRONTEND"}[1m], "__union_result__", "non_negative_derivative_mean_haproxy_/d(req|con)/"), label_set(haproxy_/hrsp_.+/{pxname="37afabe2-96ec-432d-83d9-95ca3a3b318f",svname="FRONTEND"}[1m], "__union_result__", 
"non_negative_derivative_mean_haproxy_/hrsp_.+/"), label_set(haproxy_hrsp_1xx{pxname="37afabe2-96ec-432d-83d9-95ca3a3b318f",svname="FRONTEND"}[1m], "__union_result__", "non_negative_derivative_mean_haproxy_hrsp_1xx"), label_set(haproxy_hrsp_2xx{pxname="37afabe2-96ec-432d-83d9-95ca3a3b318f",svname="FRONTEND"}[1m], "__union_result__", "non_negative_derivative_mean_haproxy_hrsp_2xx"), label_set(haproxy_hrsp_3xx{pxname="37afabe2-96ec-432d-83d9-95ca3a3b318f",svname="FRONTEND"}[1m], "__union_result__", "non_negative_derivative_mean_haproxy_hrsp_3xx"), label_set(haproxy_hrsp_4xx{pxname="37afabe2-96ec-432d-83d9-95ca3a3b318f",svname="FRONTEND"}[1m], "__union_result__", "non_negative_derivative_mean_haproxy_hrsp_4xx"), label_set(haproxy_hrsp_5xx{pxname="37afabe2-96ec-432d-83d9-95ca3a3b318f",svname="FRONTEND"}[1m], "__union_result__", "non_negative_derivative_mean_haproxy_hrsp_5xx"), label_set(haproxy_hrsp_other{pxname="37afabe2-96ec-432d-83d9-95ca3a3b318f",svname="FRONTEND"}[1m], "__union_result__", "non_negative_derivative_mean_haproxy_hrsp_other")): status code: 422: VictoriaMetrics invalid response
{
  "class": "VictoriaMetrics invalid response",
  "code": 500,
  "details": "doQuery with input {\"from\":\"now-1h\",\"interval\":\"72s\",\"metric_query\":[{\"from\":\"now-1h\",\"model\":{\"database\":\"telegraf\",\"group_by\":[{\"params\":[\"$interval\"],\"type\":\"time\"},{\"params\":[\"none\"],\"type\":\"fill\"}],\"interval\":\"72s\",\"measurement\":\"haproxy\",\"select\":[[{\"params\":[\"/d(req|con)/\"],\"type\":\"field\"},{\"type\":\"mean\"},{\"type\":\"non_negative_derivative\"}],[{\"params\":[\"/hrsp_.+/\"],\"type\":\"field\"},{\"type\":\"mean\"},{\"type\":\"non_negative_derivative\"}],[{\"params\":[\"hrsp_1xx\"],\"type\":\"field\"},{\"type\":\"mean\"},{\"type\":\"non_negative_derivative\"},{\"params\":[\"hrsp_1xx\"],\"type\":\"alias\"}],[{\"params\":[\"hrsp_2xx\"],\"type\":\"field\"},{\"type\":\"mean\"},{\"type\":\"non_negative_derivative\"},{\"params\":[\"hrsp_2xx\"],\"type\":\"alias\"}],[{\"params\":[\"hrsp_3xx\"],\"type\":\"field\"},{\"type\":\"mean\"},{\"type\":\"non_negative_derivative\"},{\"params\":[\"hrsp_3xx\"],\"type\":\"alias\"}],[{\"params\":[\"hrsp_4xx\"],\"type\":\"field\"},{\"type\":\"mean\"},{\"type\":\"non_negative_derivative\"},{\"params\":[\"hrsp_4xx\"],\"type\":\"alias\"}],[{\"params\":[\"hrsp_5xx\"],\"type\":\"field\"},{\"type\":\"mean\"},{\"type\":\"non_negative_derivative\"},{\"params\":[\"hrsp_5xx\"],\"type\":\"alias\"}],[{\"params\":[\"hrsp_other\"],\"type\":\"field\"},{\"type\":\"mean\"},{\"type\":\"non_negative_derivative\"},{\"params\":[\"hrsp_other\"],\"type\":\"alias\"}]],\"tags\":[{\"condition\":\"and\",\"key\":\"pxname\",\"operator\":\"=\",\"value\":\"37afabe2-96ec-432d-83d9-95ca3a3b318f\"},{\"condition\":\"and\",\"key\":\"svname\",\"operator\":\"=\",\"value\":\"FRONTEND\"}]},\"to\":\"now\"}],\"scope\":\"system\",\"show_meta\":false,\"signature\":\"eaf6709dea9c10c617e622f136c3c281a73acf24c19d3ce185669acc3645ffdf\",\"skip_check_series\":false,\"to\":\"now\",\"unit\":true}: ExecuteQuery: queryTSDB: query.executeQuery from victoria-metrics: metricQuery HandleRequest: query VM range by: 
union(label_set(haproxy_/d(req|con)/{pxname=\"37afabe2-96ec-432d-83d9-95ca3a3b318f\",svname=\"FRONTEND\"}[1m], \"__union_result__\", \"non_negative_derivative_mean_haproxy_/d(req|con)/\"), label_set(haproxy_/hrsp_.+/{pxname=\"37afabe2-96ec-432d-83d9-95ca3a3b318f\",svname=\"FRONTEND\"}[1m], \"__union_result__\", \"non_negative_derivative_mean_haproxy_/hrsp_.+/\"), label_set(haproxy_hrsp_1xx{pxname=\"37afabe2-96ec-432d-83d9-95ca3a3b318f\",svname=\"FRONTEND\"}[1m], \"__union_result__\", \"non_negative_derivative_mean_haproxy_hrsp_1xx\"), label_set(haproxy_hrsp_2xx{pxname=\"37afabe2-96ec-432d-83d9-95ca3a3b318f\",svname=\"FRONTEND\"}[1m], \"__union_result__\", \"non_negative_derivative_mean_haproxy_hrsp_2xx\"), label_set(haproxy_hrsp_3xx{pxname=\"37afabe2-96ec-432d-83d9-95ca3a3b318f\",svname=\"FRONTEND\"}[1m], \"__union_result__\", \"non_negative_derivative_mean_haproxy_hrsp_3xx\"), label_set(haproxy_hrsp_4xx{pxname=\"37afabe2-96ec-432d-83d9-95ca3a3b318f\",svname=\"FRONTEND\"}[1m], \"__union_result__\", \"non_negative_derivative_mean_haproxy_hrsp_4xx\"), label_set(haproxy_hrsp_5xx{pxname=\"37afabe2-96ec-432d-83d9-95ca3a3b318f\",svname=\"FRONTEND\"}[1m], \"__union_result__\", \"non_negative_derivative_mean_haproxy_hrsp_5xx\"), label_set(haproxy_hrsp_other{pxname=\"37afabe2-96ec-432d-83d9-95ca3a3b318f\",svname=\"FRONTEND\"}[1m], \"__union_result__\", \"non_negative_derivative_mean_haproxy_hrsp_other\")): status code: 422: VictoriaMetrics invalid response",
  "time": "2024-01-27T13:37:30+08:00"
}
chenjacken commented 8 months ago

I created a listener and a backend server group, then tried to access the load balancer:

[root@master1 ~]# curl -k https://172.16.1.182
<html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body></html>
[root@master1 ~]# 

Any ideas on how to troubleshoot this?

swordqiu commented 8 months ago

@chenjacken Regarding the 503 above: judging from the response, the request was answered by the LB itself, so the problem is that the LB cannot detect a backend server. Check whether the backend server is reachable from the lbagent nodes.
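
A minimal sketch of that reachability check, meant to be run on each lbagent node. The backend address is the one from this thread; `probe_backend` and the `CURL` override are illustrative names, not part of Cloudpods. A successful probe only shows that the node can reach the backend over the network; the load balancer's own health check can still mark the backend as down.

```shell
#!/bin/sh
# Sketch: probe a backend from an lbagent node and report the HTTP status.
# CURL can be overridden so the function can be exercised without a network.
CURL="${CURL:-curl}"

# probe_backend URL
# Prints "reachable (HTTP <code>): URL" or "unreachable: URL".
probe_backend() {
    # -k skips TLS verification (as in this thread), -s silences progress,
    # -o /dev/null discards the body, -w prints only the status code.
    code=$("$CURL" -k -s -o /dev/null -w '%{http_code}' --connect-timeout 5 "$1") || code=000
    if [ "$code" = "000" ]; then
        echo "unreachable: $1"
        return 1
    fi
    echo "reachable (HTTP $code): $1"
}
```

Usage: `probe_backend https://172.16.1.200` on lbagent1, lbagent2 and lbagent3 in turn.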

swordqiu commented 8 months ago

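The VictoriaMetrics errors occur because these queries were not made compatible after the monitoring storage was switched to VM. We will schedule a fix.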

chenjacken commented 8 months ago

It looks like monitoring of the backend services also goes through VictoriaMetrics? Is there a temporary workaround for the current VictoriaMetrics errors?

chenjacken commented 8 months ago

The load balancer instance IP is 172.16.1.182. I created a listener on port 443; the backend server is on the local network at 172.16.1.200 and serves HTTPS at https://172.16.1.200

From a load balancer node, curl -k https://172.16.1.200 works, but curl -k https://172.16.1.182 returns 503 Service Unavailable:

[root@lbagent3 ~]# curl -k https://172.16.1.182
<html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body></html>
[root@lbagent3 ~]# curl -k https://172.16.1.200
<!DOCTYPE html><html lang="en" translate="no"><head><meta charset="utf-8"><meta http-equiv="X-UA-Compatible" content="IE=edge"><meta name="viewport" content="width=device-width,initial-scale=1"><meta name="google" content="notranslate"><link rel="icon" href="./favicon.ico">
[two screenshots]
chenjacken commented 8 months ago

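Also, on the forwarding rule problem: when adding a rule, I get an error that the domain resource baidu.com cannot be found. Do I need to configure some other resource first?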

zexi commented 8 months ago

@chenjacken As a temporary workaround for the VictoriaMetrics errors, you can switch back to InfluxDB; the error should then go away.

chenjacken commented 8 months ago

I switched to VM by following https://www.cloudpods.org/docs/operations/monitoring/migrating-to-vm . Is there a way to switch back to InfluxDB? Thanks.

zexi commented 8 months ago

@chenjacken To switch back to InfluxDB, follow these steps:

kubectl patch onecloudcluster -n onecloud default --type='json' -p='[{op: replace, path: /spec/influxdb/disable, value: false}]'
kubectl patch onecloudcluster -n onecloud default --type='json' -p='[{op: replace, path: /spec/victoriaMetrics/disable, value: true}]'

Wait for the influxdb endpoint to be created:

climc endpoint-list --service influxdb

Then restart the related services, following this document: https://www.cloudpods.org/docs/operations/monitoring/migrating-to-vm#%E9%87%8D%E5%90%AF%E5%B9%B3%E5%8F%B0%E6%9C%8D%E5%8A%A1 . Note that the endpoint to use here is influxdb, not victoria-metrics.

After that, the lbagent daemonset also needs to be restarted.

chenjacken commented 8 months ago

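Thanks. After the steps above I have switched back to InfluxDB, and clicking "Monitoring" no longer reports an error.

1. The service is still unreachable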

[root@master1 ~]# curl -k https://172.16.1.182
<html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body></html>
[root@master1 ~]# 

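[screenshot]

Any ideas on how to troubleshoot this? See https://github.com/yunionio/cloudpods/issues/19362#issuecomment-1913119331

2. Forwarding rule problem: when adding a rule, I get an error that the domain resource baidu.com cannot be found. Do I need to configure some other resource first?

Thanks @zexi, I am still stuck on the two problems above. Please take a look.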

[root@master1 ~]# kubectl get nodes
NAME       STATUS     ROLES    AGE   VERSION
lbagent1   Ready      <none>   22d   v1.15.12
lbagent2   Ready      <none>   22d   v1.15.12
lbagent3   Ready      <none>   22d   v1.15.12
master1    Ready      master   28d   v1.15.12
master2    Ready      master   28d   v1.15.12
master3    Ready      master   28d   v1.15.12

[screenshot]

Kubernetes already shows three lbagent nodes, but the web console shows only one node. Is that normal?

chenjacken commented 8 months ago

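[screenshot]

I reinstalled and redeployed the load balancer nodes, two machines this time, and both now show up.

[two screenshots]

If the backend serves HTTPS, how should the health check be configured?

I still do not know how to solve the two problems in https://github.com/yunionio/cloudpods/issues/19362#issuecomment-1913948857 . Please advise, thanks!! ❀🌹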

chenjacken commented 8 months ago
[root@master1 ~]# curl -k https://172.16.1.182
<html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body></html>
[root@master1 ~]# 

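I resolved the problem above by changing two settings: 1. when creating the listener, set the PROXY protocol to off; 2. disable the health check.

[screenshot]

The forwarding rule problem is still unclear to me: when adding a rule, I get an error that the domain resource baidu.com cannot be found. Do I need to configure some other resource first? @zexi Thanks 🌹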

zexi commented 8 months ago

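@chenjacken About only one of the three lbagent nodes showing up in the web console: use the command below to check whether all three nodes carry the onecloud.yunion.io/lbagent=enable label:

kubectl get nodes --show-labels

If the label is missing, add it to the corresponding lbagent nodes and that should fix it. Then wait for the pods of the default-lbagent daemonset to start on those nodes, and check their logs for errors.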

chenjacken commented 8 months ago

Thanks, the node issue is resolved. I redeployed two LB nodes, both came up, and I just checked that the Kubernetes node labels are in place.

Could you please take a look at https://github.com/yunionio/cloudpods/issues/19362#issuecomment-1918263026 and https://github.com/yunionio/cloudpods/issues/19362#issuecomment-1914281146 🌹

chenjacken commented 8 months ago

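[screenshot]

Forwarding rule: when adding one, I get an error that the domain resource baidu.com cannot be found. Do I need to configure some other resource first? @swordqiu @zexi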

chenjacken commented 6 months ago

[screenshot]

Forwarding rule: when adding one, I get an error that the domain resource baidu.com cannot be found. Do I need to configure some other resource first? @swordqiu @zexi

1. After upgrading to v3.11.2 I still get the same message. Am I using the feature incorrectly?
2. Also, since v3.11.0 replaced the monitoring backend with VictoriaMetrics, the load balancer shows this message:

[screenshot]

@zexi @swordqiu @zhasm Could you please take a look and advise? Thanks!!!

chenjacken commented 5 months ago

🌺

chenjacken commented 4 months ago

Fixed in v3.11.3.