loggie-io / loggie

A lightweight, cloud-native data transfer agent and aggregator
https://loggie-io.github.io/docs-en/
Apache License 2.0

Questions about log alerting #650

Open KevinLiangX opened 1 year ago

KevinLiangX commented 1 year ago

Version: v1.4.0

I deployed loggie-aggregator as a StatefulSet:


[root@k8s-1 loggie]# kubectl get pod -owide
NAME                            READY   STATUS    RESTARTS      AGE    IP             NODE    NOMINATED NODE   READINESS GATES
kibana-kibana-68994557c-5ct2l   1/1     Running   1 (35h ago)   4d     10.233.89.30   k8s-4   <none>           <none>
loggie-aggregator-0             1/1     Running   0             113s   10.233.89.50   k8s-4   <none>           <none>
loggie-aggregator-1             1/1     Running   0             108s   10.233.89.51   k8s-4   <none>           <none>

I configured a source. One question here: for log alerting, should I use a ClusterLogConfig or a LogConfig?

[root@k8s-1 loggie]# kubectl get clusterlogconfig test -oyaml
apiVersion: loggie.io/v1beta1
kind: ClusterLogConfig
metadata:
  creationTimestamp: "2023-11-21T08:23:40Z"
  generation: 1
  name: test
  resourceVersion: "2044739"
  uid: e37d4ae2-2bc1-43f2-846a-a1aa4e39260f
spec:
  pipeline:
    sinkRef: alert-sink
    sources: |
      - type: elasticsearch
        name: elastic
        hosts: ["http://10.233.7.149:9200"] # Another problem here: a Service FQDN cannot be used; configuring one raises an error
        indices: ["kg-logstash-log*"]
        size: 10 # data size per fetch
        interval: 30s # pull data frequency
        timeout: 5s # pull timeout
        query: | # elastic query phrases
          {
            "term": {
              "metadata.namespace.keyword": "kube-system"
            }
          }
  selector:
    cluster: aggregator-loggie
    type: cluster

No interceptor is configured, because I want to see the format of the logs fetched from ES.


[root@k8s-1 loggie]# kubectl get sink alert-sink -oyaml
apiVersion: loggie.io/v1beta1
kind: Sink
metadata:
  creationTimestamp: "2023-11-21T08:27:10Z"
  generation: 1
  name: alert-sink
  resourceVersion: "2046677"
  uid: c1fd2306-e026-4d1c-b528-e84e73a184de
spec:
  sink: |
    type: dev
    printEvents: true
    codec:
      type: json
      pretty: true

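Logs: image

One more point: if I delete the interceptor or the sink, or if either one is misconfigured, do the StatefulSet pods simply fail? Is there a detailed example or explanation of how to configure log alerting? The documentation is too brief.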

KevinLiangX commented 1 year ago

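Has it been considered whether each fetch from ES returns the full data set or only the data for a given time window? For example, with a window of 1 hour, each fetch would return only that hour's data. `interval: 30s # pull data frequency` means data is pulled every 30 seconds, right? In that case, won't the same data be fetched repeatedly?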
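One common way to avoid re-reading the same documents on every poll is to add a time-range filter to the query, so that each fetch covers only the window since the previous one. The sketch below only builds the Elasticsearch query body and assumes the index has an `@timestamp` field; `windowed_query` is a hypothetical helper, and whether Loggie's elasticsearch source accepts such a query in its `query` field is not confirmed here.

```python
from datetime import datetime, timedelta, timezone


def windowed_query(namespace, end, window):
    """Build a query that matches only documents in [end - window, end).

    Assumptions (not from the Loggie docs): the timestamp field is
    named "@timestamp" and is indexed as a date.
    """
    start = end - window
    return {
        "bool": {
            "filter": [
                # Same term filter as in the ClusterLogConfig above.
                {"term": {"metadata.namespace.keyword": namespace}},
                # Half-open range: consecutive windows never overlap.
                {
                    "range": {
                        "@timestamp": {
                            "gte": start.isoformat(),
                            "lt": end.isoformat(),
                        }
                    }
                },
            ]
        }
    }


if __name__ == "__main__":
    interval = timedelta(seconds=30)
    now = datetime.now(timezone.utc)
    # Two consecutive polls: the second window starts where the first ends.
    print(windowed_query("kube-system", now, interval))
    print(windowed_query("kube-system", now + interval, interval))
```

Using `gte` for the lower bound and `lt` for the upper bound keeps a document that falls exactly on a boundary from being returned by two consecutive polls.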

KevinLiangX commented 1 year ago

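Please update the documentation. What is the noData mode for? "noData mode is required; an alert is sent if there are no logs within a certain period." If I configure 6 hours, does that mean that no alert is sent during those 6 hours even when logs match? What scenario is this for?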
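Read literally, the quoted sentence describes a watchdog: the alert fires when no log arrives for the whole window, not when logs match. A minimal sketch of that reading follows; `NoDataWatchdog` is a hypothetical class for illustration and is not Loggie's implementation, and time is passed in explicitly as seconds to keep the example deterministic.

```python
class NoDataWatchdog:
    """Fire an alert when no event has been seen for `window` seconds."""

    def __init__(self, window, started_at):
        self.window = window
        # Until the first event arrives, measure silence from start-up.
        self.last_event = started_at

    def on_event(self, now):
        # Any incoming log resets the silence timer.
        self.last_event = now

    def should_alert(self, now):
        # True only if the whole window passed without a single event.
        return now - self.last_event >= self.window


if __name__ == "__main__":
    six_hours = 6 * 3600
    dog = NoDataWatchdog(window=six_hours, started_at=0)
    dog.on_event(3600)                         # a log arrives after 1 hour
    print(dog.should_alert(six_hours))         # only 5 hours of silence so far
    print(dog.should_alert(3600 + six_hours))  # a full 6 hours of silence
```

Under this reading, a 6-hour setting would suit a source that is expected to log regularly, where prolonged silence itself signals a problem.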

qq1573464941 commented 6 months ago

Collect the logs into Loki and use Grafana alerting instead.