kubeservice-stack / OpsCenter

Observability stack: includes metrics, logging, tracing, monitoring and alerting!

OpsCenter-deploy #4

Closed: yogiseven closed this issue 1 year ago

yogiseven commented 1 year ago

1. When deploying Loki and Promtail, both fail with "stream error: stream ID XX; INTERNAL_ERROR". The Loki pods start, but Promtail does not:

    [root@master1 loki]# kubectl apply -f loki-allinone.yaml
    serviceaccount/loki created
    secret/loki created
    role.rbac.authorization.k8s.io/loki created
    rolebinding.rbac.authorization.k8s.io/loki created
    service/loki-headless created
    service/loki-memberlist created
    service/loki created
    statefulset.apps/loki created
    error: error when retrieving current configuration of:
    Resource: "monitoring.coreos.com/v1, Resource=servicemonitors", GroupVersionKind: "monitoring.coreos.com/v1, Kind=ServiceMonitor"
    Name: "loki", Namespace: "default"
    Object: &{map["metadata":map["labels":map["release":"metrics" "app":"loki" "chart":"loki-2.16.0" "heritage":"Helm"] "name":"loki" "namespace":"default" "annotations":map["kubectl.kubernetes.io/last-applied-configuration":""]] "spec":map["endpoints":[map["port":"http-metrics"]] "jobLabel":"jobLabel" "namespaceSelector":map["matchNames":["monitoring"]] "selector":map["matchLabels":map["app":"loki" "release":"metrics" "variant":"headless"]]] "apiVersion":"monitoring.coreos.com/v1" "kind":"ServiceMonitor"]}
    from server for: "loki-allinone.yaml": Get https://100.73.60.71:6443/apis/monitoring.coreos.com/v1/namespaces/default/servicemonitors/loki: stream error: stream ID 125; INTERNAL_ERROR

The Promtail log shows:

    failed to make journal target manager: creating journal reader: failed to open journal in directory \"/var/log/messages\": no such file or directory

The file /var/log/messages does exist on the host. Is the problem that this path does not exist inside the pod when the pod is created?
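A possible direction for the journal error, offered as a sketch and not as the project's own fix: Promtail's journal target reads a systemd journal directory, so pointing it at the plain-text syslog file /var/log/messages cannot work, and whatever host path is used must also be mounted into the Promtail pod. The fragment below assumes journald writes to /var/log/journal on the nodes (use /run/log/journal instead if journald storage is volatile) and that Promtail runs as a DaemonSet; the job name, label values and volume name are illustrative.

```yaml
# Promtail scrape config (sketch): read the systemd journal directory,
# not the plain-text /var/log/messages file.
scrape_configs:
  - job_name: journal
    journal:
      path: /var/log/journal        # journald directory on the node (assumed location)
      max_age: 12h                  # ignore entries older than this on first read
      labels:
        job: systemd-journal        # illustrative label value
    relabel_configs:
      - source_labels: ['__journal__systemd_unit']
        target_label: unit          # expose the systemd unit as a Loki label
---
# Fragment of the Promtail DaemonSet pod spec (assumed layout): mount the
# host's journal directory so the path above also exists inside the container.
containers:
  - name: promtail
    volumeMounts:
      - name: journal
        mountPath: /var/log/journal
        readOnly: true
volumes:
  - name: journal
    hostPath:
      path: /var/log/journal
```

If the goal is to tail /var/log/messages as an ordinary text file instead, the usual route is a static_configs job with __path__: /var/log/messages together with a hostPath mount of /var/log.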

2. The Tempo pods fail to start with:

    level=error ts=2023-01-03T08:16:33.517544071Z caller=main.go:109 msg="error running Tempo" err="failed to init module services error initialising module: store: failed to create store unexpected error from ListObjects on dongjiang-test-tempo: Get \"https://eos-zhengzhou-1-internal.cmecloud.cn/dongjiang-test-tempo/?delimiter=%2F&encoding-type=url&prefix=\": dial tcp: lookup eos-zhengzhou-1-internal.cmecloud.cn on 10.233.0.3:53: server misbehaving"

tempo-allinone.yaml uses the endpoint eos-zhengzhou-1-internal.cmecloud.cn, but our development environment cannot reach the public internet. What should this be replaced with?

    storage:
      trace:
        backend: s3
        block:
          bloom_filter_false_positive: 0.05
          encoding: zstd
          index_downsample_bytes: 1000
        pool:
          max_workers: 1000
          queue_depth: 100000
        s3:
          access_key: xxxxxxx
          bucket: dongjiang-test-tempo
          endpoint: eos-zhengzhou-1-internal.cmecloud.cn
          forcepathstyle: false
          hedge_requests_at: 500ms
          insecure: false
          region: eos-zhengzhou-1
          secret_key: xxxxxxx
        wal:
          encoding: snappy
          path: /var/tempo/wal

dongjiang1989 commented 1 year ago
  1. Specify a namespace for Loki; it must be the same namespace as metrics:

    helm install metrics . --namespace monitoring
    helm install loki . --namespace monitoring
  2. Storage requires an S3 backend. If the environment has no standard S3 service, you can set up MinIO yourself: https://hub.docker.com/r/minio/minio
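Following the MinIO suggestion, the s3 section of tempo-allinone.yaml would point at an in-cluster MinIO service instead of the EOS endpoint. This is a sketch under stated assumptions: the Service name minio, the namespace monitoring, port 9000, the bucket name and the credentials are all hypothetical placeholders. The bucket should be created in MinIO beforehand, since Tempo calls ListObjects on it at startup (as the Tempo error in the question shows).

```yaml
# Sketch: Tempo trace storage pointed at an in-cluster MinIO (hypothetical names).
storage:
  trace:
    backend: s3
    s3:
      bucket: tempo                                       # create this bucket in MinIO first
      endpoint: minio.monitoring.svc.cluster.local:9000   # host:port only, no http:// prefix
      access_key: "CHANGE_ME"                             # MinIO access key (placeholder)
      secret_key: "CHANGE_ME"                             # MinIO secret key (placeholder)
      insecure: true                                      # talk plain HTTP to MinIO
      forcepathstyle: true                                # path-style bucket URLs, MinIO's default
    # block, pool and wal settings can stay as they are in tempo-allinone.yaml
```

Because the endpoint resolves through cluster DNS, this avoids the failing lookup of the public hostname; region can usually be left out for MinIO.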