thanos-io / thanos

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
https://thanos.io
Apache License 2.0
13.1k stars 2.1k forks source link

Ruler: skips rule files based on their names without any error/warning log. #4232

Closed ffilippopoulos closed 3 years ago

ffilippopoulos commented 3 years ago

Thanos version used:

thanos image: quay.io/thanos/thanos:v0.17.2

What happened:

We run thanos-rule with the following flag:

--rule-file=/var/thanos/rules/*.yaml

to load rules from all the yaml files we put under the above location. We use kustomize configMapGenerator to create a configMap with the needed files:

configMapGenerator:
  - name: alerts
    files:
      - accounting.yaml=resources/alerts/accounting.yaml
      - billing.yaml=resources/alerts/billing.yaml
      - cbc.yaml=resources/alerts/cbc.yaml
      - partner-data.yaml=resources/alerts/partner-data.yaml
      - partner.yaml=resources/alerts/partner.yaml
      - payments.yaml=resources/alerts/payments.yaml
      - prometheus.yaml=resources/alerts/prometheus.yaml
      - unicom.yaml=resources/alerts/unicom.yaml

and mount it under the passed location:

        volumeMounts:
        - mountPath: /var/thanos/rules
          name: rules
      volumes:
      - configMap:
          defaultMode: 420
          name: alerts-845c2ttgdg
        name: rules

As a result, we can see all the rule files mounted in the pods:

$ kubectl --context=dev-merit --namespace=sys-mon exec -ti thanos-rule-56f565b58b-7lvnw -- ls /var/thanos/rules
accounting.yaml    partner-data.yaml  prometheus.yaml
billing.yaml       partner.yaml       unicom.yaml
cbc.yaml           payments.yaml

Moreover, thanos ruler is able to see all 8 files on startup:

level=info ts=2021-05-11T16:15:45.776401996Z caller=rule.go:836 component=rules msg="reload rule files" numFiles=8

Thanos only loads rules from 6/8 files. In particular, the following:

/var/thanos/rules/accounting.yaml
/var/thanos/rules/billing.yaml
/var/thanos/rules/cbc.yaml
/var/thanos/rules/payments.yaml
/var/thanos/rules/prometheus.yaml
/var/thanos/rules/unicom.yaml

The end result is that we are missing rules defined in 2 yaml files.

What you expected to happen:

Load all the rules, or give an error on why it failed to do so.

How to reproduce it (as minimally and precisely as possible):

Following our example should be sufficient here.

Full logs to relevant components:

No useful logs from thanos regarding this one. We were profiling the issue via trying different sets of our rule files.

Anything else we need to know:

A workaround is to prefix all mounted files with _, which seems to do the trick and allows the ruler to load all our rules fine.

In particular, we believe that this behaviour is a result of: https://github.com/thanos-io/thanos/blob/f4a5904b0c3be7565298cf257ad43ac1942021cc/pkg/rules/manager.go#L358 and the way file names are trimmed to be written into tmp dir.

Environment: Kube version: v1.21.0 OS: Flatcar Container Linux by Kinvolk 2765.2.3 (Oklo) kernel: 5.10.32-flatcar container runtime: containerd://1.4.4

KamalSinghKhanna commented 3 years ago

I would like to work on this issue suggest me way to proceed do we have to write Tmp instead of Trim in this line "newFn := filepath.Join(m.workDir, s.String(), strings.TrimLeft(fn, m.workDir))"

matej-g commented 3 years ago

@wiardvanrij I'd take this one if it's not being worked on

daiwei233 commented 3 years ago

@wiardvanrij I'd take this one if it's not being worked on

Solution

Hi, I'm trying to fix it out. And change this: https://github.com/thanos-io/thanos/blob/f4a5904b0c3be7565298cf257ad43ac1942021cc/pkg/rules/manager.go#L358 newFn := filepath.Join(m.workDir, s.String(), strings.TrimLeft(fn, m.workDir)) to newFn := filepath.Join(m.workDir, s.String(), filepath.Base(fn))

What do you think?

reappear

The bug reappear:

package main

import (
    "fmt"
    "strings"
)

func main() {
    // source code:
    // newFn := filepath.Join(m.workDir, s.String(), strings.TrimLeft(fn, m.workDir))
    // s.String() = "ABORT"
    fn := "/opt/programs/thanos-0.19.0.linux-amd64/rules/test.yml"             // fn
    fn1 := "/opt/programs/thanos-0.19.0.linux-amd64/rules/test1.yml"           // fn1
    workdir := "/opt/programs/thanos-0.19.0.linux-amd64/ruler_data/.tmp-rules" // m.workDir

    fmt.Println(strings.TrimLeft(fn, workdir))
    fmt.Println(strings.TrimLeft(fn1, workdir))
}

And they have the same output:

yml
yml
matej-g commented 3 years ago

Hey @daiwei233, please have a look at my proposed solution at https://github.com/thanos-io/thanos/pull/4468 and let me know your thoughts!

I think best way here is to not trim / alter the filename, since the trimming does not seem to have the desired behavior and leaving the full filename in does not seem to pose an issue (I'm wondering what was the original intention behind using TrimLeft). With using filepath.Base, I think there is a potential for conflict between filenames in different sub-directories, if they include files with identical names.

daiwei233 commented 3 years ago

Hey @daiwei233, please have a look at my proposed solution at #4468 and let me know your thoughts!

I think best way here is to not trim / alter the filename, since the trimming does not seem to have the desired behavior and leaving the full filename in does not seem to pose an issue (I'm wondering what was the original intention behind using TrimLeft). With using filepath.Base, I think there is a potential for conflict between filenames in different sub-directories, if they include files with identical names.

You are right, i think it will be work.https://github.com/thanos-io/thanos/pull/4468