oliver006 / redis_exporter

Prometheus Exporter for ValKey & Redis Metrics. Supports ValKey and Redis 2.x, 3.x, 4.x, 5.x, 6.x, and 7.x
https://github.com/oliver006/redis_exporter
MIT License
3.1k stars 875 forks source link

Empty DB Change Issues? #144

Closed jcstanaway closed 6 years ago

jcstanaway commented 6 years ago

As mentioned in issue #124, in my environment I'd like to rename the Redis CONFIG command to "".

It appears that the recent PR #142 initializes the number of DBs to zero and sets it based on the result of obtaining the CONFIG:

    dbCount := 0

    if config, err := redis.Strings(c.Do("CONFIG", "GET", "*")); err == nil {
        dbCount, err = extractConfigMetrics(config, addr, e.redis.Aliases[idx], scrapes)
        if err != nil {
            log.Errorf("Redis CONFIG err: %s", err)
            return err
        }
    } else {
        log.Debugf("Redis CONFIG err: %s", err)
    }

The concern is here:

    for dbIndex := 0; dbIndex < dbCount; dbIndex++ {

        dbName := "db" + strconv.Itoa(dbIndex)

        if _, exists := handledDBs[dbName]; !exists {

            scrapes <- scrapeResult{Name: "db_keys", Addr: addr, Alias: alias, DB: dbName, Value: 0}
            scrapes <- scrapeResult{Name: "db_keys_expiring", Addr: addr, Alias: alias, DB: dbName, Value: 0}
        }
    }

If executing the CONFIG command fails, due to the rename, then dbCount is left at 0 and db_keys and db_keys_expiring are never initialized. That would appear to defeat the purpose of the change. Is that correct?

Second concern: extractInfoMetrics() is called twice. Once with the result of INFO and a second time with the result of CLUSTER INFO. Assuming CONFIG isn't renamed, the second time that extractInfoMetrics() is called, it again sets db_keys and db_keys_expiring to zero. That would appear to overwrite the actual values obtained the first time that extractInfoMetrics() is called with the INFO results. Is that also correct?

If either/both is an issue, I'm willing to submit a PR - just let me know.

Chris

oliver006 commented 6 years ago

Thanks for raising these two issues.

If executing the CONFIG command fails, due to the rename, then dbCount is left at 0 and db_keys and db_keys_expiring are never initialized. That would appear to defeat the purpose of the change. Is that correct?

yes, I think that's correct. If we can't determine the total number of DBs then we won't fill in the zeros for DBs that we don't get information back from INFO. However, we could default to 16 as that's the redis default (i think).

Second concern: extractInfoMetrics() is called twice

I think that is a real problem, let me look at the code again. If you have suggestions for a fix for that issue then let me know, help is definitely appreciated.

oliver006 commented 6 years ago

@ccs018 - I think you're right about the second issue. If a cluster is enabled then the second call to extractInfoMetrics() will not have anything in handledDBs and set all db key metrics to zero - :sadpanda:

I see two days how to fix this: 1) add a flag extractInfoMetrics() and indicate if db key metrics should be zero'd out 2) move the zero'ing of metrics out of extractInfoMetrics() so it's not called twice.

(1) is easier and faster to do than (2) but (2) is cleaner IMO - what do you think?

jcstanaway commented 6 years ago

For the first issue, a Redis Cluster only has the 1 DB, so assuming 16 would seem to be incorrect for such a deployment. I was thinking to default to 1 or provide a command line option to specify the number of DBs (I'm using clusters, so would prefer not to see metrics for non-existent DBs).

For the second issue, I think option 1 would be sufficient. But option 2 also looks like it would be just as easy by initially zero'ing out those metrics for each expected DB and could be done immediately before the first call to extractInfoMetrics().

    info, err := redis.String(c.Do("INFO", "ALL"))
    if err == nil {
        for dbIndex := 0; dbIndex < dbCount; dbIndex++ {
            dbName := "db" + strconv.Itoa(dbIndex)
            scrapes <- scrapeResult{Name: "db_keys", Addr: addr, Alias: alias, DB: dbName, Value: 0}
            scrapes <- scrapeResult{Name: "db_keys_expiring", Addr: addr, Alias: alias, DB: dbName, Value: 0}
        }
        e.extractInfoMetrics(info, addr, e.redis.Aliases[idx], scrapes, dbCount)
    } else {
        log.Errorf("Redis INFO err: %s", err)
        return err
    }
oliver006 commented 6 years ago

For the first issue, it's a bit confusing. For redis cluster there might just be one DB but for normal, stand-alone installations by default it allows up to 16 DBs (see documented config with default values here: https://raw.githubusercontent.com/antirez/redis/4.0/redis.conf ). In your cluster - what's the value of databases in CONFIG GET databases?

For the second issue, I remember there was an issue before with me always setting the value of "up" to 0 and then to 1 only when the scrape succeeded but that was causing trouble. I can't find the PR or issue right now but I'd rather not do the double writes.

I'll add a flag so we can get the issue fixed.

jcstanaway commented 6 years ago

Thanks for the quick turn around on this.

Per the redis documentation at https://redis.io/commands/select:

When using Redis Cluster, the SELECT command cannot be used, since Redis Cluster only supports database zero. In the case of Redis Cluster, having multiple databases would be useless, and a worthless source of complexity, because anyway commands operating atomically on a single database would not be possible with the Redis Cluster design and goals.

127.0.0.1:6400> info config
127.0.0.1:6400> config get databases
1) "databases"
2) "16"
127.0.0.1:6400> select 2
(error) ERR SELECT is not allowed in cluster mode

In my redis.conf, I didn't bother changing databases entry. I suppose I could set that to 1. But, currently, redis_exporter is getting the dbCount from CONFIG GET * and the issue arises when in renaming CONFIG to "" for security/control purposes.

An option is to default dbCount conditioned on cluster_enabled from INFO:

127.0.0.1:6400> info cluster
# Cluster
cluster_enabled:1

In extractInfoMetrics() before after checking if padDBKeyCounts (new line 541) check if dbCount is zero. If zero, then this means that it wasn't obtained from CONFIG GET, so if cluster_enabled is 1 (which was just retrieved), then default dbCount to 1 else default to 16.

oliver006 commented 6 years ago

Hmmmmm, this might work but would need a bit of restructuring around line 628-643. I first need to figure out if cluster_enabled is 1, then fix up dbCount, then run extractInfoMetrics(). Let me try that.