iamseth / oracledb_exporter

Prometheus Oracle database exporter.
MIT License
489 stars 236 forks source link

Positive value being turned into negative metric #229

Closed richardcrist closed 2 years ago

richardcrist commented 2 years ago

In a TOML file I have the below SQL query to pull max age of an Oracle queue entry for a given Oracle queue.

select (SELECT extract(day from 24*60*nvl(max(sysdate - from_tz(enq_time,'GMT') at TIME ZONE 'US/Central'),numtodsinterval(0,'minute'))) FROM alps.qt_rp_genpicks_q11) as qmxtm_genpicks_q11 from dual; The above query returns either 0 or a positive number representing the age in minutes of the oldest message on a queue. On a slightly older version of oracledb_exporter this returns the metric correctly, either 0 or a positive number.

However, on the latest version of oracledb_exporter the query is running fine but the metric value returned is a strange negative behavior. If the SQL query value returned is 0 then the metric is 0. If the SQL query value returned is 1 then the metric value is -359. A SQL query value of 2 yields a metric value of -358 and so on.

Does anyone have any insight into this metric behavior?

richardcrist commented 2 years ago

More research. Not a go expert but took a look into code. Below is a code section ending in the debug display of the value.

// interface method to call ScrapeGenericValues using Metric struct values
func ScrapeMetric(db *sql.DB, ch chan<- prometheus.Metric, metricDefinition Metric) error {
    log.Debugln("Calling function ScrapeGenericValues()")
    return ScrapeGenericValues(db, ch, metricDefinition.Context, metricDefinition.Labels,
        metricDefinition.MetricsDesc, metricDefinition.MetricsType, metricDefinition.MetricsBuckets,
        metricDefinition.FieldToAppend, metricDefinition.IgnoreZeroResult,
        metricDefinition.Request)
}

// generic method for retrieving metrics.
func ScrapeGenericValues(db *sql.DB, ch chan<- prometheus.Metric, context string, labels []string,
    metricsDesc map[string]string, metricsType map[string]string, metricsBuckets map[string]map[string]string, fieldToAppend string, ignoreZeroResult bool, request string) error {
    metricsCount := 0
    genericParser := func(row map[string]string) error {
        // Construct labels value
        labelsValues := []string{}
        for _, label := range labels {
            labelsValues = append(labelsValues, row[label])
        }
        // Construct Prometheus values to sent back
        for metric, metricHelp := range metricsDesc {
            value, err := strconv.ParseFloat(strings.TrimSpace(row[metric]), 64)
            // If not a float, skip current metric
            if err != nil {
                log.Errorln("Unable to convert current value to float (metric=" + metric +
                    ",metricHelp=" + metricHelp + ",value=<" + row[metric] + ">)")
                continue
            }
            log.Debugln("Query result looks like: ", value)

Below is some example output of debug.

DEBU[0061] Query result looks like:  0                   source="main.go:322"
DEBU[0061] Query result looks like:  0                   source="main.go:322"
DEBU[0061] Query result looks like:  0                   source="main.go:322"
DEBU[0061] Query result looks like:  0                   source="main.go:322"
DEBU[0061] Query result looks like:  -357                source="main.go:322"
DEBU[0061] Query result looks like:  0                   source="main.go:322"
DEBU[0061] Query result looks like:  0                   source="main.go:322"
DEBU[0061] Query result looks like:  0                   source="main.go:322"
DEBU[0061] Query result looks like:  0                   source="main.go:322"
DEBU[0061] Query result looks like:  0                   source="main.go:322"
DEBU[0061] ScrapeGenericValues() - metricsCount:  10     source="main.go:397"
DEBU[0061] Successfully scraped metric:  ohlprod map[qmxtm_genpicks_q11:Max time of queue entries on genpicks_q11 qmxtm_genpicks_q12:Max time of queue entries on genpicks_q12 qmxtm_genpicks_q13:Max time of queue entries on genpicks_q13 qmxtm_genpicks_q14:Max time of queue entries on genpicks_q14 qmxtm_genpicks_q15:Max time of queue entries on genpicks_q15 qmxtm_genpicks_q16:Max time of queue entries on genpicks_q16 qmxtm_genpicks_q17:Max time of queue entries on genpicks_q17 qmxtm_genpicks_q18:Max time of queue entries on genpicks_q18 qmxtm_genpicks_q19:Max time of queue entries on genpicks_q19 qmxtm_genpicks_q20:Max time of queue entries on genpicks_q20] 51.75887016s  source="main.go:269"

I know from experience with this that the value of -357 means that a positive value of 3 was returned by the SQL query and then interpreted as -357 for some reason.

Other info: The working oracledb_exporter is running in our data center with os release shown below.

[someuser@somemachine ~]$ cat /etc/os-release
NAME="Oracle Linux Server"
VERSION="7.7"
ID="ol"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="7.7"
PRETTY_NAME="Oracle Linux Server 7.7"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:oracle:linux:7:7:server"
HOME_URL="https://linux.oracle.com/"
BUG_REPORT_URL="https://bugzilla.oracle.com/"

ORACLE_BUGZILLA_PRODUCT="Oracle Linux 7"
ORACLE_BUGZILLA_PRODUCT_VERSION=7.7
ORACLE_SUPPORT_PRODUCT="Oracle Linux"
ORACLE_SUPPORT_PRODUCT_VERSION=7.7
[someuser@somemachine ~]$

The problem oracledb_exporter is running in Azure with os release shown below.

[someuser@somemachine oracledb_exporter]# cat /etc/os-release
NAME="Oracle Linux Server"
VERSION="8.4"
ID="ol"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="8.4"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Oracle Linux Server 8.4"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:oracle:linux:8:4:server"
HOME_URL="https://linux.oracle.com/"
BUG_REPORT_URL="https://bugzilla.oracle.com/"

ORACLE_BUGZILLA_PRODUCT="Oracle Linux 8"
ORACLE_BUGZILLA_PRODUCT_VERSION=8.4
ORACLE_SUPPORT_PRODUCT="Oracle Linux"
ORACLE_SUPPORT_PRODUCT_VERSION=8.4
[someuser@somemachine oracledb_exporter]#

Also, in my original post I referred to older version versus newer version of oracledb_exporter. However, I believe that it is the latest version of oracledb_exporter 0.3.2 on both my data center and my Azure machines.

Giving the above information as I try to see what the issue might be. Anyone with any suggestions to try or insight all assistance is appreciated. Thank you!

HansK-p commented 2 years ago

Without understanding if this is relevant, I see that both "US Central Time" and "GMT" is a part of the query. US Central Time seems to be GMT-6. Could it be that these 360 minutes that creates confusion is in reality the offset between GMT and US Central Time?

richardcrist commented 2 years ago

HansK-p,

Thank you as that is a good call. I will try to rework query without the time functions but still honoring the time difference. Same query for metrics works on old server but your observation fits the data seen. Thanks again! I will reply next week with update.

Richard Crist IT Architect | RunOps GEODIS | Contract Logistics | Americas

7101 Executive Center Drive, Suite 333 | Brentwood, TN 37027 | USA Tel: +1 615-750-9865 | Mob: +1 615-308-9356 | IP: 3010 @.*** | geodis.com


From: HansK-p @.> Sent: Saturday, March 12, 2022 8:56:44 AM To: iamseth/oracledb_exporter @.> Cc: Crist, Richard @.>; Author @.> Subject: Re: [iamseth/oracledb_exporter] Positive value being turned into negative metric (Issue #229)

CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe.

Without understanding if this is relevant, I see that both "US Central Time" and "GMT" is a part of the query. US Central Time seems to be GMT-6. Could it be that these 360 minutes that creates confusion is in reality the offset between GMT and US Central Time?

— Reply to this email directly, view it on GitHubhttps://urldefense.com/v3/__https://github.com/iamseth/oracledb_exporter/issues/229*issuecomment-1065897163__;Iw!!PZGyUz0!L7stMt5Yuk7MjG69ZiaVaSG4YznXm3Fysr4iNlRRxB-ml1pTTAD7VdHkUKLcBcOrIg$, or unsubscribehttps://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AVHZW5R6GO2TTBNMTWMMQX3U7SWCZANCNFSM5P63O3UQ__;!!PZGyUz0!L7stMt5Yuk7MjG69ZiaVaSG4YznXm3Fysr4iNlRRxB-ml1pTTAD7VdHkUKJMGZ4NQw$. Triage notifications on the go with GitHub Mobile for iOShttps://urldefense.com/v3/__https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675__;!!PZGyUz0!L7stMt5Yuk7MjG69ZiaVaSG4YznXm3Fysr4iNlRRxB-ml1pTTAD7VdHkUKIusKiv2A$ or Androidhttps://urldefense.com/v3/__https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign*3Dnotification-email*26utm_medium*3Demail*26utm_source*3Dgithub__;JSUlJSU!!PZGyUz0!L7stMt5Yuk7MjG69ZiaVaSG4YznXm3Fysr4iNlRRxB-ml1pTTAD7VdHkUKJiluqGMg$. You are receiving this because you authored the thread.Message ID: @.***>

richardcrist commented 2 years ago

@HansK-p that turned out to be the issue but I had to solve it with trial and error. On my local Oracle DB Exporter system in Azure the system time was GMT which is ok with me. I used SQLPlus (sqlplus) command line to query against data center database. Below query run in SQL Developer against Oracle database gave small positive number of minutes, let's say 10 minutes during my test run. `SELECT extract(day from 2460nvl(max(sysdate - from_tz(enq_time,'GMT') at TIME ZONE 'US/Central'),numtodsinterval(0,'minute'))) FROM alps.qt_rp_replenish_q26; Same query run in sqlplus on Azure machine gave the negative number, which after this weekend time change for Central time zone in U.S. is 300 minutes difference instead of 360. From Azure machine querying to database using sqlplus sysdate looked correct and returned in the database Central time zone. However, when the above query was run suddenly it gave the negative minutes. Through trial and error testing I found that if I use "systimestamp" instead of "sysdate" it works ok and returns correct number of small positive number of minutes on queue. SELECT extract(day from 2460*nvl(max(systimestamp - from_tz(enq_time,'GMT') at TIME ZONE 'US/Central'),numtodsinterval(0,'minute'))) FROM alps.qt_rp_replenish_q26;` Anyway, thank you very much for your insight. I am going to make final across the board changes to use "systimestamp" and then verify that across the board it works for all my queries. I will make final closing reply here once full change and testing is done. Thanks, again!

richardcrist commented 2 years ago

@HansK-p ok I did final across the board changes and all look good. Using systimestamp instead of sysdate for some reason makes the difference on the Azure server running Oracle DB Exporter. All good to go now. Thank you again for your help!