oracle / oracle-db-appdev-monitoring

Metrics exporter and samples for unified observability for data-centric app dev and microservices
http://developer.oracle.com/microservices

metrics default error #32

Open CarlosFdez77 opened 11 months ago

CarlosFdez77 commented 11 months ago

I have problems with some checks against an autonomous database in the standard metrics. The exporter is deployed in Kubernetes.

I also have the same problem in custom checks such as slow queries.

caller=collector.go:326 level=error Errorscrapingfor=resource _="unsupported value type" 5.687861ms=:
caller=collector.go:326 level=error Errorscrapingfor=wait_time _="unsupported value type" 5.207271ms=:

markxnelson commented 11 months ago

Hi, thanks for reporting this issue!

Most of the time that message seems to occur when the user does not have permission to run the query, or when the query is only suitable for a CDB and is being run in a PDB, as is the case with ADB.

If you could provide me your custom metrics file, and confirm you are using the normal "out-of-the-box" standard metrics, I can try to debug this for you. Please also confirm what version your ADB instance is, and what user you are using to connect to it?

Thanks

CarlosFdez77 commented 11 months ago

Hi Mark, thanks for your time. I thought this exporter worked with ADB; in our case we use version 19.

Regarding the default metrics, I have a problem with two, which causes a lot of noise in the pod log:

ts=2023-10-20T05:56:42.466Z caller=collector.go:326 level=error Errorscrapingfor=wait_time _="unsupported value type" 4.801915ms=:
ts=2023-10-20T05:56:42.468Z caller=collector.go:326 level=error Errorscrapingfor=resource _="unsupported value type" 7.01722ms=:
ts=2023-10-20T05:57:02.467Z caller=collector.go:326 level=error Errorscrapingfor=wait_time _="unsupported value type" 5.563066ms=:
ts=2023-10-20T05:57:02.469Z caller=collector.go:326 level=error Errorscrapingfor=resource _="unsupported value type" 7.338398ms=:

First of all, let me tell you that I am not a database expert. Regarding permissions, I have granted access to what the documentation lists for the default metrics:

V_$LOG
V_$PROCESS
V_$SESSION
V_$SYSSTAT
V_$INSTANCE
V_$DATAFILE
V_$SYSTEM_WAIT_CLASS
V_$RESOURCE_LIMIT
V_$WAITCLASSMETRIC
V_$ASM_DISKGROUP_STAT
DBA_FREE_SPACE
DBA_DATA_FILES
DBA_TABLESPACES
DBA_TEMP_FILES
V_$TEMP_EXTENT_POOL
V_$TEMP_SPACE_HEADER
DBA_TABLESPACE_USAGE_METRICS

I get no data for the "wait_time" and "resource" metrics. For "wait_time", I see that the v$waitclassmetric view is empty. Is this normal in ADB?

SELECT n.wait_class as WAIT_CLASS, round(m.time_waited/m.INTSIZE_CSEC,3) as VALUE
FROM v$waitclassmetric m, v$system_wait_class n
WHERE m.wait_class_id=n.wait_class_id AND n.wait_class != 'Idle'

For "resource", the v$resource_limit view likewise returns no data.

SELECT resource_name, current_utilization,
  CASE WHEN TRIM(limit_value) LIKE 'UNLIMITED' THEN '-1' ELSE TRIM(limit_value) END as limit_value
FROM v$resource_limit

Thank you very much for your interest, Carlos.

CarlosFdez77 commented 11 months ago

My deploy is simple:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: oracle-metrics-exporter
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: oracle-metrics-exporter
  template:
    metadata:
      labels:
        app: oracle-metrics-exporter
    spec:
      containers:

markxnelson commented 11 months ago

thanks for the info. we are investigating this. some initial comments from my DBA colleague (fyi):

v$waitclassmetric captures real-time events from the past minute; its _history companion view holds that data for up to an hour. An empty view is a sign that the database is not being hit hard. I don't see anything that says the view will not populate in an ADB-S environment (these views are container-aware, so nothing would be exposed from other PDBs in the same shared CDB that would warrant disabling it).

Given ADBs are on Exa it will probably take a lot to make an entry in the view ... so "no rows" output can be expected. However, I did generate a massive amount of I/O to try and trigger a wait event in that view and nothing came of it; checking with ADB team if this is expected

v$resource_limit: It looks like it can now only be queried from the CDB, so for ADBs this will always return 0 rows; in non-ADB databases it will return rows only if queried from the CDB.

markxnelson commented 11 months ago

i am going to look at handling this better and suppressing the spurious messages

CarlosFdez77 commented 10 months ago

OK, thanks. I understand there will be a new version of the image at some point, correct?

markxnelson commented 10 months ago

Yes indeed, we are just doing some testing and then will put out an update.

markxnelson commented 10 months ago

hi, i did put out a 1.1 release. it will still report when it cannot get a metric, but the output is more useful now, i hope. they look like this now... i was thinking maybe it should be a warning not an error, but it's a bit difficult, when the query works but no rows are returned, to know if that is a good thing or a bad thing. if the query fails it's clearly an error.

ts=2023-10-27T16:56:19.884Z caller=collector.go:327 level=error msg="Error scraping metric" Context=ownership MetricsDesc="map[inst_id:Owner instance of the current queues.]" time=1.502612ms error="no metrics found while parsing, query returned no rows"
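The distinction discussed above (a failed query is clearly an error, while an empty result is ambiguous) could be sketched roughly as follows. This is only an illustration in Python, not the exporter's actual Go code: the `scrape` function, the `wait_class_metric` table, and the choice of "warning" for empty results are all assumptions, and an in-memory SQLite database stands in for Oracle so the sketch is self-contained.

```python
import sqlite3


def scrape(conn, context, query):
    """Run one metric query and return (level, message, rows).

    Hypothetical helper: a failed query is reported as an error,
    an empty result as a warning, anything else as info.
    """
    try:
        rows = conn.execute(query).fetchall()
    except sqlite3.Error as exc:
        # The query itself could not run (missing view, no privilege, ...).
        return "error", f"{context}: query failed: {exc}", []
    if not rows:
        # The query ran but produced nothing; this may be perfectly normal.
        return "warning", f"{context}: query returned no rows", []
    return "info", f"{context}: {len(rows)} row(s)", rows


# Stand-in database with an empty table (assumed name, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wait_class_metric (wait_class TEXT, value REAL)")

empty = scrape(conn, "wait_time", "SELECT wait_class, value FROM wait_class_metric")
failed = scrape(conn, "resource", "SELECT * FROM missing_view")

print(empty[0], "-", empty[1])
print(failed[0], "-", failed[1])
```

Whether an empty result deserves a warning or nothing at all probably depends on the metric; a per-metric setting might be a reasonable extension of this idea.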

i am looking into how to throttle these messages - suppress duplicates. i don't think the logging library i am using supports that, so i need to figure out if i should swap to a different one, or build something in.
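One way to "build something in" for duplicate suppression might look like the sketch below. It is written in Python with the standard `logging` module purely to illustrate the idea; the exporter is a Go program, and the class name `DuplicateFilter`, the 60-second window, and the injected clock are assumptions made for this example.

```python
import logging
import time


class DuplicateFilter(logging.Filter):
    """Drop a record if the same level and message was seen within the window."""

    def __init__(self, window_seconds=60.0, clock=time.monotonic):
        super().__init__()
        self.window_seconds = window_seconds
        self.clock = clock          # injectable so the demo can fake time
        self._last_seen = {}        # (level, message) -> time last emitted

    def filter(self, record):
        key = (record.levelno, record.getMessage())
        now = self.clock()
        last = self._last_seen.get(key)
        if last is not None and now - last < self.window_seconds:
            return False            # duplicate inside the window: suppress
        self._last_seen[key] = now
        return True


class ListHandler(logging.Handler):
    """Collects emitted messages so the demo can inspect them."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


fake_now = [0.0]                    # fake clock, advanced by hand below
logger = logging.getLogger("exporter_sketch")
logger.setLevel(logging.ERROR)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)
logger.addFilter(DuplicateFilter(window_seconds=60.0, clock=lambda: fake_now[0]))

msg = "Error scraping metric Context=resource error=query returned no rows"
logger.error(msg)                   # first occurrence: emitted
fake_now[0] = 20.0
logger.error(msg)                   # 20 s later: suppressed
fake_now[0] = 80.0
logger.error(msg)                   # 80 s after the first: emitted again

print(len(handler.messages))
```

Note that the key here is the full message text, so a field that changes on every scrape (such as the `time=` duration in the log line above) would have to be left out of the key for this kind of suppression to have any effect.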

i also changed the wait_class query so it should work fine in both PDBs and CDBs now - but it will still return no rows unless the db is under stress.