zowe / zebra

ZEBRA is an open-source incubator project for Zowe. It is a data parsing framework that allows quick and easy access to z/OS performance metrics.
https://zebra.talktothemainframe.com
Eclipse Public License 2.0

[BUG] No prometheus recording #93

Closed: fernandozangari closed this issue 2 years ago

fernandozangari commented 2 years ago

Describe the bug: The latest main version, in two different installations (Windows without containers and Linux with containers), is not recording metrics in Prometheus, although it is recording successfully in MongoDB. Below is the Zconfig.json file used (IP addresses have been changed intentionally):

{
  "dds": {
    "PC1B": {
      "ddshhttptype": "http",
      "ddsbaseurl": "99.999.999.99",
      "ddsbaseport": "8803",
      "ddsauth": "false",
      "ddsuser": "ANY",
      "ddspwd": "ANY",
      "rmf3filename": "rmfm3.xml",
      "rmfppfilename": "rmfpp.xml",
      "mvsResource": ",PC1B,MVS_IMAGE",
      "PCI": 3543,
      "usePrometheus": "true",
      "useMongo": "true"
    },
    "DD1B": {
      "ddshhttptype": "http",
      "ddsbaseurl": "99.999.999.99",
      "ddsbaseport": "8803",
      "ddsauth": "false",
      "ddsuser": "ANY",
      "ddspwd": "ANY",
      "rmf3filename": "rmfm3.xml",
      "rmfppfilename": "rmfpp.xml",
      "mvsResource": ",DD1B,MVS_IMAGE",
      "PCI": 3543,
      "usePrometheus": "true",
      "useMongo": "true"
    }
  },
  "ppminutesInterval": "30",
  "rmf3interval": "100",
  "use_cert": "false",
  "zebra_httptype": "http",
  "appurl": "localhost",
  "appport": "3090",
  "mongourl": "localhost",
  "dbinterval": "100",
  "dbname": "Zebrav1111",
  "mongoport": "27017",
  "useDbAuth": "false",
  "dbUser": "myUserAdmin",
  "dbPassword": "salisu",
  "authSource": "admin",
  "grafanaurl": "localhost",
  "grafanaport": "3000",
  "grafanahttptype": "http",
  "apiml_http_type": "https",
  "apiml_IP": "localhost",
  "apiml_port": "10010",
  "apiml_auth_type": "bypass",
  "apiml_username": "username",
  "apiml_password": "password"
}
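Note that every flag in this file, including usePrometheus and useMongo, is a JSON string ("true"/"false") rather than a JSON boolean. Below is a minimal sketch for listing which DDS entries have each output switched on. It assumes only the key names shown above (dds, usePrometheus, useMongo); the helper functions are hypothetical, not part of ZEBRA, and the sketch only reads the file rather than reproducing how ZEBRA itself interprets the flags.

```python
import json

# Trimmed copy of the Zconfig.json above; only the keys this sketch reads are kept.
SAMPLE_CONFIG = """
{ "dds": {
    "PC1B": {"mvsResource": ",PC1B,MVS_IMAGE", "usePrometheus": "true", "useMongo": "true"},
    "DD1B": {"mvsResource": ",DD1B,MVS_IMAGE", "usePrometheus": "true", "useMongo": "true"}
  },
  "appurl": "localhost", "appport": "3090" }
"""


def flag_enabled(value) -> bool:
    """Treat the string "true" (any case, surrounding spaces ignored) or a JSON true as enabled."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() == "true"


def sinks_per_lpar(config: dict) -> dict:
    """Map each DDS entry name to the outputs (prometheus/mongo) it has enabled."""
    result = {}
    for lpar, settings in config.get("dds", {}).items():
        sinks = []
        if flag_enabled(settings.get("usePrometheus")):
            sinks.append("prometheus")
        if flag_enabled(settings.get("useMongo")):
            sinks.append("mongo")
        result[lpar] = sinks
    return result


if __name__ == "__main__":
    for lpar, sinks in sinks_per_lpar(json.loads(SAMPLE_CONFIG)).items():
        print(lpar, sinks)
```

For the configuration above, both PC1B and DD1B report both outputs, so Prometheus output is switched on for both LPARs in this file.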

From the ZEBRA log, it seems that Prometheus is scraping successfully:

GET /v1/DD1B/rmf3/SYSSUM?resource=%22,,SYSPLEX%22 200 5170.077 ms - 8575
Workload Updated Successflly
GET /prommetric 200 1.916 ms - -
GET /v1/DD1B/rmf3/CPC 200 5192.697 ms - 7927
GET /v1/DD1B/rmf3/PROC 200 5198.601 ms - 8347
PROC Updated Successflly
CPC Updated Successflly
GET /v1/DD1B/rmf3/SYSINFO 200 5520.221 ms - 16353
GET /v1/DD1B/rmf3/USAGE 200 6156.523 ms - 54096
USAGE Updated Successflly
GET /v1/DD1B/rmf3/SYSSUM?resource=%22,,SYSPLEX%22 200 5183.854 ms - 8572
Workload Updated Successflly
GET /v1/PC1B/rmf3/CPC 200 49482.957 ms - 7930
CPC Updated Successflly
GET /v1/PC1B/rmf3/PROC 200 50137.166 ms - 26708
GET /v1/PC1B/rmf3/SYSINFO 200 50133.833 ms - 19518
PROC Updated Successflly
GET /v1/PC1B/rmf3/USAGE 200 51419.914 ms - 80300
USAGE Updated Successflly
GET /v1/DD1B/rmf3/SYSSUM?resource=%22,,SYSPLEX%22 200 5517.269 ms - 8572
Workload Updated Successflly
GET /prommetric 200 0.796 ms - -
GET /v1/DD1B/rmf3/PROC 200 5190.204 ms - 7953
GET /v1/DD1B/rmf3/CPC 200 5208.616 ms - 7927
PROC Updated Successflly
CPC Updated Successflly
GET /v1/DD1B/rmf3/SYSINFO 200 5509.705 ms - 16354
GET /v1/DD1B/rmf3/USAGE 200 6463.056 ms - 54094
USAGE Updated Successflly
GET /v1/DD1B/rmf3/SYSSUM?resource=%22,,SYSPLEX%22 200 5196.712 ms - 8572
Workload Updated Successflly
GET /v1/PC1B/rmf3/CPC 200 22203.334 ms - 7930
CPC Updated Successflly
GET /v1/PC1B/rmf3/SYSINFO 200 22494.462 ms - 19530
GET /v1/PC1B/rmf3/PROC 200 22839.636 ms - 25454
PROC Updated Successflly
GET /v1/PC1B/rmf3/USAGE 200 23765.883 ms - 81246
USAGE Updated Successflly
GET /v1/DD1B/rmf3/SYSSUM?resource=%22,,SYSPLEX%22 200 5212.415 ms - 8572
Workload Updated Successflly
GET /prommetric 200 0.713 ms - -
GET /v1/DD1B/rmf3/PROC 200 5224.576 ms - 6750
GET /v1/DD1B/rmf3/CPC 200 5239.796 ms - 7927
PROC Updated Successflly
CPC Updated Successflly
GET /v1/DD1B/rmf3/SYSINFO 200 5521.973 ms - 16320
GET /v1/DD1B/rmf3/USAGE 200 6518.183 ms - 54094
USAGE Updated Successflly
GET /v1/DD1B/rmf3/SYSSUM?resource=%22,,SYSPLEX%22 200 5176.130 ms - 8570
Workload Updated Successflly
GET /v1/PC1B/rmf3/CPC 200 11166.080 ms - 7930
CPC Updated Successflly
GET /v1/PC1B/rmf3/SYSINFO 200 11460.175 ms - 19530
GET /v1/PC1B/rmf3/PROC 200 11793.293 ms - 27067
PROC Updated Successflly
GET /v1/PC1B/rmf3/USAGE 200 13057.265 ms - 85021
USAGE Updated Successflly
GET /v1/DD1B/rmf3/SYSSUM?resource=%22,,SYSPLEX%22 200 5155.525 ms - 8570
Workload Updated Successflly
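The request lines in this log appear to follow the format of the morgan "dev" logger commonly used with Express (METHOD URL STATUS TIME ms - LENGTH), where LENGTH is the response Content-Length, or "-" when none was reported. A minimal sketch for pulling the request lines out of such a log, assuming that format; the function names are hypothetical. Run over the full log above, it would show that each of the three /prommetric scrapes logs "-" for its size while every /v1/... request logs a byte count. That is consistent with, though not proof of, the scrape endpoint having nothing to return.

```python
import re

# Assumed line shape (morgan "dev"-style): METHOD URL STATUS TIME ms - LENGTH
REQUEST_LINE = re.compile(
    r"^(?P<method>[A-Z]+) (?P<url>\S+) (?P<status>\d{3}) "
    r"(?P<ms>\d+(?:\.\d+)?) ms - (?P<length>\d+|-)$"
)


def parse_requests(log_text: str) -> list:
    """Return one dict per request line; other lines (e.g. 'CPC Updated ...') are skipped."""
    requests = []
    for line in log_text.splitlines():
        match = REQUEST_LINE.match(line.strip())
        if not match:
            continue
        length = match["length"]
        requests.append({
            "method": match["method"],
            "url": match["url"],
            "status": int(match["status"]),
            "ms": float(match["ms"]),
            # None when no response size was logged ("-")
            "length": None if length == "-" else int(length),
        })
    return requests


def scrapes_without_size(requests: list, path: str = "/prommetric") -> list:
    """Requests to the scrape endpoint that logged no response size."""
    return [r for r in requests if r["url"] == path and r["length"] is None]


# First few lines of the log above.
SAMPLE_LOG = """\
GET /v1/DD1B/rmf3/SYSSUM?resource=%22,,SYSPLEX%22 200 5170.077 ms - 8575
Workload Updated Successflly
GET /prommetric 200 1.916 ms - -
GET /v1/DD1B/rmf3/CPC 200 5192.697 ms - 7927
"""

if __name__ == "__main__":
    reqs = parse_requests(SAMPLE_LOG)
    print(len(reqs), "requests;", len(scrapes_without_size(reqs)), "scrape(s) without a size")
```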

In the Prometheus console, the targets and configuration look OK and are running, but no metrics are inserted.

To Reproduce: start ZEBRA and Prometheus running together.

Expected behavior: RMF metrics are recorded in the Prometheus TSDB.

behives commented 2 years ago

Thanks, @fernandozangari.

@jsanter27 is this related to https://github.com/zowe/zebra/pull/65 ?

jsanter27 commented 2 years ago

@behives, it looks like he isn't using HTTPS, so that issue wouldn't affect him.

The /prommetric route is being called, which means Prometheus is scraping correctly, but it looks like ZEBRA is not making the additional requests to get the metric data to return to Prometheus. @fernandozangari, can we take a look at your metrics.json file to make sure the format is right?

fernandozangari commented 2 years ago

@jsanter27, the metrics.json file is the default one from the ZIP of the code I downloaded; its only content is:

{}

I will try the version from this site, because it is not empty.

Thanks

fernandozangari commented 2 years ago

Sorry, there is no metrics.json on the site; ZEBRA creates it the first time it runs.

jsanter27 commented 2 years ago

@fernandozangari, if there are no metrics defined in the metrics.json file, then ZEBRA won't collect any data for Prometheus to scrape. Please see the custom metric documentation on how to define metrics for Prometheus.
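A minimal sketch of a pre-flight check for the situation described here: load metrics.json and report whether it defines any metrics. It assumes only that the file holds a top-level JSON object keyed by metric name, as in the examples later in this thread; the function names are hypothetical, and ZEBRA's own loading code may differ.

```python
import json
from pathlib import Path


def load_metric_definitions(path) -> dict:
    """Read metrics.json and return its top-level object (metric name -> definition)."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError(f"{path}: expected a JSON object at the top level")
    return data


def describe_definitions(definitions: dict, source: str = "metrics.json") -> str:
    """Summarize a set of metric definitions in one line."""
    if not definitions:
        # The case in this thread: the file contains only {}, so nothing is collected
        return f"{source}: no metrics defined; ZEBRA will have nothing for Prometheus to scrape"
    names = ", ".join(sorted(definitions))
    return f"{source}: {len(definitions)} metric definition(s): {names}"


def check_metrics_file(path) -> str:
    """Load the file at `path` and summarize what it defines."""
    return describe_definitions(load_metric_definitions(path), source=str(path))
```

For example, call print(check_metrics_file("metrics.json")) with the path adjusted to wherever your ZEBRA installation keeps metrics.json; for a file containing only {}, it reports that no metrics are defined.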

fernandozangari commented 2 years ago

@justin, here is what I found:

1) There is no metrics.json or metrics.template.json in the SRC folder.
2) metrics.json is created empty the first time ZEBRA is started.
3) I obtained a metrics.template.json example from older versions.
4) It seems the template doesn't have the same default metrics as salisu_dev.
5) Is it possible to export the salisu_dev metrics to a metrics.json that I can use in the new version?

I am working on adapting the new version to the old default metrics; if 5) is possible, it would save me this work.

Thanks

jsanter27 commented 2 years ago

@fernandozangari can you tell me the metrics that you were using for salisu_dev? I'm unfamiliar with that branch. Once I know what you were working with, I can help convert it to the newer format.

fernandozangari commented 2 years ago

@jsanter27, I am attaching a reduced extract of the prommetric output, by LPAR, jobs, and channels, to help show which metrics were in the older version; I couldn't find another way to show this. Thanks. Attachment: prometheus_salisu_version-reduced.txt
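Assuming the attached extract is raw /prommetric output in the Prometheus text exposition format, one way to list which metric names the old version exposed is to reduce it to the distinct names. A minimal sketch, assuming the standard text format (comment lines start with #, and each sample line begins with a metric name matching [a-zA-Z_:][a-zA-Z0-9_:]*); the function name is hypothetical, and the sample input uses made-up names, not the contents of the attachment.

```python
import re

# Metric-name grammar from the Prometheus text format.
METRIC_NAME = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")

# Made-up sample input; sample lines look like: name{labels} value [timestamp]
SAMPLE_EXPOSITION = """\
# HELP demo_msu Made-up example metric
# TYPE demo_msu gauge
demo_msu{lpar="SYSA"} 12
demo_msu{lpar="SYSB"} 7
demo_channel_util 3.5
"""


def metric_names(exposition_text: str) -> list:
    """Return the distinct metric names in a text-format dump, in first-seen order."""
    seen = {}  # dict keeps insertion order and ignores duplicates
    for raw in exposition_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = METRIC_NAME.match(line)
        if match:
            seen.setdefault(match.group(1), None)
    return list(seen)


if __name__ == "__main__":
    print(metric_names(SAMPLE_EXPOSITION))
```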

fernandozangari commented 2 years ago

Hi @jsanter27, trying to align metrics.json with the default metrics of the old version of ZEBRA, I built the following metrics.json file:

{
  "PC1B_CPCPPNAM_MSU": {
    "lpar": "PC1B",
    "request": { "report": "CPC", "resource": ",PC1B,MVS_IMAGE" },
    "identifiers": [ { "key": "CPCPPNAM", "value": "ALL" } ],
    "field": "CPCPAMSU",
    "desc": "Actual consumed MSUs"
  },
  "PC1B_CHANNEL_CHACPIVC_CHACPTVC": {
    "lpar": "PC1B",
    "request": { "report": "CHANNEL", "resource": ",PC1B,MVS_IMAGE" },
    "identifiers": [
      { "key": "CHACPIVC", "value": "ALL" },
      { "key": "CHACPTVC", "value": "ALL" }
    ],
    "field": "CHACPUVC",
    "desc": "Channel - Part util %"
  },
  "PC1B_JOB_JUSPJOB": {
    "lpar": "PC1B",
    "request": { "report": "USAGE", "resource": ",PC1B,MVS_IMAGE" },
    "identifiers": [ { "key": "JUSPJOB", "value": "ALL" } ],
    "field": "JUSPCPUD",
    "desc": "CPU time for interval (in seconds)"
  }
}
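A minimal sketch for checking that each entry in a metrics.json like the one above has the fields this example uses: lpar, field, and desc as non-empty strings, request with report and resource, and identifiers as a list of key/value objects. This shape is inferred from this single example, not from a published ZEBRA schema, and the function names are hypothetical. It checks structure only; it cannot tell whether the resulting metric names come out as intended.

```python
import json


def definition_problems(name: str, definition: dict) -> list:
    """List structural problems in one metric definition (shape inferred from the example above)."""
    problems = []
    for key in ("lpar", "field", "desc"):
        value = definition.get(key)
        if not isinstance(value, str) or not value:
            problems.append(f"{name}: '{key}' should be a non-empty string")
    request = definition.get("request")
    if not isinstance(request, dict) or not all(k in request for k in ("report", "resource")):
        problems.append(f"{name}: 'request' should contain 'report' and 'resource'")
    identifiers = definition.get("identifiers")
    if not isinstance(identifiers, list):
        problems.append(f"{name}: 'identifiers' should be a list")
    else:
        for i, ident in enumerate(identifiers):
            if not isinstance(ident, dict) or "key" not in ident or "value" not in ident:
                problems.append(f"{name}: identifiers[{i}] needs 'key' and 'value'")
    return problems


def validate_metrics(text: str) -> list:
    """Parse a metrics.json document and collect problems across all definitions."""
    metrics = json.loads(text)
    problems = []
    for name, definition in metrics.items():
        if not isinstance(definition, dict):
            problems.append(f"{name}: definition should be a JSON object")
            continue
        problems.extend(definition_problems(name, definition))
    return problems
```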

The first metric has no problem; it works as expected, generating a metric for each LPAR with the name

PC1B_lpar_MSU

But the second and third metrics behave differently.

For example, for the channel metric, with the naming rule (2 identifiers)

PC1B_CHANNEL_CHACPIVC_CHACPTVC

I expect a metric (as in the old version)

PC1B_CHANNEL_7F_FC (where 7F is the CHPID and FC is the channel type)

but I get

PC1B_22_CHACPIVC, where CHANNEL is replaced by a CHACPIVC value and CHACPTVC does not appear at all.

The same happens with the third metric:

PC1B_JOB_JUSPJOB

I expect a metric

PC1B_JOB_GPMSERVE

but I get

PC1B_GPMSERVE_JUSPJOB

I have checked the configuration many times but can't find a solution. Can you give me some help, please?

Thanks
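The naming rule fernandozangari expected can be stated as: split the metric key on underscores and replace every token that is an identifier key with that row's value. Below is a minimal sketch of that expectation. It is a hypothetical illustration, not ZEBRA's naming code, which per the report above behaves differently. It also replaces characters that are not allowed in Prometheus metric names, such as the $, # or @ that z/OS job names may contain, with underscores; that step is an addition of this sketch.

```python
import re

# Anything outside the Prometheus metric-name alphabet [a-zA-Z0-9_:]
INVALID_CHARS = re.compile(r"[^a-zA-Z0-9_:]")


def expected_metric_name(template: str, row: dict) -> str:
    """Replace each token of `template` that names an identifier key with the row's value.

    `row` maps identifier keys (e.g. "CHACPIVC") to the values found in one RMF report row.
    Assumes identifier keys contain no underscores.
    """
    tokens = template.split("_")
    filled = [str(row[token]) if token in row else token for token in tokens]
    name = "_".join(filled)
    # Prometheus metric names must match [a-zA-Z_:][a-zA-Z0-9_:]*
    name = INVALID_CHARS.sub("_", name)
    if name and name[0].isdigit():
        name = "_" + name
    return name


if __name__ == "__main__":
    print(expected_metric_name("PC1B_CHANNEL_CHACPIVC_CHACPTVC", {"CHACPIVC": "7F", "CHACPTVC": "FC"}))
    print(expected_metric_name("PC1B_JOB_JUSPJOB", {"JUSPJOB": "GPMSERVE"}))
```

Applied to the sample rows, this yields PC1B_CHANNEL_7F_FC and PC1B_JOB_GPMSERVE, the names expected above. As a design note, encoding per-job or per-channel values in the metric name creates a separate metric name per value; the more usual Prometheus convention is one metric name with labels (for example a job or CHPID label), although the thread does not discuss that option.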

fernandozangari commented 2 years ago

In file format: metrics_json.txt

fernandozangari commented 2 years ago

Hi @jsanter27, an update: I migrated all the metrics from the old version to the metrics.json file, but the problems continue; it seems the problem is in the naming format rules. Attachment: metrics_json.txt

fernandozangari commented 2 years ago

I solved part of the cause of this issue by adapting the names in the dashboard.

I'm closing the issue because the title doesn't reflect the actual problem.