fernandozangari commented 2 years ago

Hi, I am working with zebra_dev and it is running OK (no more parsing errors). So far:

install zebra_dev: OK
install Prometheus and connect it to Zebra: OK
install and run Grafana, sourcing from Prometheus: OK
DDS running and access to RMF Data Portal: OK
testing from Zebra to DDS: OK

but I don't find any metrics in Prometheus, even though I can see that it is scraping.

From the Zebra log I can see that it is being scraped:

GET /prommetric 200 0.494 ms - -
GET /prommetric 200 0.692 ms - -
GET /prommetric 200 1.309 ms - -
GET /prommetric 200 0.633 ms - -
GET /prommetric 200 0.678 ms - -
GET /prommetric 200 2.909 ms - -

In Prometheus, the target shows as UP (Targets page) and running. I have read the documentation many times but don't see where the problem is, because there are no error messages.

Any idea what the problem is? It seems to be between Prometheus and Zebra.

THANKS!

behives commented 2 years ago

Hi @fernandozangari - can you check your Prometheus config file? It should also be set up to connect to the Zebra URL:PORT. For example, for our demo site:

global:
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 15s
alerting:
  alertmanagers:
  - scheme: http
    timeout: 10s
    api_version: v1
    static_configs:
    - targets: []
scrape_configs:
- job_name: zebra_new
  honor_timestamps: true
  scrape_interval: 1m
  scrape_timeout: 10s
  metrics_path: /prommetric
  scheme: https
  tls_config:
    cert_file: /zebra/src/sslcert/server.cert
    key_file: /zebra/src/sslcert/server.key
    insecure_skip_verify: true
  static_configs:
  - targets:
    - zebra.talktothemainframe.com:3190
- job_name: zebra_dvlp
  honor_timestamps: true
  scrape_interval: 1m
  scrape_timeout: 10s
  metrics_path: /prommetric
  scheme: http
  static_configs:
  - targets:
    - 47.1.60.80:3090

fernandozangari commented 2 years ago

Hi @ykimvicom, I checked the Prometheus config file and it is OK. It seems like Zebra is not exposing the metrics at the /prommetric endpoint. Below is the config definition from the Prometheus URL:

global:
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 15s
alerting:
  alertmanagers:
  - follow_redirects: true
    scheme: http
    timeout: 10s
    api_version: v2
    static_configs:
    - targets: []
scrape_configs:
- job_name: prometheus
  honor_timestamps: true
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  follow_redirects: true
  static_configs:
  - targets:
    - localhost:9090
    labels:
      group: prometheus
- job_name: zebra
  honor_timestamps: true
  scrape_interval: 1m40s
  scrape_timeout: 10s
  metrics_path: /prommetric
  scheme: http
  follow_redirects: true
  static_configs:
  - targets:
    - localhost:3090

Thanks in advance for your help.

fernandozangari commented 2 years ago

I recreated the same environment on Linux (Red Hat), with the same result.

behives commented 2 years ago

I see it's using api v2 - I'm wondering if there is a change where v2 needs additional parms? @salisbuk7897 any thoughts?

behives commented 2 years ago

@fernandozangari also, did you configure Zebra as HTTP or HTTPS?

fernandozangari commented 2 years ago

http

salisbuk7897 commented 2 years ago

Hi @fernandozangari, what does http://zebra_url:zebra_port/Prommetric return?

fernandozangari commented 2 years ago

A blank page, no errors. I restarted Prometheus with api_version v1, with the same result.
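A 200 response with an empty body is consistent with the target showing as UP: the scrape itself succeeds, but there are no samples to store. The check can also be done outside the browser. Below is a minimal Python sketch, assuming the endpoint is meant to return the Prometheus text exposition format; `count_samples` and `check_endpoint` are illustrative helper names and are not part of Zebra or Prometheus.

```python
import urllib.request


def count_samples(body: str) -> int:
    """Count sample lines in Prometheus text exposition output.

    Blank lines and lines starting with '#' (HELP, TYPE, comments)
    are not samples and are skipped.
    """
    return sum(
        1
        for line in body.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    )


def check_endpoint(url: str, timeout: float = 10.0) -> int:
    """Fetch a metrics endpoint and return how many samples it exposes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read().decode("utf-8", errors="replace")
    return count_samples(body)
```

Calling `check_endpoint("http://localhost:3090/prommetric")` (the URL used in this thread) and getting `0` would confirm that the endpoint answers but exposes nothing, which points at Zebra rather than at the Prometheus scrape configuration.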

salisbuk7897 commented 2 years ago

Okay. The configuration is most likely the cause of the problem. I will need to confirm the values of some of your config parameters. What is the value of usePrometheus in your Zconfig.json file?

fernandozangari commented 2 years ago

Below is the Zconfig.json content:

{
  "dds": {
    "PC1B": {
      "ddshhttptype": "http",
      "ddsbaseurl": "10.218.160.31",
      "ddsbaseport": "8803",
      "ddsauth": "false",
      "ddsuser": "INSXI65",
      "ddspwd": "STS22STS",
      "rmf3filename": "rmfm3.xml",
      "rmfppfilename": "rmfpp.xml",
      "mvsResource": ",PC1B,MVS_IMAGE",
      "PCI": 2951,
      "usePrometheus": "true",
      "useMongo": "false"
    },
    "DD1B": {
      "ddshhttptype": "http",
      "ddsbaseurl": "10.218.160.30",
      "ddsbaseport": "8803",
      "ddsauth": "false",
      "ddsuser": "INSXI65",
      "ddspwd": "STS22STS",
      "rmf3filename": "rmfm3.xml",
      "rmfppfilename": "rmfpp.xml",
      "mvsResource": ",DD1B,MVS_IMAGE",
      "PCI": 2951,
      "usePrometheus": "true",
      "useMongo": "false"
    }
  },
  "ppminutesInterval": "30",
  "rmf3interval": "100",
  "use_cert": "false",
  "zebra_httptype": "http",
  "appurl": "localhost",
  "appport": "3090",
  "mongourl": "localhost",
  "dbinterval": "100",
  "dbname": "Zebrav1111",
  "mongoport": "27017",
  "useDbAuth": "true",
  "dbUser": "myUserAdmin",
  "dbPassword": "salisu",
  "authSource": "admin",
  "grafanaurl": "localhost",
  "grafanaport": "3000",
  "grafanahttptype": "http",
  "apiml_http_type": "https",
  "apiml_IP": "localhost",
  "apiml_port": "10010",
  "apiml_auth_type": "bypass",
  "apiml_username": "username",
  "apiml_password": "password"
}

ghost commented 2 years ago

Hi @fernandozangari, it is possible that the Prometheus routine is being called yet no metrics are exposed because you don't have a metrics.json file in the src directory. This is a recent change to the zebra_dev branch that allows for more customized metrics. There is a metrics.template.json that shows an example configuration. Each metric follows the format:

{
  "lpar": "string", // the reporting LPAR (ex. RPRT on our demo system)
  "request": {
    "report": "string", // the RMF III report (ex. CPC)
    "resource": "string" // the resource to query (optional)
  },
  "identifiers": [
    {
      "key": "string", // the field to identify the entity (ex. CPCPPNAM for getting partition name)
      "value": "string" // the value of the field (ex. QCK2)
    }
  ],
  "field": "string", // the field that represents the metric's value (ex. CPCPPTOU for physical utilization)
  "desc": "string" // description of the metric (optional)
}
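A quick structural check of metrics.json can rule out typos before restarting Zebra. The sketch below is only an illustration in Python: the required/optional split follows the comments in the format above, the top-level layout (an object mapping metric names to definitions) follows the files posted later in this thread, and whether Zebra itself enforces these rules in the same way is an assumption. The function names are hypothetical.

```python
import json


def validate_metric(name: str, metric: dict) -> list[str]:
    """Return the problems found in one metric definition (empty if none)."""
    problems = []
    # 'lpar' and 'field' are described as plain strings and are not marked optional.
    for key in ("lpar", "field"):
        value = metric.get(key)
        if not isinstance(value, str) or not value:
            problems.append(f"{name}: '{key}' must be a non-empty string")
    # 'request.report' is required; 'request.resource' is optional.
    request = metric.get("request")
    if not isinstance(request, dict) or not isinstance(request.get("report"), str):
        problems.append(f"{name}: 'request.report' must be a string")
    # Each identifier needs both a 'key' and a 'value'.
    identifiers = metric.get("identifiers")
    if not isinstance(identifiers, list):
        problems.append(f"{name}: 'identifiers' must be a list")
    else:
        for i, ident in enumerate(identifiers):
            if not isinstance(ident, dict) or not {"key", "value"} <= set(ident):
                problems.append(f"{name}: identifiers[{i}] needs 'key' and 'value'")
    return problems


def validate_metrics_text(text: str) -> list[str]:
    """Parse metrics.json content and collect problems across all metrics."""
    try:
        metrics = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(metrics, dict):
        return ["top level must be an object mapping metric names to definitions"]
    problems = []
    for name, metric in metrics.items():
        if not isinstance(metric, dict):
            problems.append(f"{name}: definition must be an object")
            continue
        problems.extend(validate_metric(name, metric))
    return problems
```

An empty list from `validate_metrics_text(open("metrics.json").read())` only means the file parses and has the expected shape. It does not confirm that the LPAR, report, or partition names exist on the target system.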

behives commented 2 years ago

@salisbuk7897 thanks! @fernandozangari - does Prometheus get its own metrics okay? Sometimes a single stray character in the prommetric config makes it invalid, and the values can't be read. I see the Prometheus config you posted got reformatted. If you can post the original text file, that may help too.

fernandozangari commented 2 years ago

@ykimvicom, Prometheus's own metrics are OK; I have a dashboard in Grafana working as a test. Below is prometheus.yml:

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - scheme: http
      timeout: 10s
      api_version: v1
      static_configs:
        - targets: []

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    metrics_path: '/metrics'
    scrape_interval: 15s
    static_configs:
      - targets: ['localhost:9090']
        labels:
          group: 'prometheus'
  - job_name: zebra
    honor_timestamps: true
    scrape_interval: 100s
    scrape_timeout: 10s
    metrics_path: /prommetric
    scheme: http
    static_configs:
      - targets:
          - localhost:3090
        # labels:
        #   group: 'zebra'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

fernandozangari commented 2 years ago

@jsantervicom, I checked what you said: the metrics.json file was empty, so I filled it in based on metrics.template.json, changed it to use our LPAR name, and restarted Zebra. Below is the file content:

{
  "PC1B_QCK2_PTOU": {
    "lpar": "PC1B",
    "request": {
      "report": "CPC",
      "resource": ",PC1B,MVS_IMAGE"
    },
    "identifiers": [
      {
        "key": "CPCPPNAM",
        "value": "QCK2"
      }
    ],
    "field": "CPCPPTOU",
    "desc": "Physical total utilization for the QCK2 partition."
  },
  "PC1B_TRNG_PTOU": {
    "lpar": "PC1B",
    "request": {
      "report": "CPC",
      "resource": ",PC1B,MVS_IMAGE"
    },
    "identifiers": [
      {
        "key": "CPCPPNAM",
        "value": "TRNG"
      }
    ],
    "field": "CPCPPTOU",
    "desc": "Physical total utilization for the TRNG partition."
  },
  "PC1B_VIDVLP_PTOU": {
    "lpar": "PC1B",
    "request": {
      "report": "CPC",
      "resource": ",PC1B,MVS_IMAGE"
    },
    "identifiers": [
      {
        "key": "CPCPPNAM",
        "value": "VIDVLP"
      }
    ],
    "field": "CPCPPTOU",
    "desc": "Physical total utilization for the VIDVLP partition."
  },
  "PC1B_VIRPT_PTOU": {
    "lpar": "PC1B",
    "request": {
      "report": "CPC",
      "resource": ",PC1B,MVS_IMAGE"
    },
    "identifiers": [
      {
        "key": "CPCPPNAM",
        "value": "VIRPT"
      }
    ],
    "field": "CPCPPTOU",
    "desc": "Physical total utilization for the VIRPT partition."
  }

but the result is the same.

When I refreshed the blank page http://localhost:3090/prommetric, I saw a request reach Zebra, but nothing appeared.

ghost commented 2 years ago

@fernandozangari The partition names (ex. QCK2, TRNG, VIRPT, etc.) in the template's identifier fields are specific to our demo system. Try changing those to partition names from your system and see if that works.

fernandozangari commented 2 years ago

@jsantervicom, I changed what you said, and the file content is now:

{
  "NATU_PC1B_PTOU": {
    "lpar": "PC1B",
    "request": {
      "report": "CPC",
      "resource": ",PC1B,MVS_IMAGE"
    },
    "identifiers": [
      {
        "key": "CPCPPNAM",
        "value": "PC1B"
      }
    ],
    "field": "CPCPPTOU",
    "desc": "Physical total utilization for the PC1B partition."
  },
  "NATU_DD1B_PTOU": {
    "lpar": "DD1B",
    "request": {
      "report": "CPC",
      "resource": ",DD1B,MVS_IMAGE"
    },
    "identifiers": [
      {
        "key": "CPCPPNAM",
        "value": "DD1B"
      }
    ],
    "field": "CPCPPTOU",
    "desc": "Physical total utilization for the DD1B partition."
  }
}

But I still have the same result (no metrics, blank page at http://localhost:3090/prommetric).

fernandozangari commented 2 years ago

@jsantervicom, now the scraping is done according to the metrics.json file (DD1B is not active), but there are still no metrics:

GET /prommetric 200 1.346 ms - -
GET /v1/PC1B/rmf3/CPC?resource=,PC1B,MVS_IMAGE 200 9889.830 ms - 1280
GET /v1/DD1B/rmf3/CPC?resource=,DD1B,MVS_IMAGE 302 11233.757 ms - 45
GET /rmfm3/error?emsg=-4078 200 26.363 ms - 1768
GET /prommetric 200 1.055 ms - -
GET /prommetric 200 0.621 ms - -
GET /v1/PC1B/rmf3/CPC?resource=,PC1B,MVS_IMAGE 200 10239.341 ms - 1280
GET /v1/DD1B/rmf3/CPC?resource=,DD1B,MVS_IMAGE 302 8056.616 ms - 45
GET /rmfm3/error?emsg=-4078 200 46.594 ms - 1768
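In this log, /prommetric answers in about a millisecond with no content length, the RMF requests take roughly ten seconds each, and the DD1B request is redirected (302) to the error page with emsg=-4078. A small Python sketch for picking such entries out of a longer log follows. It assumes one request per line in the `METHOD path status time ms - size` layout shown above; the 5000 ms threshold and the helper names are arbitrary choices for illustration.

```python
import re
from typing import List, NamedTuple, Optional

# Matches lines such as: GET /prommetric 200 1.346 ms - -
LINE_RE = re.compile(
    r"^(?P<method>\S+) (?P<path>\S+) (?P<status>\d{3}) "
    r"(?P<ms>[\d.]+) ms - (?P<size>\S+)$"
)


class Request(NamedTuple):
    method: str
    path: str
    status: int
    ms: float
    size: str  # "-" when no content length was logged


def parse_line(line: str) -> Optional[Request]:
    """Parse one access-log line, or return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return Request(m["method"], m["path"], int(m["status"]), float(m["ms"]), m["size"])


def suspicious(lines: List[str], slow_ms: float = 5000.0) -> List[Request]:
    """Return requests that were redirected, failed, or took longer than slow_ms."""
    hits = []
    for line in lines:
        req = parse_line(line)
        if req is not None and (req.status >= 300 or req.ms > slow_ms):
            hits.append(req)
    return hits
```

Run over the lines above, this would flag the slow PC1B requests and the redirected DD1B requests, while the /prommetric requests themselves pass unnoticed because they return 200 quickly even when the body is empty.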

behives commented 2 years ago

@fernandozangari how long did you wait? Just want to make sure you let it run long enough, as it needs a few minutes to start collecting and writing.

fernandozangari commented 2 years ago

More than an hour.

salisbuk7897 commented 2 years ago

Hi @fernandozangari, sorry for the inconvenience. We are currently working on the metrics customization feature, which seems to be the problem. Please try the code in the salisu_dev branch. It should work without the metrics customization feature. Please let me know if that solves the problem. Thanks.

fernandozangari commented 2 years ago

ok, thanks

salisbuk7897 commented 2 years ago

You are welcome @fernandozangari. I think all you need to do is add your zconfig.json file to the config directory and you will be good to go.

fernandozangari commented 2 years ago

@salisbuk7897 THANKS!!! It is working.

salisbuk7897 commented 2 years ago

Awesome @fernandozangari! The ZEBRA team will always be happy to hear your feedback on how we can make ZEBRA better. In case you have not joined the ZEBRA Slack channel yet, you can use the link below to join: https://openmainframeproject.slack.com/archives/C01QWBJG3A4

Thanks, and I hope you enjoy working with ZEBRA.

behives commented 2 years ago

Great! Thanks @salisbuk7897 @fernandozangari @jsantervicom. @salisbuk7897 please close this and open a new issue for metrics on zebra_dev to keep track.

zowe / zebra

No prometheus metrics #44
