openshift / origin-metrics


can not get cpu usage from hawkular API #60

Closed priyanka5 closed 8 years ago

priyanka5 commented 8 years ago

Hi,

I need to get the CPU usage for a pod using the Hawkular API. I tried the request below but could not understand the output. How should I read the CPU or memory usage for a pod?

Note: memory/usage does not return any data here. Why?

Any help on this would be appreciated.

curl -H "Authorization: Bearer oxG3b4nOlTFPZ6Z1wxPtf_xPoe7Av5js4imU4jcxxdw" -H "Hawkular-tenant: test" -X GET "https://hawkular-host/hawkular/metrics/counters/data?tags=descriptor_name:cpu/usage,pod_name:myapp-1-2dku9&stacked=true&buckets=3&start=`date -d -10minutes +%s%3N`" --insecure | python -m json.tool

output

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   599    0   599    0     0   5703      0 --:--:-- --:--:-- --:--:--  5759

[
    {
        "avg": 384177728.43243253,
        "empty": false,
        "end": 1452681612365,
        "max": 386093564.0,
        "median": 384191688.7601133,
        "min": 382108657.0,
        "percentile95th": 385785471.0322342,
        "samples": 1,
        "start": 1452681412331
    },
    {
        "avg": 377499125.75,
        "empty": false,
        "end": 1452681412331,
        "max": 381990129.0,
        "median": 376540915.0343114,
        "min": 373719674.0,
        "percentile95th": 381664582.3546182,
        "samples": 1,
        "start": 1452681212297
    },
    {
        "avg": 371449489.57500005,
        "empty": false,
        "end": 1452681212297,
        "max": 373608416.0,
        "median": 371337044.7875711,
        "min": 369222075.0,
        "percentile95th": 373280499.01336825,
        "samples": 1,
        "start": 1452681012263
    }
]

mwringe commented 8 years ago

Note: memory/usage does not return any data here, why ?

memory/usage is a gauge type metric, so you need to change https://hawkular-host/hawkular/metrics/counters/data/... to something like https://hawkular-host/hawkular/metrics/gauges/data/...

Information about the metric types can be found here https://github.com/kubernetes/heapster/blob/master/docs/storage-schema.md

Note that what Heapster calls cumulative is called counters in Hawkular Metrics
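To make the gauges/counters distinction concrete, here is a minimal Python sketch that picks the endpoint segment from the metric name and builds the same kind of URL as the curl command in the question. The function name `build_data_url`, the `METRIC_ENDPOINT` table, and the fixed `start` value are illustrative assumptions, not part of Hawkular or origin-metrics; only the three metrics discussed in this thread are listed.

```python
from urllib.parse import urlencode

# Endpoint segment per metric, as used in this thread: cpu/usage and uptime
# are queried under "counters", memory/usage under "gauges".
# (Hypothetical lookup table; extend it from the Heapster storage schema.)
METRIC_ENDPOINT = {
    "cpu/usage": "counters",
    "uptime": "counters",
    "memory/usage": "gauges",
}


def build_data_url(host, descriptor, pod_name, buckets, start_ms):
    """Build a bucketed-data URL for one pod metric (illustrative helper)."""
    endpoint = METRIC_ENDPOINT[descriptor]
    query = urlencode(
        {
            "tags": f"descriptor_name:{descriptor},pod_name:{pod_name}",
            "stacked": "true",
            "buckets": buckets,
            "start": start_ms,  # epoch milliseconds, like `date +%s%3N`
        },
        safe=":/,",  # leave the tag expression unescaped so it stays readable
    )
    return f"https://{host}/hawkular/metrics/{endpoint}/data?{query}"


print(build_data_url("hawkular-host", "memory/usage", "myapp-1-2dku9", 3, 1452681012263))
```

The resulting URL can be passed to curl in place of the hand-written one, with the same Authorization and Hawkular-tenant headers.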

I need to get cpu usage for a pod using hawkular API provided. I have tried below but could not understand the output

Yes, the output does seem strange at first. According to https://github.com/kubernetes/heapster/blob/master/docs/storage-schema.md, the cpu/usage metric contains the number of nanoseconds of CPU core time the container has used since it started.

Please see this section in the docs, which describes how to get more meaningful information out of cpu/usage: https://github.com/openshift/origin-metrics/blob/master/docs/hawkular_metrics.adoc#calcuating-percentage-cpu-usage

priyanka5 commented 8 years ago

Hi mwringe, thanks a lot for the response, but I am still worried about the output I get, which I cannot make sense of :( . I also had a look at https://github.com/openshift/origin-metrics/blob/master/docs/hawkular_metrics.adoc#calcuating-percentage-cpu-usage , which describes calculating the percentage of a CPU core used from the cpu/usage and uptime data. But the output for these metrics is in the same format as below, so how do I get the data out of it?

{ "avg": 371449489.57500005, "empty": false, "end": 1452681212297, "max": 373608416.0, "median": 371337044.7875711, "min": 369222075.0, "percentile95th": 373280499.01336825, "samples": 1, "start": 1452681012263 }

Any help on this would be appreciated. Also, I would like to know whether the "network/rx" metric is available for pods. Is there a way we can configure it?

Thank you so much again!

jimmidyson commented 8 years ago

Network metrics are not yet available but will be in a future release. Upstream work on heapster has been completed to enable this & just needs to be pulled in to origin-metrics' build AFAIK.

priyanka5 commented 8 years ago

@mwringe @jimmidyson: thanks. Could you also please explain how to read the output, so that I can use this in my customized environment?

mwringe commented 8 years ago

@Yashu5 The metrics are all defined here: https://github.com/kubernetes/heapster/blob/master/docs/storage-schema.md

If you could give us an example of exactly what you are trying to do, we may be able to help.

priyanka5 commented 8 years ago

@mwringe: it would be very helpful if you could explain the meaning of each parameter in the output of the command below:

curl -H "Authorization: Bearer QgZXOuguhuih-miSNl2sqCXObsWp37sBTc" -H "Hawkular-tenant: test" -X GET "https://hawkulat-host/hawkular/metrics/counters/data?tags=descriptor_name:cpu/usage,pod_name:myapp-1-5sp23&stacked=true&buckets=1&start=`date -d -30minutes +%s%3N`" --insecure | python -m json.tool

{ "avg": 1713515.0840336129, "empty": false, "end": 1452858405396, "max": 2599983.0, "median": 1716395.196040513, "min": 827070.0, "percentile95th": 2513252.983708628, "samples": 1, "start": 1452856605303 }

I am trying to get the CPU utilization (percentage or millicores) and the memory utilization for each container since it started, so that these can be correlated with the memory and CPU graphs shown in the OpenShift web console.

Thanks in advance!

mwringe commented 8 years ago

Can you please see the documentation here: https://github.com/openshift/origin-metrics/blob/master/docs/hawkular_metrics.adoc

Specifically, this section goes over exactly this: https://github.com/openshift/origin-metrics/blob/master/docs/hawkular_metrics.adoc#calcuating-percentage-cpu-usage

This documentation will point you to the Hawkular Metrics documentation (http://www.hawkular.org/docs/rest/rest-metrics.html) which should explain what each value returned means.

This will show you what each of the metrics gathered in the OpenShift environment means: https://github.com/kubernetes/heapster/blob/master/docs/storage-schema.md

So from your example above:

cpu/usage: this is the number of nanoseconds of CPU core time that has been used since the container was started.

We are fetching the values recorded between 1452856605303 ('start', milliseconds UTC) and 1452858405396 ('end', milliseconds UTC), and we are requesting that all this information be returned as one single bucket of data (due to the 'buckets=1' query string parameter).
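As a small illustration of how to read the 'start' and 'end' fields, the following Python sketch converts the two millisecond timestamps from the example into UTC datetimes and computes the width of the bucket. The helper name `ms_to_utc` is made up for this example.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_utc(ms):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    # Adding a timedelta keeps integer precision (no float rounding).
    return EPOCH + timedelta(milliseconds=ms)


# 'start' and 'end' copied from the single bucket in the example above.
bucket = {"start": 1452856605303, "end": 1452858405396}

window_ms = bucket["end"] - bucket["start"]
print("start:", ms_to_utc(bucket["start"]).isoformat())
print("end:  ", ms_to_utc(bucket["end"]).isoformat())
print("width:", window_ms, "ms")
```

The width comes out at roughly 30 minutes, which matches the `start=-30minutes` in the request with the default end time of "now".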

So between these time frames, the lowest recorded value for the number of nanoseconds of CPU time was 827070.0 nanoseconds ('min') and the highest recorded value was 2599983.0 nanoseconds ('max') with a median value of 1716395.196040513 nanoseconds.

Assuming there were no restarts of the container during this time, the CPU value should only increase. So we know that within this time frame there were 1772913 nanoseconds of CPU core usage (max - min).

Now if you were to do something similar with the uptime metric, you could get how long the container was running during this time frame (we can't just use the start and end time here, since we don't know whether the container was running for the whole time). For argument's sake, let's say it ends up being 30000 milliseconds (30 seconds).

So we know that the container has used 1772913 nanoseconds on a CPU core, and that the container has been running for 30000 milliseconds, or 30000000000 nanoseconds.

1772913 nanoseconds of CPU Core time / 30000000000 nanoseconds time = 0.000059097 cores

or 0.059 millicores

[assuming I didn't screw up any of the basic math here]
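The arithmetic above can be checked with a few lines of Python. This is only a sketch of the calculation described in this comment: the function name `cores_used` is hypothetical, and the 30000 ms uptime is the same made-up figure used above, not a measured value.

```python
NS_PER_MS = 1_000_000  # nanoseconds in one millisecond


def cores_used(cpu_bucket, uptime_ms):
    """Average number of CPU cores used while the container was running.

    cpu_bucket: one bucket of cpu/usage data (values in nanoseconds).
    uptime_ms:  how long the container ran in the same window (milliseconds).
    """
    # cpu/usage only grows (absent restarts), so max - min is the CPU time
    # consumed inside the bucket.
    cpu_ns = cpu_bucket["max"] - cpu_bucket["min"]
    return cpu_ns / (uptime_ms * NS_PER_MS)


# 'min' and 'max' copied from the example bucket; uptime is the assumed 30 s.
cpu_bucket = {"min": 827070.0, "max": 2599983.0}
cores = cores_used(cpu_bucket, uptime_ms=30000)

print(f"{cores:.9f} cores")
print(f"{cores * 1000:.3f} millicores")
```

Multiplying the result by 100 instead of 1000 gives the percentage of one core.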

For graphing, you don't want to use a single bucket. Request as many buckets as you need points on the graph, and then do the calculation for each bucket.

How the OpenShift console does the calculations is all handled here: https://github.com/openshift/origin/blob/master/assets/app/scripts/services/metrics.js
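A per-bucket version of the calculation might look like the sketch below. It assumes the cpu/usage and uptime queries were made with the same `start` and `buckets` values so the two lists line up, and that max - min of the uptime bucket is the running time in that bucket (the thread does not confirm this, and it breaks if the container restarted). The function name `millicores_series` is hypothetical; the CPU values are two buckets from the first output in this issue, while the uptime values are invented purely for illustration.

```python
def millicores_series(cpu_buckets, uptime_buckets):
    """Return (bucket start in ms, millicores or None) for each bucket pair."""
    points = []
    for cpu, up in zip(cpu_buckets, uptime_buckets):
        # Buckets flagged "empty" carry no samples, so no point can be drawn.
        if cpu["empty"] or up["empty"]:
            points.append((cpu["start"], None))
            continue
        cpu_ns = cpu["max"] - cpu["min"]    # CPU time used in the bucket (ns)
        uptime_ms = up["max"] - up["min"]   # assumed running time (ms)
        if uptime_ms <= 0:
            points.append((cpu["start"], None))
            continue
        cores = cpu_ns / (uptime_ms * 1_000_000)
        points.append((cpu["start"], cores * 1000))
    return points


cpu_buckets = [
    {"start": 1452681012263, "empty": False, "min": 369222075.0, "max": 373608416.0},
    {"start": 1452681212297, "empty": False, "min": 373719674.0, "max": 381990129.0},
]
# Made-up uptime buckets: the container is assumed to run for the whole bucket.
uptime_buckets = [
    {"start": 1452681012263, "empty": False, "min": 1000000.0, "max": 1200034.0},
    {"start": 1452681212297, "empty": False, "min": 1200034.0, "max": 1400068.0},
]

for start_ms, millicores in millicores_series(cpu_buckets, uptime_buckets):
    print(start_ms, millicores)
```

Returning None for empty buckets leaves it to the graphing code to decide whether to draw a gap or interpolate.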

priyanka5 commented 8 years ago

@mwringe thanks a lot for the detailed info, I think things are clear now.

Just one more doubt: for the uptime metric, do we have to compute max - min to get the exact value?

What I have done is fetch the uptime metric for the pod for the last 30 minutes using:

curl -H "Authorization: Bearer nxrxxxx784QDhggBHwYsk" -H "Hawkular-tenant: test" -X GET "https://host/hawkular/metrics/counters/data?tags=descriptor_name:uptime,pod_name:test-1-85uqp&stacked=true&buckets=1&start=`date -d -30minutes +%s%3N`" --insecure | python -m json.tool

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   145  100   145    0     0   1524      0 --:--:-- --:--:-- --:--:--  1542

[
    {
        "avg": 629703.832,
        "empty": false,
        "end": 1453096682772,
        "max": 1250721.0,
        "median": 638089.1634012002,
        "min": 12267.0,
        "samples": 1,
        "start": 1453094882679
    }
]

So the uptime would be 1250721.0 - 12267.0.

Is this right, or am I making a mistake here?

as said by you "Now if you were to do something similar with the uptime metric, you could get how long the container has been running between this timeframe (we can't just use the start and end time here since we don't know if the container has been running for this whole time or not). For argument sake, lets say it ends up being 30000 milliseconds (30 seconds)"

Do you mean we should ignore the start and end times? Does the uptime metric only report values for the period in which the container is running, so that (max - min) gives the uptime for running containers? And what if the container was restarted during this time?

Thanks again !

mwringe commented 8 years ago

Housekeeping to close older issues. If you think this issue is not resolved, please reopen it.

The docs on how to calculate CPU usage are available here: https://github.com/openshift/origin-metrics/blob/master/docs/hawkular_metrics.adoc#calcuating-percentage-cpu-usage