vespa-engine / vespa

AI + Data, online. https://vespa.ai
Apache License 2.0

Unable to see cpu util in the /metrics/v2/values #21234

Closed 107dipan closed 2 years ago

107dipan commented 2 years ago

Describe the bug: Unable to see CPU utilization in /metrics/v2/values. I am getting cpu in metrics.values, but I am not getting the cpu_util value.

Reproduce: I am calling the container:8080/metrics/v2/values API and checking the nodes array of the returned JSON.

Expected behavior: The metrics API should return the cpu_util value.

Screenshots: In the JSON object metrics.values I am only getting the following values: memory_virt, memory_rss, cpu.

Environment (please complete the following information):

Vespa version 7.543

jobergum commented 2 years ago

I'm not able to reproduce this.

With vespaengine/vespa:7.542.42 :

curl localhost:8080/metrics/v2/values/ -s | jq
{
  "nodes": [
    {
      "hostname": "vespa-container",
      "role": "hosts/vespa-container",
      "services": [
        {
          "name": "vespa.container",
          "timestamp": 1645035420,
          "status": {
            "code": "up",
            "description": "Data collected successfully"
          },
          "metrics": [
            {
              "values": {
                "memory_virt": 4368171008,
                "memory_rss": 2006052864,
                "cpu": 14.810705866038,
                "cpu_util": 1.8513382332547
              },
              "dimensions": {
                "serviceId": "container"
              }
            },
...
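A response like the one above can also be checked programmatically. The following is a minimal Python sketch, not part of Vespa, that lists the services whose metrics carry no `cpu_util` value. The function name `services_missing_metric`, the `SAMPLE` payload, and the `old-node` entry are hypothetical and used only for illustration. The field names follow the JSON shown above.

```python
import json


def services_missing_metric(payload, metric="cpu_util"):
    """Return (hostname, service name) pairs where no metrics entry has `metric`."""
    missing = []
    for node in payload.get("nodes", []):
        for service in node.get("services", []):
            has_metric = any(
                metric in entry.get("values", {})
                for entry in service.get("metrics", [])
            )
            if not has_metric:
                missing.append((node.get("hostname"), service.get("name")))
    return missing


# Hypothetical, trimmed payload shaped like the /metrics/v2/values response.
# "old-node" stands in for a node still running a process without cpu_util.
SAMPLE = json.loads("""
{
  "nodes": [
    {
      "hostname": "vespa-container",
      "services": [
        {
          "name": "vespa.container",
          "metrics": [
            {"values": {"cpu": 14.8, "cpu_util": 1.85},
             "dimensions": {"serviceId": "container"}}
          ]
        }
      ]
    },
    {
      "hostname": "old-node",
      "services": [
        {
          "name": "vespa.container",
          "metrics": [
            {"values": {"memory_virt": 1, "memory_rss": 1, "cpu": 3.2},
             "dimensions": {"serviceId": "container"}}
          ]
        }
      ]
    }
  ]
}
""")

if __name__ == "__main__":
    for hostname, service in services_missing_metric(SAMPLE):
        print(f"{hostname}: {service} has no cpu_util")
```

To run it against a live node, the payload could be loaded from `http://localhost:8080/metrics/v2/values` with `urllib.request.urlopen` and `json.load` instead of the sample.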
107dipan commented 2 years ago

Let me try with 7.542.42. I just want to confirm: do I only need to deploy the application.zip with this Vespa version, or will I also need to restart the services after deploying?

jobergum commented 2 years ago

There is no difference between 7.542 and 7.543 on this. When upgrading software, you need to restart the process for the upgrade to take effect. Deploying configuration and schema via deploy does not do that for you.

107dipan commented 2 years ago

Can I run the vespa-stop-services && vespa-start-services commands to restart the services? Do I need to run these commands in all of my Vespa pods or just on the config node?

jobergum commented 2 years ago

You need to install the software on each Vespa pod (node, or what you want to call it) and restart all of them. Just upgrading the configuration nodes does not install the software on the pods.

jobergum commented 2 years ago

> Can I run the vespa-stop-services && vespa-start-services commands to restart the services?

These commands only stop and start the processes running locally on the node, not cluster-wide. In a production environment you typically want the restart to be orchestrated.
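Because the commands act on one node only, a restart has to be issued per node. The sketch below is a rough illustration and not Vespa tooling: it builds one `ssh` command line per host and only prints them (a dry run). The host names, the use of `ssh`, and the assumption that the Vespa commands are on the remote `PATH` are all hypothetical. It does not wait for a node to come back up before moving on, which a real orchestrated restart would need to do.

```python
import shlex

# The two commands named in this thread, run back to back on a single node.
RESTART = "vespa-stop-services && vespa-start-services"


def restart_plan(hosts):
    """Return one ssh command line per host, in the order they should run."""
    return [f"ssh {shlex.quote(host)} {shlex.quote(RESTART)}" for host in hosts]


if __name__ == "__main__":
    # Hypothetical host names; replace with the nodes of the actual cluster.
    for command in restart_plan(["vespa-0.example", "vespa-1.example"]):
        print(command)  # dry run: print instead of executing
```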

107dipan commented 2 years ago

Does Vespa provide any tools for this type of orchestration?

jobergum commented 2 years ago

The current best practice for deployment and upgrade orchestration is https://cloud.vespa.ai/. See also https://docs.vespa.ai/en/operations/live-upgrade.html

107dipan commented 2 years ago

Thanks a lot!

jobergum commented 2 years ago

I'm resolving this. My best guess is that you have not upgraded the process, or have not restarted the process after the upgrade. The cpu_util metric was added maybe a month or two ago.

107dipan commented 2 years ago

Yes, we need to restart all the processes. Thanks a lot for your help!