kruize / autotune

Autonomous Performance Tuning for Kubernetes!
Apache License 2.0

Bulk API error handling with new JSON #1368

Closed msvinaykumar closed 1 week ago

msvinaykumar commented 1 week ago

Description

The bulk API response now uses the following new JSON format:

{
  "status": "IN_PROGRESS",
  "total_experiments": 23,
  "processed_experiments": 22,
  "experiments": [
    {
      "name": "prometheus-1|default|kube-system|coredns(deployment)|coredns",
      "notification": {},
      "recommendations": {
        "status": "unprocessed"
      }
    },
    {
      "name": "prometheus-1|default|kube-system|kindnet(deployment)|kindnet-cni",
      "notification": {},
      "recommendations": {
        "status": "processed"
      }
    },
    {
      "name": "prometheus-1|default|monitoring|kruize(deployment)|kruize",
      "notification": {},
      "recommendations": {
        "status": "processing"
      }
    },
    {
      "name": "prometheus-1|default|monitoring|kruize(deployment)|kruize",
      "recommendations": {
        "status": "failed",
        "notifications": {
          "400": {
            "type": "error",
            "message": "Not able to fetch metrics",
            "code": 400
          }
        }
      }
    },
    {
      "name": "prometheus-1|default|monitoring|kruize(deployment)|kruize",
      "recommendations": {
        "status": "failed",
        "notifications": {
          "400": {
            "type": "error",
            "message": "Not able to fetch metrics",
            "code": 400
          }
        }
      }
    }
  ],
  "job_id": "5798a2df-6c67-467b-a3c2-befe634a0e3a",
  "job_start_time": "2024-10-09T18:09:31.549Z",
  "job_end_time": null
}
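
To illustrate how a client might consume this format, here is a minimal Python sketch that tallies experiments by `recommendations.status` and collects the notifications attached to failed ones. The function name `summarize_experiments` and the trimmed `SAMPLE` payload are hypothetical and not part of Kruize; the sketch only assumes the field layout shown in the JSON above.

```python
import json
from collections import Counter

# Trimmed copy of the response above; only two experiments are kept.
SAMPLE = """
{
  "status": "IN_PROGRESS",
  "total_experiments": 23,
  "processed_experiments": 22,
  "experiments": [
    {
      "name": "prometheus-1|default|kube-system|coredns(deployment)|coredns",
      "notification": {},
      "recommendations": {"status": "unprocessed"}
    },
    {
      "name": "prometheus-1|default|monitoring|kruize(deployment)|kruize",
      "recommendations": {
        "status": "failed",
        "notifications": {
          "400": {"type": "error", "message": "Not able to fetch metrics", "code": 400}
        }
      }
    }
  ],
  "job_id": "5798a2df-6c67-467b-a3c2-befe634a0e3a",
  "job_start_time": "2024-10-09T18:09:31.549Z",
  "job_end_time": null
}
"""


def summarize_experiments(job):
    """Hypothetical helper: count experiments per status and list failures."""
    counts = Counter()
    failures = []
    for exp in job.get("experiments", []):
        rec = exp.get("recommendations", {})
        status = rec.get("status", "unknown")
        counts[status] += 1
        if status == "failed":
            # Failed experiments carry their notifications inside "recommendations".
            for note in rec.get("notifications", {}).values():
                failures.append((exp["name"], note.get("code"), note.get("message")))
    return counts, failures


job = json.loads(SAMPLE)
counts, failures = summarize_experiments(job)
print(dict(counts))
for name, code, message in failures:
    print(f"{name}: {code} {message}")
```

Using `.get()` with defaults keeps the sketch tolerant of experiments that omit `notifications`, as the non-failed entries in the example do.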

Fixes # (issue)

Type of change

How has this been tested?

Test Configuration

Checklist :dart:

Additional information

msvinaykumar commented 1 week ago

Another example, where the whole job failed because the data source could not be reached:

{
  "status": "FAILED",
  "total_experiments": 0,
  "processed_experiments": 0,
  "notifications": {
    "503": {
      "type": "ERROR",
      "message": "HttpHostConnectException: Unable to connect to the data source. Please try again later. (receive series from Addr: 10.96.192.138:10901 LabelSets: {prometheus=\"monitoring/k8stage\", prometheus_replica=\"prometheus-k8stage-0\"},{prometheus=\"monitoring/k8stage\", prometheus_replica=\"prometheus-k8stage-1\"},{replica=\"thanos-ruler-0\", ruler_cluster=\"\"} MinTime: 1730222825216 MaxTime: 1731412800000: rpc error: code = Unknown desc = receive series from 01JBV2JN5SVN84D3HD5MVSGN3A: load chunks: get range reader: Please reduce your request rate)",
      "code": 503
    }
  },
  "job_id": "270fa4d9-2701-4ca0-b056-74229cc28498",
  "job_start_time": "2024-11-12T15:05:46.362Z",
  "job_end_time": "2024-11-12T15:06:05.301Z"
}
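
A job-level failure like the one above could be handled along these lines. This is a rough Python sketch, not Kruize client code: the helper names `job_errors` and `job_duration_seconds` are hypothetical, and the `message` in `FAILED_JOB` is shortened from the original for readability. The `type` comparison is case-insensitive because the two examples in this PR use both `"error"` and `"ERROR"`.

```python
from datetime import datetime

# Shortened copy of the failed job above (the long message is truncated here).
FAILED_JOB = {
    "status": "FAILED",
    "total_experiments": 0,
    "processed_experiments": 0,
    "notifications": {
        "503": {
            "type": "ERROR",
            "message": "HttpHostConnectException: Unable to connect to the data source.",
            "code": 503,
        }
    },
    "job_id": "270fa4d9-2701-4ca0-b056-74229cc28498",
    "job_start_time": "2024-11-12T15:05:46.362Z",
    "job_end_time": "2024-11-12T15:06:05.301Z",
}


def parse_time(value):
    """Parse the timestamp format used above; None means the job is still running."""
    if value is None:
        return None
    # Replace the trailing "Z" so fromisoformat also works on Python < 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def job_errors(job):
    """Hypothetical helper: return job-level notifications whose type is an error."""
    notes = job.get("notifications", {}).values()
    return [n for n in notes if str(n.get("type", "")).lower() == "error"]


def job_duration_seconds(job):
    """Hypothetical helper: elapsed seconds, or None while job_end_time is null."""
    start = parse_time(job.get("job_start_time"))
    end = parse_time(job.get("job_end_time"))
    if start is None or end is None:
        return None
    return (end - start).total_seconds()


errors = job_errors(FAILED_JOB)
duration = job_duration_seconds(FAILED_JOB)
if FAILED_JOB["status"] == "FAILED":
    for err in errors:
        print(f"job {FAILED_JOB['job_id']} failed with {err['code']}: {err['message']}")
```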