kubeflow / testing

Test infrastructure and tooling for Kubeflow.
Apache License 2.0

kfp: Increase memory limits for metadata writer to 1Gi #1015

Status: Closed (chensun closed this 2 years ago)

chensun commented 2 years ago

Which issue is resolved by this Pull Request: Resolves #

The metadata-writer pod was in a crash loop because its container was repeatedly OOMKilled at the 512Mi memory limit. Output of `kubectl describe pod`:

Name:             metadata-writer-546cc4bbb4-fl2ww
Namespace:        kubeflow
Priority:         0
Service Account:  kubeflow-pipelines-metadata-writer
Node:             gke-kfp-standalone-1-kfp-standalone-1-60baa0a0-bg42/10.128.0.45
Start Time:       Thu, 01 Sep 2022 22:16:32 +0000
Labels:           app=metadata-writer
                  application-crd-id=kubeflow-pipelines
                  pod-template-hash=546cc4bbb4
Annotations:      kubectl.kubernetes.io/restartedAt: 2022-07-06T13:40:37-06:00
                  kubernetes.io/limit-ranger: LimitRanger plugin set: cpu, memory request for container main; cpu, memory limit for container main
Status:           Running
IP:               10.44.33.228
IPs:
  IP:           10.44.33.228
Controlled By:  ReplicaSet/metadata-writer-546cc4bbb4
Containers:
  main:
    Container ID:   docker://232cf30bed0894dc79e3ca04abafeb518430ae2541ec405ced3ccfeaa0c6a132
    Image:          gcr.io/ml-pipeline/metadata-writer:2.0.0-alpha.4
    Image ID:       docker-pullable://gcr.io/ml-pipeline/metadata-writer@sha256:2fe55a9df6d562188bd9fdf6f2de666767f24fd5f126361d699010b650f4f636
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Tue, 06 Sep 2022 16:38:00 +0000
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Tue, 06 Sep 2022 16:32:57 +0000
      Finished:     Tue, 06 Sep 2022 16:36:35 +0000
    Ready:          True
    Restart Count:  82
    Limits:
      cpu:     1
      memory:  512Mi
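
Two numbers in the output above can be checked with a little arithmetic: exit code 137 follows the common 128 + N convention for a process killed by signal N, and the new 1Gi limit from the PR title is twice the old 512Mi limit. The following is a minimal Python sketch of that arithmetic, not part of the PR itself. The helper names `parse_memory` and `signal_from_exit_code` are hypothetical, the suffix table covers only a subset of Kubernetes quantity suffixes, and the signal lookup assumes a POSIX system.

```python
import signal

# Binary suffixes used by Kubernetes memory quantities (subset only; assumption:
# decimal suffixes such as "M" or "G" are not needed for this example).
_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as '512Mi' or '1Gi' to a number of bytes."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: already a plain byte count


def signal_from_exit_code(exit_code: int):
    """Return the signal name for a 128 + N exit code, or None if not applicable."""
    if exit_code > 128:
        try:
            return signal.Signals(exit_code - 128).name
        except ValueError:
            return None
    return None


old_limit = parse_memory("512Mi")  # limit shown in the describe output
new_limit = parse_memory("1Gi")    # limit named in the PR title

print(signal_from_exit_code(137))  # SIGKILL on POSIX systems
print(new_limit // old_limit)      # 2
```

A SIGKILL exit together with `Reason: OOMKilled` is consistent with the container being killed for exceeding its memory limit rather than exiting on an application error.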

Description of your changes:

google-oss-prow[bot] commented 2 years ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chensun

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/kubeflow/testing/blob/master/OWNERS)~~ [chensun]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.