kubecost / features-bugs

A public repository for filing of Kubecost feature requests and bugs. Please read the issue guidelines before filing an issue here.

[Bug] Turndown schedule deleted a few seconds after being deployed #100

Closed giulio-chillipharm closed 3 weeks ago

giulio-chillipharm commented 5 months ago

Kubecost Version

2.2.5

Kubernetes Version

1.29

Kubernetes Platform

EKS

Description

Whenever I deploy a new cluster turndown schedule, it exists for a few seconds and is then deleted.

Steps to reproduce

  1. kubectl create ns kubecost

  2. kubectl create secret generic cluster-controller-service-key -n kubecost --from-file=service-key.json

  3. helm install kubecost cost-analyzer \
       --repo https://kubecost.github.io/cost-analyzer/ \
       --namespace kubecost \
       --set kubecostToken="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" \
       -f cost-analyzer/values-eks-cost-monitoring.yaml

  4. kubectl create ns turndown

  5. kubectl create secret generic cluster-turndown-service-key -n turndown --from-file=service-key.json

  6. kubectl apply -f https://github.com/kubecost/cluster-turndown/releases/latest/download/cluster-turndown-full.yaml

  7. kubectl apply -f turndown-schedule.yaml

service-key.json

    {
      "aws_access_key_id": "XXXXXXXXXXXXXXXXXXXX",
      "aws_secret_access_key": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
    }

turndown-schedule.yaml

    apiVersion: kubecost.com/v1alpha1
    kind: TurndownSchedule
    metadata:
      annotations:
        kubectl.kubernetes.io/last-applied-configuration: |
          {"apiVersion":"kubecost.com/v1alpha1","kind":"TurndownSchedule","metadata":{"annotations":{},"finalizers":["finalizer.kubecost.com"],"name":"turndown-schedule"},"spec":{"end":"2024-05-21T12:00:00Z","repeat":"daily","start":"2024-05-21T10:00:00Z"}}
      creationTimestamp: "2024-05-20T17:45:33Z"
      finalizers:
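The dump above appears to be the live object as returned by the API server (it carries `creationTimestamp` and the `last-applied-configuration` annotation) rather than the file that was applied, and it is cut off after `finalizers:`. As a sketch, the manifest applied in step 7 can be reconstructed from that annotation alone. The field values below are copied from the annotation; the YAML layout and the use of a heredoc to write the file are assumptions made for illustration.

```shell
# Recreate turndown-schedule.yaml from the last-applied-configuration
# annotation shown in the dump above. Only fields present in that
# annotation are included; nothing else is assumed about the original file.
cat > turndown-schedule.yaml <<'EOF'
apiVersion: kubecost.com/v1alpha1
kind: TurndownSchedule
metadata:
  name: turndown-schedule
  finalizers:
    - finalizer.kubecost.com
spec:
  start: "2024-05-21T10:00:00Z"
  end: "2024-05-21T12:00:00Z"
  repeat: daily
EOF
```

Note that the start time (2024-05-21T10:00:00Z) is later than the creation timestamp in the dump (2024-05-20T17:45:33Z), so the schedule was not already in the past when it was applied.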

Expected behavior

The turndown schedule, once deployed, should trigger the cluster turndown according to its start and end times.

Impact

We cannot turn down clusters during idle times, which prevents us from saving on cloud costs.

Screenshots

No response

Logs

I0520 17:20:00.001996       1 turndownscheduler.go:440] -- Scale Up --
I0520 17:20:00.002060       1 namedlogger.go:24] [Turndown] NodeGroups Require Loading. Loading now...
I0520 17:20:00.565382       1 namedlogger.go:24] [Turndown] Resetting all NodeGroup sizes to pre-turndown capacity...
I0520 17:20:00.565408       1 namedlogger.go:48] [Error] Failed to locate tag: cluster.turndown.previous for NodePool: cluster-turndown
I0520 17:20:00.565413       1 namedlogger.go:48] [Error] Failed to locate tag: cluster.turndown.previous for NodePool: default_node_group-20240520072913524700000017
I0520 17:20:00.565421       1 namedlogger.go:24] [Turndown] Resuming Jobs...
I0520 17:20:00.568298       1 namedlogger.go:48] [Error] Failed to run scaling job: scaleup - Error: the server could not find the requested resource
I0520 17:25:00.568834       1 turndownscheduler.go:447] -- Reset --
I0520 17:44:23.634170       1 namedlogger.go:48] [Error] Failed to scheduled turndown. Schedule already exists.
W0520 17:44:23.642416       1 warnings.go:70] unknown field "status"
I0520 17:44:23.642916       1 event.go:282] Event(v1.ObjectReference{Kind:"TurndownSchedule", Namespace:"", Name:"turndown-schedule", UID:"262e8084-9d87-4d65-aab3-4d7a4d8b823f", APIVersion:"kubecost.com/v1alpha1", ResourceVersion:"242922", FieldPath:""}): type: 'Normal' reason: 'ScheduleTurndownSuccess' Successfully scheduled turndown
I0520 17:44:39.122487       1 namedlogger.go:48] [Error] Failed to scheduled turndown. Schedule already exists.
W0520 17:44:39.126979       1 warnings.go:70] unknown field "status"
I0520 17:44:39.127427       1 event.go:282] Event(v1.ObjectReference{Kind:"TurndownSchedule", Namespace:"", Name:"turndown-schedule", UID:"262e8084-9d87-4d65-aab3-4d7a4d8b823f", APIVersion:"kubecost.com/v1alpha1", ResourceVersion:"242922", FieldPath:""}): type: 'Normal' reason: 'ScheduleTurndownSuccess' Successfully scheduled turndown
W0520 17:44:39.148030       1 warnings.go:70] unknown field "status"
E0520 17:44:39.148277       1 schedulecontroller.go:190] TurndownSchedule 'turndown-schedule' in work queue no longer exists
I0520 17:45:33.434062       1 namedlogger.go:48] [Error] Failed to scheduled turndown. Schedule already exists.
W0520 17:45:33.442077       1 warnings.go:70] unknown field "status"
I0520 17:45:33.442438       1 event.go:282] Event(v1.ObjectReference{Kind:"TurndownSchedule", Namespace:"", Name:"turndown-schedule", UID:"b3668bda-4a96-4a13-8551-c2f97f04292b", APIVersion:"kubecost.com/v1alpha1", ResourceVersion:"243377", FieldPath:""}): type: 'Normal' reason: 'ScheduleTurndownSuccess' Successfully scheduled turndown
I0520 17:45:39.124213       1 namedlogger.go:48] [Error] Failed to scheduled turndown. Schedule already exists.
W0520 17:45:39.129749       1 warnings.go:70] unknown field "status"
E0520 17:45:39.129833       1 schedulecontroller.go:170] error syncing 'turndown-schedule': Operation cannot be fulfilled on turndownschedules.kubecost.com "turndown-schedule": the object has been modified; please apply your changes to the latest version and try again, requeuing
W0520 17:45:39.138254       1 warnings.go:70] unknown field "status"
E0520 17:45:39.138336       1 schedulecontroller.go:170] error syncing 'turndown-schedule': Operation cannot be fulfilled on turndownschedules.kubecost.com "turndown-schedule": StorageError: invalid object, Code: 4, Key: /registry/kubecost.com/turndownschedules/turndown-schedule, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: b3668bda-4a96-4a13-8551-c2f97f04292b, UID in object meta: , requeuing
E0520 17:45:39.138353       1 schedulecontroller.go:190] TurndownSchedule 'turndown-schedule' in work queue no longer exists
E0520 17:45:39.148690       1 schedulecontroller.go:190] TurndownSchedule 'turndown-schedule' in work queue no longer exists
I0520 17:46:49.936461       1 namedlogger.go:48] [Error] Failed to scheduled turndown. Schedule already exists.
W0520 17:46:49.942659       1 warnings.go:70] unknown field "status"
I0520 17:46:49.943066       1 event.go:282] Event(v1.ObjectReference{Kind:"TurndownSchedule", Namespace:"", Name:"turndown-schedule", UID:"ceea9153-7647-4497-8872-1d0931bc50dd", APIVersion:"kubecost.com/v1alpha1", ResourceVersion:"243862", FieldPath:""}): type: 'Normal' reason: 'ScheduleTurndownSuccess' Successfully scheduled turndown
I0520 17:47:09.125497       1 namedlogger.go:48] [Error] Failed to scheduled turndown. Schedule already exists.
W0520 17:47:09.133314       1 warnings.go:70] unknown field "status"
I0520 17:47:09.133689       1 event.go:282] Event(v1.ObjectReference{Kind:"TurndownSchedule", Namespace:"", Name:"turndown-schedule", UID:"ceea9153-7647-4497-8872-1d0931bc50dd", APIVersion:"kubecost.com/v1alpha1", ResourceVersion:"243862", FieldPath:""}): type: 'Normal' reason: 'ScheduleTurndownSuccess' Successfully scheduled turndown
I0520 17:47:39.126128       1 namedlogger.go:48] [Error] Failed to scheduled turndown. Schedule already exists.
W0520 17:47:39.130166       1 warnings.go:70] unknown field "status"
E0520 17:47:39.130228       1 schedulecontroller.go:170] error syncing 'turndown-schedule': Operation cannot be fulfilled on turndownschedules.kubecost.com "turndown-schedule": StorageError: invalid object, Code: 4, Key: /registry/kubecost.com/turndownschedules/turndown-schedule, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: ceea9153-7647-4497-8872-1d0931bc50dd, UID in object meta: , requeuing
E0520 17:47:39.130245       1 schedulecontroller.go:190] TurndownSchedule 'turndown-schedule' in work queue no longer exists
E0520 17:47:39.135538       1 schedulecontroller.go:190] TurndownSchedule 'turndown-schedule' in work queue no longer exists
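
To make the create/delete cycle in the log above easier to follow, here is a rough sketch that reduces it to a timeline of "scheduled" events (with the object UID) and "no longer exists" errors. The log lines are copied from the output above with terminal-wrapped lines joined; the file names `turndown.log` and `timeline.txt` and the awk filter itself are assumptions made for illustration, not part of cluster-turndown.

```shell
# Relevant lines from the controller log above, saved to a scratch file.
cat > turndown.log <<'EOF'
I0520 17:44:23.642916       1 event.go:282] Event(v1.ObjectReference{Kind:"TurndownSchedule", Namespace:"", Name:"turndown-schedule", UID:"262e8084-9d87-4d65-aab3-4d7a4d8b823f", APIVersion:"kubecost.com/v1alpha1", ResourceVersion:"242922", FieldPath:""}): type: 'Normal' reason: 'ScheduleTurndownSuccess' Successfully scheduled turndown
I0520 17:44:39.127427       1 event.go:282] Event(v1.ObjectReference{Kind:"TurndownSchedule", Namespace:"", Name:"turndown-schedule", UID:"262e8084-9d87-4d65-aab3-4d7a4d8b823f", APIVersion:"kubecost.com/v1alpha1", ResourceVersion:"242922", FieldPath:""}): type: 'Normal' reason: 'ScheduleTurndownSuccess' Successfully scheduled turndown
E0520 17:44:39.148277       1 schedulecontroller.go:190] TurndownSchedule 'turndown-schedule' in work queue no longer exists
I0520 17:45:33.442438       1 event.go:282] Event(v1.ObjectReference{Kind:"TurndownSchedule", Namespace:"", Name:"turndown-schedule", UID:"b3668bda-4a96-4a13-8551-c2f97f04292b", APIVersion:"kubecost.com/v1alpha1", ResourceVersion:"243377", FieldPath:""}): type: 'Normal' reason: 'ScheduleTurndownSuccess' Successfully scheduled turndown
E0520 17:45:39.138353       1 schedulecontroller.go:190] TurndownSchedule 'turndown-schedule' in work queue no longer exists
E0520 17:45:39.148690       1 schedulecontroller.go:190] TurndownSchedule 'turndown-schedule' in work queue no longer exists
I0520 17:46:49.943066       1 event.go:282] Event(v1.ObjectReference{Kind:"TurndownSchedule", Namespace:"", Name:"turndown-schedule", UID:"ceea9153-7647-4497-8872-1d0931bc50dd", APIVersion:"kubecost.com/v1alpha1", ResourceVersion:"243862", FieldPath:""}): type: 'Normal' reason: 'ScheduleTurndownSuccess' Successfully scheduled turndown
I0520 17:47:09.133689       1 event.go:282] Event(v1.ObjectReference{Kind:"TurndownSchedule", Namespace:"", Name:"turndown-schedule", UID:"ceea9153-7647-4497-8872-1d0931bc50dd", APIVersion:"kubecost.com/v1alpha1", ResourceVersion:"243862", FieldPath:""}): type: 'Normal' reason: 'ScheduleTurndownSuccess' Successfully scheduled turndown
E0520 17:47:39.130245       1 schedulecontroller.go:190] TurndownSchedule 'turndown-schedule' in work queue no longer exists
E0520 17:47:39.135538       1 schedulecontroller.go:190] TurndownSchedule 'turndown-schedule' in work queue no longer exists
EOF

# Field 2 of each line is the klog timestamp. For success events, pull the
# UID out of the UID:"..." token; for "no longer exists" errors, print "gone".
awk '
/ScheduleTurndownSuccess/ {
    if (match($0, /UID:"[^"]+"/)) {
        # Skip the 5-character UID:" prefix and drop the closing quote.
        uid = substr($0, RSTART + 5, RLENGTH - 6)
        print $2, "scheduled", uid
    }
}
/no longer exists/ { print $2, "gone" }
' turndown.log | tee timeline.txt
```

In this excerpt the name `turndown-schedule` appears under three different UIDs, each followed within a minute by a "no longer exists" error. That pattern is consistent with each `kubectl apply` creating a fresh object that is then removed, although the log alone does not show what performs the deletion.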

Slack discussion

No response

Troubleshooting

dwbrown2 commented 5 months ago

@AjayTripathy @kwombach12 this looks potentially related to the issue you were just investigating...

kwombach12 commented 5 months ago

Yes, it does. @giulio-chillipharm, we are actively looking into this issue now.

chipzoller commented 3 weeks ago

Hello, in an effort to consolidate our bug and feature request tracking, we are deprecating using GitHub to track tickets. If this issue is still outstanding and you have not done so already, please raise a request at https://support.kubecost.com/.