scholzj / terraform-aws-kubernetes

Terraform module for Kubernetes setup on AWS
Apache License 2.0
202 stars 129 forks

Update cluster #11

Closed sarge closed 6 years ago

sarge commented 6 years ago

I misread the error message on the last PR. I have just done another scaling operation and got this error message. I don't think it gets raised every time.

I think this is the correct fix.

I have also included a small fix to avoid overwriting the current autoscaling group capacity.

E0309 00:42:01.372859       1 event.go:200] Server rejected event 
'&v1.Event{
TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, 
ObjectMeta:v1.ObjectMeta{
Name:"cluster-autoscaler-status.151a16635f94ea2f", 
GenerateName:"", Namespace:"kube-system", 
SelfLink:"", UID:"",
 ResourceVersion:"4760077", 
Generation:0, 
CreationTimestamp:v1.Time{Time:time.Time{sec:0, nsec:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, 
InvolvedObject:v1.ObjectReference{
Kind:"ConfigMap", 
Namespace:"kube-system", 
Name:"cluster-autoscaler-status", 
UID:"8b7ecae7-2322-11e8-a8f0-02a9a02d1274", APIVersion:"v1", ResourceVersion:"4773065", FieldPath:""}, 
Reason:"ScaledUpGroup", 
Message:"Scale-up: group staging-aws-kubernetes-nodes size set to 2", Source:v1.EventSource{Component:"cluster-autoscaler", Host:""}, FirstTimestamp:v1.Time{Time:time.Time{sec:63656149240, nsec:0, loc:(*time.Location)(0x5618b60)}}, LastTimestamp:v1.Time{Time:time.Time{sec:63656152921, nsec:371421951, loc:(*time.Location)(0x5618b60)}}, Count:2, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{sec:0, nsec:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}':
 'events "cluster-autoscaler-status.151a16635f94ea2f" is forbidden: 
User "system:serviceaccount:kube-system:cluster-autoscaler" cannot patch events in the namespace "kube-system"' (will not retry!)
scholzj commented 6 years ago

The lifecycle change in main.tf certainly makes sense.

I'm not completely sure about the events change. What did you do to trigger this error? I have never seen it. Anyway, my main confusion right now is that we grant creation of events in the ClusterRole but patching of events in the Role. I think it would make sense to keep them together, either in the ClusterRole if that is needed, or in the Role.
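For reference, keeping both verbs together could look something like the fragment below. This is a hedged sketch, not the module's actual manifest; the name and the Role-vs-ClusterRole scoping are assumptions based on the error message, which names the `cluster-autoscaler` service account in `kube-system`:

```yaml
# Hypothetical RBAC fragment: grants both create and patch on events
# in one place, so the autoscaler can create a status event and later
# patch it when the event correlator aggregates repeats.
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: cluster-autoscaler
  namespace: kube-system
rules:
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "patch"]
```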

sarge commented 6 years ago

Good point on adding the events to the cluster role.

Digging into the logging of events, the chain of dependencies goes:

https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/core/scale_up.go#L356 uses a LogRecorder, which in turn is made from an EventRecorder: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/utils/kubernetes/factory.go#L31

The EventRecorder does some aggregation and correlation which, in some situations, updates an existing event instead of creating a new one: https://github.com/kubernetes/client-go/blob/master/tools/record/event.go#L130

That is why I don't tend to see the error every time.
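To illustrate why `patch` is only needed sometimes, here is a minimal Python sketch (not the client-go code, just an analogy): the correlator caches recent events, so the first occurrence of an event is a create, while a repeat of the same event bumps its count and goes out as a patch.

```python
class EventSink:
    """Toy model of client-go's event correlation.

    First occurrence of an (object, reason, message) tuple is recorded
    as a CREATE; repeats increment the cached event's count and are
    recorded as a PATCH, which is the call the RBAC error above denies.
    """

    def __init__(self):
        self.created = []   # events sent via CREATE
        self.patched = []   # events sent via PATCH
        self._cache = {}    # correlation key -> event dict

    def record(self, involved_object, reason, message):
        key = (involved_object, reason, message)
        if key in self._cache:
            event = self._cache[key]
            event["count"] += 1
            self.patched.append(event)   # repeated event -> PATCH
        else:
            event = {"object": involved_object, "reason": reason,
                     "message": message, "count": 1}
            self._cache[key] = event
            self.created.append(event)   # first occurrence -> CREATE
        return event


sink = EventSink()
sink.record("cluster-autoscaler-status", "ScaledUpGroup", "size set to 2")
e = sink.record("cluster-autoscaler-status", "ScaledUpGroup", "size set to 2")
print(e["count"])  # second occurrence was aggregated: count == 2
```

A single scale-up only creates events; the forbidden patch shows up once the same event repeats within the correlator's window, which matches not seeing the error on every run.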