kubernetes / kops

Kubernetes Operations (kOps) - Production Grade k8s Installation, Upgrades and Management
https://kops.sigs.k8s.io/
Apache License 2.0

Docker fails to start when using awslogs #4033

Closed jacobwoffenden closed 6 years ago

jacobwoffenden commented 6 years ago

Thanks for submitting an issue! Please fill in as much of the template below as you can.

------------- BUG REPORT TEMPLATE --------------------

  1. What kops version are you running? The command kops version, will display this information. Version 1.8.0
  2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.
    Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.4", GitCommit:"9befc2b8928a9426501d3bf62f72849d5cbcd5a3", GitTreeState:"clean", BuildDate:"2017-11-20T19:11:02Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"darwin/amd64"}
    Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.4", GitCommit:"9befc2b8928a9426501d3bf62f72849d5cbcd5a3", GitTreeState:"clean", BuildDate:"2017-11-20T05:17:43Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
  3. What cloud provider are you using? aws
  4. What commands did you run? What is the simplest way to reproduce this issue? Add the following to cluster spec via kops edit cluster ${NAME}:
    additionalPolicies:
      node: |
        [
          {
            "Effect": "Allow",
            "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
            "Resource": ["*"]
          }
        ]
      master: |
        [
          {
            "Effect": "Allow",
            "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
            "Resource": ["*"]
          }
        ]
    docker:
      logDriver: awslogs
      logOpt:
      - awslogs-create-group=true
      - awslogs-region=eu-west-2
      - awslogs-group=production
  5. What happened after the commands executed? After applying a rolling update, the first master that is recreated never comes back.
  6. What did you expect to happen? Kops applies awslogs docker opts
  7. Please provide your cluster manifest. Execute kops get --name my.example.com -oyaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.
    apiVersion: kops/v1alpha2
    kind: Cluster
    metadata:
      creationTimestamp: 2017-12-10T20:21:01Z
      name: ${CLUSTER_NAME}
    spec:
      additionalPolicies:
        master: |
          [
            {
              "Effect": "Allow",
              "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
              "Resource": ["*"]
            }
          ]
        node: |
          [
            {
              "Effect": "Allow",
              "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
              "Resource": ["*"]
            }
          ]
      api:
        loadBalancer:
          type: Public
      authorization:
        rbac: {}
      channel: stable
      cloudLabels:
        environment: production
      cloudProvider: aws
      configBase: s3://kops-state-store/${CLUSTER_NAME}
      dnsZone: ${CLUSTER_NAME}
      docker:
        logDriver: awslogs
        logOpt:
        - awslogs-create-group=true
        - awslogs-region=eu-west-2
        - awslogs-group=production
      etcdClusters:
      - etcdMembers:
        - encryptedVolume: true
          instanceGroup: master-eu-west-2a-1
          name: a-1
        - encryptedVolume: true
          instanceGroup: master-eu-west-2b-1
          name: b-1
        - encryptedVolume: true
          instanceGroup: master-eu-west-2a-2
          name: a-2
        name: main
      - etcdMembers:
        - encryptedVolume: true
          instanceGroup: master-eu-west-2a-1
          name: a-1
        - encryptedVolume: true
          instanceGroup: master-eu-west-2b-1
          name: b-1
        - encryptedVolume: true
          instanceGroup: master-eu-west-2a-2
          name: a-2
        name: events
      iam:
        allowContainerRegistry: true
        legacy: false
      kubernetesApiAccess:
      - xx.xx.xx.xx/32
      kubernetesVersion: 1.8.4
      masterInternalName: api.internal.${CLUSTER_NAME}
      masterPublicName: api.production.${CLUSTER_NAME}
      networkCIDR: 10.1.0.0/16
      networking:
        kuberouter: {}
      nonMasqueradeCIDR: 100.64.0.0/10
      sshAccess:
      - xx.xx.xx.xx/32
      subnets:
      - cidr: 10.1.32.0/19
        name: eu-west-2a
        type: Public
        zone: eu-west-2a
      - cidr: 10.1.64.0/19
        name: eu-west-2b
        type: Public
        zone: eu-west-2b
      topology:
        dns:
          type: Public
        masters: public
        nodes: public
  8. Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here. The flag doesn't produce any additional output.
  9. Anything else do we need to know?
    
    root@ip-10-1-80-216:/home/admin# systemctl status docker
    ● docker.service - Docker Application Container Engine
    Loaded: loaded (/lib/systemd/system/docker.service; enabled)
    Active: activating (auto-restart) (Result: exit-code) since Sun 2017-12-10 20:37:41 UTC; 1s ago
     Docs: https://docs.docker.com
    Process: 2309 ExecStart=/usr/bin/dockerd -H fd:// $DOCKER_OPTS (code=exited, status=1/FAILURE)
    Process: 2305 ExecStartPre=/opt/kubernetes/helpers/docker-prestart (code=exited, status=0/SUCCESS)
    Main PID: 2309 (code=exited, status=1/FAILURE)

    Dec 10 20:37:41 ip-10-1-80-216 systemd[1]: Failed to start Docker Application Container Engine.
    Dec 10 20:37:41 ip-10-1-80-216 systemd[1]: Unit docker.service entered failed state.

    root@ip-10-1-80-216:/home/admin# docker version
    Client:
     Version:      1.13.1
     API version:  1.26
     Go version:   go1.7.5
     Git commit:   092cba3
     Built:        Wed Feb 8 06:36:34 2017
     OS/Arch:      linux/amd64
    error during connect: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.26/version: read unix @->/var/run/docker.sock: read: connection reset by peer

    root@ip-10-1-80-216:/home/admin# cat /etc/sysconfig/docker
    DOCKER_OPTS=--ip-masq=false --iptables=false --log-driver=awslogs --log-level=warn --log-opt=awslogs-create-group=true --log-opt=awslogs-group=production --log-opt=awslogs-region=eu-west-2 --storage-driver=overlay
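For anyone debugging the same failure, the exact daemon error is easier to see by running dockerd in the foreground with the flags from the unit file above, or by reading the journal directly (a debugging sketch; flags and paths are copied from the status output above):

```shell
# Stop the crash-looping unit, then run dockerd in the foreground with the
# same options so the real error prints to the terminal instead of systemd
# swallowing it into "status=1/FAILURE".
systemctl stop docker
/usr/bin/dockerd --log-driver=awslogs \
  --log-opt awslogs-region=eu-west-2 \
  --log-opt awslogs-group=production \
  --log-opt awslogs-create-group=true

# Alternatively, pull the recent daemon output from the journal:
journalctl -u docker --no-pager -n 50
```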

97turbotalon commented 6 years ago

I had the same issue and resolved it by removing `awslogs-create-group=true` and manually creating the CloudWatch log group.
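That workaround can be done up front with the AWS CLI; a minimal sketch, assuming the group name and region from the cluster spec above:

```shell
# Create the CloudWatch Logs group ahead of time so the daemon no longer
# needs the awslogs-create-group=true option (group/region taken from the
# spec above; substitute your own values).
aws logs create-log-group --log-group-name production --region eu-west-2

# Confirm the group exists before applying the rolling update.
aws logs describe-log-groups --log-group-name-prefix production --region eu-west-2
```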

jacobwoffenden commented 6 years ago

thanks @97turbotalon - that did the trick!