kubernetes / kops

Kubernetes Operations (kOps) - Production Grade k8s Installation, Upgrades and Management
https://kops.sigs.k8s.io/
Apache License 2.0

Need to increase root partition size #5405

Closed: saravana-code closed this issue 5 years ago

saravana-code commented 6 years ago

Thanks for submitting an issue! Please fill in as much of the template below as you can.

------------- BUG REPORT TEMPLATE --------------------

  1. What kops version are you running? The command kops version will display this information. Version 1.8.0 (git-5099bc5)

  2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag. Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.1", GitCommit:"f38e43b221d08850172a9a4ea785a86a3ffa3b3a", GitTreeState:"clean", BuildDate:"2017-10-11T23:16:41Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

  3. What cloud provider are you using? AWS

  4. What commands did you run? What is the simplest way to reproduce this issue? kops create cluster

  5. What happened after the commands executed? I could see that the root ("/") partition size (not the EBS volume) on the node is only 8GB, while the volume size is 128GB. How do I increase the partition? I have live traffic on this node (a restart is fine, but data must not be lost). See the commands after this list for one way to confirm the mismatch.

  6. What did you expect to happen? I want a 100+GB root partition after a kops rolling update or upgrade, without losing data (a restart is fine, but pods, configmaps, secrets, etc. should be retained).

  7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.

apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: 2018-06-12T07:38:11Z
  name: prod.example.com
spec:
  api:
    loadBalancer:
      type: Public
  authorization:
    alwaysAllow: {}
  channel: stable
  cloudLabels:
    Environment: Prod
    Provisioner: kops
    Role: node
    Type: k8s
  cloudProvider: aws
  configBase: s3://k8s-example-clusters/prod.example.com
  dnsZone: example.com
  etcdClusters:


apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-06-12T07:38:12Z
  labels:
    kops.k8s.io/cluster: prod.example.com
  name: bastions
spec:
  image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2018-03-11
  machineType: t2.micro
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: bastions
  role: Bastion
  subnets:


apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-06-12T07:38:11Z
  labels:
    kops.k8s.io/cluster: prod.example.com
  name: master-ap-south-1a
spec:
  image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2018-03-11
  machineType: t2.small
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-ap-south-1a
  role: Master
  subnets:


apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-06-12T07:38:12Z
  labels:
    kops.k8s.io/cluster: prod.example.com
  name: nodes
spec:
  image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2018-03-11
  machineType: m5.large
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: nodes
  role: Node
  subnets:

  8. Anything else do we need to know? We deployed two clusters with the same script; one has a 120GB / partition and the other only 8GB.
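
To confirm the partition/volume mismatch described in item 5, compare the block device and the mounted filesystem on the node itself (a minimal sketch; the device names assume an NVMe root disk, as on m5/r5 instances, and would be /dev/xvda... on older families):

# The block device reports the full EBS volume size (128G)...
lsblk /dev/nvme0n1
# ...while the root filesystem only reports the partition size (8G).
df -h /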

thomaspeitz commented 6 years ago

The corresponding docs for increasing the root disk size are here: https://github.com/kubernetes/kops/blob/master/docs/instance_groups.md#changing-the-root-volume-size-or-type
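
Per those docs, the root volume size is a field on the InstanceGroup spec (a minimal sketch, reusing the nodes group from the manifest above):

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  name: nodes
spec:
  rootVolumeSize: 128   # EBS root volume size in GB
  rootVolumeType: gp2   # optional: EBS volume type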

But the main problem behind your small disk size is your instance type: machineType: m5.large currently has problems with root disk sizing. This issue should be helpful for you: https://github.com/kubernetes/kops/issues/3991

I would recommend changing the instance type from m5.large to m4.large and doing a rolling update. Helpful docs: https://github.com/kubernetes/kops/blob/master/docs/instance_groups.md#change-the-instance-type-in-an-instance-group
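
The change itself is the usual edit/update/rolling-update cycle (a sketch, assuming the cluster name from the manifest above):

# Change machineType: m5.large -> m4.large in the editor
kops edit ig nodes --name prod.example.com
# Apply the change to the cloud resources
kops update cluster --name prod.example.com --yes
# Replace the running instances so they pick up the new type
kops rolling-update cluster --name prod.example.com --yes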

stanvit commented 6 years ago

We managed to resize the NVMe root partition with this hook:

spec:
  hooks:
  - name: resize-nvme-rootfs
    roles:
    - Node
    manifest: |
      Type=oneshot
      ExecStart=/bin/sh -c 'test -b /dev/nvme0n1p1 && growpart-workaround /dev/nvme0n1 1 && resize2fs /dev/nvme0n1p1 || true'
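
For context: the hook installs a oneshot systemd unit on each Node that, if the first NVMe partition exists, grows partition 1 to fill the disk and then grows the ext4 filesystem to fill the partition; the trailing || true keeps the unit from failing on instances without an NVMe root disk. The equivalent one-off steps on a live instance would be (a sketch, assuming an ext4 root filesystem; the hook uses growpart-workaround because plain growpart fails on some images, as discussed below):

sudo growpart /dev/nvme0n1 1    # grow partition 1 to fill the EBS volume
sudo resize2fs /dev/nvme0n1p1   # grow the ext4 filesystem to fill the partition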

frank-bee commented 6 years ago

@tsupertramp I also have problems with resizing the root volume, but with the t2 family.

kops version: Version 1.10.0 (git-3b783df3b) (forked by spotinst.com)

kubectl version:

Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.2", GitCommit:"bb9ffb1654d4a729bb4cec18ff088eacc153c239", GitTreeState:"clean", BuildDate:"2018-08-07T23:17:28Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-21T09:05:37Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

instance group manifest:

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-08-27T15:25:17Z
  labels:
    kops.k8s.io/cluster: frank***s.com
  name: es740_nodes
spec:
  image: kope.io/k8s-1.9-debian-jessie-amd64-hvm-ebs-2018-03-11
  machineType: t2.xlarge,t2.2xlarge
  maxSize: 5
  minSize: 0
  nodeLabels:
    kops.k8s.io/instancegroup: ***_nodes
  role: Node
  rootVolumeSize: 150
  subnets:
  - eu-central-1a
  - eu-central-1b
  - eu-central-1c
RedVortex commented 6 years ago

> 5. What happened after the commands executed? I could see that the root ("/") partition size (not the EBS volume) on the node is only 8GB, while the volume size is 128GB. How do I increase the partition? I have live traffic on this node (a restart is fine, but data must not be lost).

We have this exact same issue. We're trying to use r5 instances for our nodes and we end up with a root device of only 8GB.

Also, the device is called /dev/nvme..., not /dev/xvda..., with those instances now, unlike with r4 instances, even if the instance's EBS volume is configured to expose it as xvda in the Amazon console.

We had to go back to r4, which fixed the issue for now, until we can move to r5 instances again (a bit cheaper price-wise than r4 but much better in RAM and CPU).
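
For anyone debugging this on Nitro-based families (m5, r5, c5, t3), the console device name is not what the OS sees: EBS volumes are always exposed as NVMe devices there. One way to map the NVMe names back to EBS volume IDs (a sketch; the udev symlink names may vary by image):

lsblk -o NAME,SIZE,TYPE,MOUNTPOINT    # shows nvme0n1 rather than xvda
ls -l /dev/disk/by-id/ | grep Elastic # symlinks carry the EBS volume ID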

igorvpcleao commented 6 years ago

Same here.

prichrd commented 6 years ago

> We managed to resize the NVMe root partition with this hook:
>
> spec:
>   hooks:
>   - name: resize-nvme-rootfs
>     roles:
>     - Node
>     manifest: |
>       Type=oneshot
>       ExecStart=/bin/sh -c 'test -b /dev/nvme0n1p1 && growpart-workaround /dev/nvme0n1 1 && resize2fs /dev/nvme0n1p1 || true'

@stanvit What AMI and instance type are you using? I am trying to launch an r5.4xlarge with AMI k8s-1.9-debian-jessie-amd64-hvm-ebs-2018-03-11 (ami-dbd611a6) and I can't run the growpart command successfully. Here is the output I get:

FAILED: failed to get CHS from /dev/nvme0n1p1
root@ip-10-20-234-228:/home/admin# growpart /dev/nvme0n1 1
attempt to resize /dev/nvme0n1 failed. sfdisk output below:
|
| Disk /dev/nvme0n1: 16709 cylinders, 255 heads, 63 sectors/track
| Old situation:
| Units: cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0
|
|    Device Boot Start     End   #cyls    #blocks   Id  System
| /dev/nvme0n1p1   *      0+   1044-   1045-   8386560   83  Linux
| /dev/nvme0n1p2          0       -       0          0    0  Empty
| /dev/nvme0n1p3          0       -       0          0    0  Empty
| /dev/nvme0n1p4          0       -       0          0    0  Empty
| New situation:
| Units: sectors of 512 bytes, counting from 0
|
|    Device Boot    Start       End   #sectors  Id  System
| /dev/nvme0n1p1   *      4096 268430084  268425989  83  Linux
| /dev/nvme0n1p2             0         -          0   0  Empty
| /dev/nvme0n1p3             0         -          0   0  Empty
| /dev/nvme0n1p4             0         -          0   0  Empty
| Successfully wrote the new partition table
|
| sfdisk: BLKRRPART: Device or resource busy
| sfdisk: The command to re-read the partition table failed.
| Run partprobe(8), kpartx(8) or reboot your system now,
| before using mkfs
| sfdisk: If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
| to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1
| (See fdisk(8).)
| Re-reading the partition table ...
FAILED: failed to resize
***** WARNING: Resize failed, attempting to revert ******
Re-reading the partition table ...
sfdisk: BLKRRPART: Device or resource busy
sfdisk: The command to re-read the partition table failed.
Run partprobe(8), kpartx(8) or reboot your system now,
before using mkfs
***** Appears to have gone OK ****

If I run these commands: /bin/sh -c 'test -b /dev/nvme0n1p1 && growpart-workaround /dev/nvme0n1 1 && resize2fs /dev/nvme0n1p1 || true'

I get: NOCHANGE: partition 1 is size 268425989. it cannot be grown

stanvit commented 6 years ago

@prichrd, we base our AMIs on the same base image, k8s-1.9-debian-jessie-amd64-hvm-ebs-2018-03-11, but encrypt them with KMS (that shouldn't affect the operations in any way, though). Our instance type is m5.large.

growpart never worked for us either, failing with the same error you just posted. The output from your growpart-workaround invocation suggests that the partition had already been resized earlier. What does your fdisk -l show? Did you try running resize2fs /dev/nvme0n1p1?
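
If fdisk confirms the partition already spans the disk, only the filesystem still needs growing (a sketch, assuming ext4 on the first NVMe partition):

sudo fdisk -l /dev/nvme0n1     # check: does partition 1 end near the end of the disk?
df -h /                        # the filesystem may still report the old 8G size
sudo resize2fs /dev/nvme0n1p1  # grow ext4 online to fill the partition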

prichrd commented 6 years ago

@stanvit thanks for the help. Actually, I fell back on the stretch AMI and everything is properly sized now. According to Geojaz in https://github.com/kubernetes/kops/issues/3901, stretch is now safe to use.
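
For anyone else making the same switch, the AMI is set per instance group; a sketch (the stretch image name below is only an example, check the current kope.io listing for a real one):

kops edit ig nodes --name prod.example.com
# in the spec, replace the jessie image, e.g.:
#   image: kope.io/k8s-1.9-debian-stretch-amd64-hvm-ebs-2018-03-11
kops update cluster --name prod.example.com --yes
kops rolling-update cluster --name prod.example.com --yes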

nerdinand commented 6 years ago

I resized my nodes from t2.large to c5.2xlarge and had the same issue. @stanvit's solution worked perfectly. Thanks so much!

fejta-bot commented 5 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

3minus1 commented 5 years ago

Experienced the same issue with machineType t3.large and kops v1.8. The workaround provided by @stanvit worked for me.

fejta-bot commented 5 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten

fejta-bot commented 5 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close

k8s-ci-robot commented 5 years ago

@fejta-bot: Closing this issue.

In response to [this](https://github.com/kubernetes/kops/issues/5405#issuecomment-480257988):

> Rotten issues close after 30d of inactivity.
> Reopen the issue with `/reopen`.
> Mark the issue as fresh with `/remove-lifecycle rotten`.
>
> Send feedback to sig-testing, kubernetes/test-infra and/or [fejta](https://github.com/fejta).
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.