Closed. saravana-code closed this issue 5 years ago.
The corresponding docs for increasing the root volume size are here: https://github.com/kubernetes/kops/blob/master/docs/instance_groups.md#changing-the-root-volume-size-or-type
But the main problem behind your small disk size is your instance type: machineType: m5.large currently has problems with the root volume size. This issue should be helpful for you: https://github.com/kubernetes/kops/issues/3991
I would recommend changing the instance type from m5.large to m4.large and doing a rolling update. Helpful docs: https://github.com/kubernetes/kops/blob/master/docs/instance_groups.md#change-the-instance-type-in-an-instance-group
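For reference, a rough sketch of that flow based on the linked docs (the instance group name "nodes" and the cluster name "my.example.com" are placeholders for your own setup):

# edit the instance group and change spec.machineType (e.g. m5.large -> m4.large);
# spec.rootVolumeSize can be adjusted in the same place
kops edit ig nodes --name my.example.com
# apply the change and roll the nodes so new instances pick it up
kops update cluster my.example.com --yes
kops rolling-update cluster my.example.com --yes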
@tsupertramp I also have problems with resizing the root volume, but with the t2 family.
kops version: Version 1.10.0 (git-3b783df3b) (forked by spotinst.com)
kubectl version:
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.2", GitCommit:"bb9ffb1654d4a729bb4cec18ff088eacc153c239", GitTreeState:"clean", BuildDate:"2018-08-07T23:17:28Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-21T09:05:37Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
instance group manifest:
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-08-27T15:25:17Z
  labels:
    kops.k8s.io/cluster: frank***s.com
  name: es740_nodes
spec:
  image: kope.io/k8s-1.9-debian-jessie-amd64-hvm-ebs-2018-03-11
  machineType: t2.xlarge,t2.2xlarge
  maxSize: 5
  minSize: 0
  nodeLabels:
    kops.k8s.io/instancegroup: ***_nodes
  role: Node
  rootVolumeSize: 150
  subnets:
  - eu-central-1a
  - eu-central-1b
  - eu-central-1c
We have this exact same issue. We're trying to use r5 instances for our nodes and we end up with a root device of only 8GB.
Also, with those instances the device is now called /dev/nvme..., not /dev/xvda... as it used to be with r4 instances, even if the instance's EBS volume is configured to be exposed as xvda in the AWS console.
We had to go back to r4, which fixed the issue for now, until we can move to r5 instances again (a bit cheaper price-wise than r4 but much better in RAM and CPU).
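A quick way to see this naming on a Nitro-based node (rough check, assumes a single EBS root volume):

lsblk -o NAME,SIZE,TYPE,MOUNTPOINT   # the root disk shows up as nvme0n1, with nvme0n1p1 mounted on /
df -h /                              # the filesystem size follows the partition, not the EBS volume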
Same here.
We managed to resize NVME root partition with this hook:
spec:
  hooks:
  - name: resize-nvme-rootfs
    roles:
    - Node
    manifest: |
      Type=oneshot
      ExecStart=/bin/sh -c 'test -b /dev/nvme0n1p1 && growpart-workaround /dev/nvme0n1 1 && resize2fs /dev/nvme0n1p1 || true'
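If it helps, this is how we sanity-check a node after boot to confirm the hook actually grew things (assumes a single NVMe root disk, as in the hook above):

lsblk /dev/nvme0n1    # partition nvme0n1p1 should now span (nearly) the whole volume
df -h /               # the root filesystem should report the same size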
@stanvit What AMI and instance type are you using? I am trying to launch an r5.4xlarge with AMI k8s-1.9-debian-jessie-amd64-hvm-ebs-2018-03-11 (ami-dbd611a6), and I can't seem to run the growpart command successfully. Here is the output I get:
FAILED: failed to get CHS from /dev/nvme0n1p1
root@ip-10-20-234-228:/home/admin# growpart /dev/nvme0n1 1
attempt to resize /dev/nvme0n1 failed. sfdisk output below:
|
| Disk /dev/nvme0n1: 16709 cylinders, 255 heads, 63 sectors/track
| Old situation:
| Units: cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0
|
| Device Boot Start End #cyls #blocks Id System
| /dev/nvme0n1p1 * 0+ 1044- 1045- 8386560 83 Linux
| /dev/nvme0n1p2 0 - 0 0 0 Empty
| /dev/nvme0n1p3 0 - 0 0 0 Empty
| /dev/nvme0n1p4 0 - 0 0 0 Empty
| New situation:
| Units: sectors of 512 bytes, counting from 0
|
| Device Boot Start End #sectors Id System
| /dev/nvme0n1p1 * 4096 268430084 268425989 83 Linux
| /dev/nvme0n1p2 0 - 0 0 Empty
| /dev/nvme0n1p3 0 - 0 0 Empty
| /dev/nvme0n1p4 0 - 0 0 Empty
| Successfully wrote the new partition table
|
| sfdisk: BLKRRPART: Device or resource busy
| sfdisk: The command to re-read the partition table failed.
| Run partprobe(8), kpartx(8) or reboot your system now,
| before using mkfs
| sfdisk: If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
| to zero the first 512 bytes: dd if=/dev/zero of=/dev/foo7 bs=512 count=1
| (See fdisk(8).)
| Re-reading the partition table ...
FAILED: failed to resize
***** WARNING: Resize failed, attempting to revert ******
Re-reading the partition table ...
sfdisk: BLKRRPART: Device or resource busy
sfdisk: The command to re-read the partition table failed.
Run partprobe(8), kpartx(8) or reboot your system now,
before using mkfs
***** Appears to have gone OK ****
If I run these commands:
/bin/sh -c 'test -b /dev/nvme0n1p1 && growpart-workaround /dev/nvme0n1 1 && resize2fs /dev/nvme0n1p1 || true'
I get:
NOCHANGE: partition 1 is size 268425989. it cannot be grown
@pric, we are basing our AMIs on the same base image, k8s-1.9-debian-jessie-amd64-hvm-ebs-2018-03-11, but encrypt them with KMS (that shouldn't affect these operations in any way though). Our instance type is m5.large.
growpart never worked for us either, failing with the same error you just posted. The output from your growpart-workaround invocation suggests that the partition had already been resized earlier. What does your fdisk -l show? Did you try running resize2fs /dev/nvme0n1p1?
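To be explicit about what I mean: if growpart reports NOCHANGE, the partition is already full-size and only the filesystem still needs to grow. A rough sequence (ext4 can be resized online while mounted; device paths assume a single NVMe root disk):

sudo fdisk -l /dev/nvme0n1       # confirm nvme0n1p1 already spans the whole disk
df -h /                          # if / still shows ~8G, the filesystem is lagging behind
sudo resize2fs /dev/nvme0n1p1    # grow the ext4 filesystem to fill the partition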
@stanvit thanks for the help. Actually I fell back on the stretch AMI and everything is properly sized now. According to Geojaz on this issue (https://github.com/kubernetes/kops/issues/3901), stretch is now safe to use.
I resized my nodes from t2.large to c5.2xlarge and had the same issue. @stanvit's solution worked perfectly. Thanks so much!
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Experienced the same issue with machineType t3.large and kops v1.8. The workaround provided by @stanvit worked for me.
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.
Thanks for submitting an issue! Please fill in as much of the template below as you can.
------------- BUG REPORT TEMPLATE --------------------
What kops version are you running? The command kops version will display this information.
Version 1.8.0 (git-5099bc5)
What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.1", GitCommit:"f38e43b221d08850172a9a4ea785a86a3ffa3b3a", GitTreeState:"clean", BuildDate:"2017-10-11T23:16:41Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
What cloud provider are you using?
AWS
What commands did you run? What is the simplest way to reproduce this issue?
kops create cluster
What happened after the commands executed?
I could see that the root ("/") partition size (not the volume size on AWS EBS) of the node is only 8GB, while the volume size of the node is 128GB. How do I increase the partition? I have live traffic on this node (it can go through a restart, but data should not be lost).
What did you expect to happen?
I want a 100+GB root partition via kops rolling update or upgrade, without losing data (I am okay with a restart, but I need the pods, configmaps, secrets, etc. to be retained).
Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.
apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: 2018-06-12T07:38:11Z
  name: prod.example.com
spec:
  api:
    loadBalancer:
      type: Public
  authorization:
    alwaysAllow: {}
  channel: stable
  cloudLabels:
    Environment: Prod
    Provisioner: kops
    Role: node
    Type: k8s
  cloudProvider: aws
  configBase: s3://k8s-example-clusters/prod.example.com
  dnsZone: example.com
  etcdClusters:
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-06-12T07:38:12Z
  labels:
    kops.k8s.io/cluster: prod.example.com
  name: bastions
spec:
  image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2018-03-11
  machineType: t2.micro
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: bastions
  role: Bastion
  subnets:
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-06-12T07:38:11Z
  labels:
    kops.k8s.io/cluster: prod.example.com
  name: master-ap-south-1a
spec:
  image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2018-03-11
  machineType: t2.small
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-ap-south-1a
  role: Master
  subnets:
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-06-12T07:38:12Z
  labels:
    kops.k8s.io/cluster: prod.example.com
  name: nodes
spec:
  image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2018-03-11
  machineType: m5.large
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: nodes
  role: Node
  subnets:
Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.
------------- FEATURE REQUEST TEMPLATE --------------------
Describe IN DETAIL the feature/behavior/change you would like to see.
Feel free to provide a design supporting your feature request.