scylladb / scylla-operator

The Kubernetes Operator for ScyllaDB
https://operator.docs.scylladb.com/
Apache License 2.0
337 stars 175 forks

Provide a workaround for GKE reserving 10% of disk space on local SSD nodes by default #2056

Open gdubicki opened 3 months ago

gdubicki commented 3 months ago

What should the feature do?

The default kubelet hardEviction setting for nodefs.available is 10%.

For Scylla running on GKE nodes this applies to the local SSDs.

There is a feature request to make this value configurable in GKE, but it has not been implemented yet: https://issuetracker.google.com/issues/185760232
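For reference, this threshold corresponds to the `evictionHard` field of the kubelet's `KubeletConfiguration`. The fragment below only illustrates the default; GKE does not currently expose a supported way to change it:

```yaml
# Illustrative KubeletConfiguration fragment showing the default hard
# eviction threshold that keeps ~10% of node disk space free.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  nodefs.available: "10%"   # default; the workaround lowers this value
```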

The workaround is to deploy your own DaemonSet that updates the kubelet settings on the Scylla nodes.

We did this ourselves, but it would be great if Scylla Operator did it, perhaps alongside or as part of the node tuning.
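A rough sketch of what such a DaemonSet could look like. Note that the kubelet config path, the node-selector label, the replacement threshold value, and the restart mechanism are all assumptions for GKE Container-Optimized OS nodes, not verified specifics; adjust them for your node image:

```yaml
# Hypothetical sketch: lower the kubelet nodefs.available eviction
# threshold on each Scylla node. Paths, labels, and the kubelet restart
# command are assumptions for GKE nodes and must be verified.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kubelet-eviction-tweak
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: kubelet-eviction-tweak
  template:
    metadata:
      labels:
        app: kubelet-eviction-tweak
    spec:
      hostPID: true
      nodeSelector:
        scylla.scylladb.com/node-type: scylla   # assumed Scylla node label
      containers:
      - name: tweak
        image: alpine:3
        securityContext:
          privileged: true
        command:
        - /bin/sh
        - -c
        - |
          # Lower nodefs.available from the 10% default (value illustrative);
          # the config file path is an assumption for GKE COS nodes.
          sed -i 's/nodefs.available: "10%"/nodefs.available: "1%"/' \
            /host/home/kubernetes/kubelet-config.yaml \
          && nsenter --target 1 --mount --uts --ipc --net -- \
               systemctl restart kubelet
          sleep infinity
        volumeMounts:
        - name: host
          mountPath: /host
      volumes:
      - name: host
        hostPath:
          path: /
```

As noted later in this thread, mutating node-level kubelet config from a pod is unsupported territory and can race with node initialization, so treat this strictly as a sketch of the approach.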

What is the use case behind this feature?

Everyone running Scylla on GKE with local SSDs, so they don't waste 10% of their disk space.

Anything else we need to know?

No response

scylla-operator-bot[bot] commented 2 months ago

The Scylla Operator project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

/lifecycle stale

scylla-operator-bot[bot] commented 1 month ago

The Scylla Operator project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

/lifecycle rotten

gdubicki commented 1 month ago

> The workaround is to deploy your own DaemonSet that will update the kubelet settings on Scylla nodes.
>
> We did it ourselves but it would be great if Scylla Operator would do this, perhaps next to the node tuning / as a part of the node tuning.

If you would be interested in this, I could provide a PR with this feature.

gdubicki commented 1 month ago

/remove-lifecycle rotten
/remove-lifecycle stale

ylebi commented 4 weeks ago

> If you would be interested in this, I could provide a PR with this feature.

Hi, you are welcome to open a PR; please make sure you follow the Contributing Guide.

Thanks for contributing.

tnozicka commented 4 weeks ago

From what I recall, the kubelet config in GKE is supposed to be changed by editing a node pool, which is out of reach for our automation, so IMO this is more of a docs mention when it gets there.

> The workaround is to deploy your own DaemonSet that will update the kubelet settings on Scylla nodes.

This likely gets you into unsupported territory and a bunch of initialization races. It may be better to wait for GKE to allow adjusting it before recommending that to others.

We have also migrated our DaemonSets into a NodeConfig and we don't add new DaemonSets anymore, in favour of the API.

This also isn't a ScyllaDB issue. I know we try to tune some things where we can't avoid it, but each of them comes with a burden, and we have to balance the benefits, stability, cross-platform support, and how hacky it is.

gdubicki commented 4 weeks ago

I think you are right, @tnozicka. It's a bad idea to provide an unsupported solution for GKE in the Scylla Operator repo.

I would at least like the next person affected by this to learn about it the easy way, though.

Would you accept a PR to https://operator.docs.scylladb.com/stable/gke.html to document this potential issue?

tnozicka commented 4 weeks ago

I think a docs mention is fitting, and referencing the GKE issue is also helpful, so I'd welcome a :::{note} in our docs somewhere around where we set the kubelet config in GKE. Thanks @gdubicki
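Such a note might look like the following (the wording is only a suggestion, assuming the MyST admonition syntax used by the operator docs):

```
:::{note}
By default, GKE reserves 10% of local SSD space via the kubelet's
`nodefs.available` hard eviction threshold, and this value cannot
currently be changed through the GKE API. See the upstream feature
request: https://issuetracker.google.com/issues/185760232
:::
```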