kedacore / keda

KEDA is a Kubernetes-based Event Driven Autoscaling component. It provides event driven scale for any container running in Kubernetes
https://keda.sh
Apache License 2.0

Automated storage scaling #5232

Open spron-in opened 11 months ago

spron-in commented 11 months ago

Proposal

Hello all. We at the Data on Kubernetes community are discussing automated storage scaling. We thought KEDA might be a good tool to unify metrics processing and execute resource scaling. For example: detect that storage has reached its capacity threshold, then go into the Custom Resource and change the size of the storage. This is similar to what KEDA does with replicas, but focused on vertical storage scaling.

We maintain a public gdoc here.

Use-Case

Databases are gaining more and more traction on Kubernetes, but there is no unified solution for storage scaling. We would love to have one that would also benefit the open source community.

Is this a feature you are interested in implementing yourself?

Yes

Anything else?

We are ready to work with KEDA maintainers and submit pull requests to implement this functionality.

tomkerkhove commented 11 months ago

I like this idea, and so may @jeffhollan, but we do not support vertical scaling today (VPA request: https://github.com/kedacore/keda/issues/1788)

However, if you have a custom CRD that implements the /scale subresource, we can help you scale it out horizontally.
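
For reference, a minimal sketch of what that horizontal path could look like: a ScaledObject whose `scaleTargetRef` points at a custom resource exposing /scale. The `databases.example.com/v1` `DatabaseCluster` kind, the Prometheus address, the metric name, and the threshold are hypothetical placeholders, not something from this thread.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: db-cluster-scaler
spec:
  scaleTargetRef:
    # Hypothetical custom resource; it must implement the /scale subresource.
    apiVersion: databases.example.com/v1
    kind: DatabaseCluster
    name: my-db
  minReplicaCount: 1
  maxReplicaCount: 5
  triggers:
    - type: prometheus
      metadata:
        # Assumed in-cluster Prometheus address.
        serverAddress: http://prometheus.monitoring.svc:9090
        # Hypothetical metric; any query returning a single value works.
        query: 'sum(rate(db_queries_total{cluster="my-db"}[2m]))'
        threshold: "100"
```

This only changes the replica count of the target; it does not touch storage size.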

spron-in commented 11 months ago

Thank you @tomkerkhove. Yeah, we noticed that VPA is not supported, but we are not looking for VPA support either. The problem is that horizontal scaling can't help with storage (unless, of course, the underlying data layer supports sharding or some other mechanism for scaling storage out).

So I have a spec in my CR where I state the size of the PVCs and storage class:

    volumeSpec:
      persistentVolumeClaim:
        storageClassName: standard
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 3Gi

Ideally, I would want KEDA to monitor storage usage and, if it reaches a certain threshold, simply change the requested storage from 3Gi to 10Gi:

        resources:
          requests:
            storage: 10Gi

Operator does the rest.
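
As a rough illustration of the "reaches a certain threshold" half, the condition could be expressed as a Prometheus rule. This sketch assumes the kubelet volume stats metrics are being scraped; the rule name, the 80% threshold, and the 5 minute window are arbitrary choices for illustration.

```yaml
groups:
  - name: storage-capacity
    rules:
      - alert: PVCAboveResizeThreshold
        # Fraction of each PVC's capacity in use, as reported by the kubelet.
        expr: |
          kubelet_volume_stats_used_bytes
            / kubelet_volume_stats_capacity_bytes > 0.80
        # Require the condition to hold for a while to avoid reacting to spikes.
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: 'PVC {{ $labels.persistentvolumeclaim }} is over 80% full'
```

The rule only detects the condition; something still has to edit the `storage` field in the CR, which is the part being asked of KEDA here.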

tomkerkhove commented 11 months ago

But that is basically what VPA does, in a sense: scaling vertically, in this case by increasing storage.

I'm afraid that until VPA support is added (if ever), we cannot help, unfortunately.

@kedacore/keda-maintainers thoughts?

spron-in commented 11 months ago

Any additional thoughts here, @tomkerkhove ?

JorTurFer commented 11 months ago

I think that it's an interesting point to explore. Currently, we are scaling workloads, but it's true that some stateful workloads need to scale their storage up/down too, and I see this as a new feature in KEDA. (Once KEDA is integrated with VPA, we will help to scale databases up/down, but their tmp tables, for example, won't scale proportionally.)

The main problem that I see here is that (AFAIK) k8s doesn't expose an API that we can consume for managing the scaling of the PVC. This is a huge problem because it means that KEDA needs to modify user-provided manifests to scale the storage, and that's a red line I'd not cross. Currently, we don't modify user manifests at all; we use the scale API to delegate the action to the cluster (that's why KEDA only works with resources that implement /scale).

In this case, we'd need to modify the manifests provided by users, which is risky and can potentially cause data loss if we do something wrong. I also have some doubts about whether it will work for scaling down and how to deal with it. Can we shrink a PVC that is using more space than the desired size? Does that request fail, or does it break data consistency?

Is there any API that allows storage resizing or a safe way to do it on demand?

spron-in commented 11 months ago

@JorTurFer thanks!

There is no such API as of now. PVCs themselves support volume expansion, but it is all about changing the storage size: https://kubernetes.io/docs/concepts/storage/persistent-volumes/#expanding-persistent-volumes-claims

As for scaling down - it is not possible right now due to various limitations, from block storage support to filesystem logic. PVC validation webhook will reject the requests that try to reduce the volume size.
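
To make the expansion path concrete, here is a minimal sketch, assuming a CSI driver that supports expansion. The provisioner and claim names are hypothetical; `allowVolumeExpansion` is the StorageClass field that has to be enabled before a PVC's request can be raised.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
# Hypothetical CSI driver name; the real driver must support volume expansion.
provisioner: csi.example.com
allowVolumeExpansion: true
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  # Hypothetical claim name.
  name: data-my-db-0
spec:
  storageClassName: standard
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      # Raised from 3Gi. Setting a value below the current size is rejected.
      storage: 10Gi
```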

For the DoK community, the main question right now is whether we should create our own tool to perform such resizing or contribute to existing projects, like KEDA or pvc-autoresizer.

We can try to implement something on top of KEDA and potentially have it merged later on (if community sees the need). Please let me know your thoughts.

zroubalik commented 11 months ago

@spron-in thanks for the proposal. It is an interesting one and definitely something that we should try to integrate into KEDA if we find the right way to do it. There are some limitations and concerns, as mentioned by Jorge and Tom above, but we should try to aim high, right? :) Maybe a new CustomResource (next to ScaledObject and ScaledJob) to deal with this kind of workload?

JorTurFer commented 11 months ago

Yeah, we can explore the option of adding an extra CRD like ScaledStorage to manage this. I think that we can reuse most of the already existing infrastructure, just adding a new controller for the new CRD and a few changes. Maybe adding a constraint to never reduce the size could be a good starting point, WDYT @kedacore/keda-core-contributors ?
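
Purely as a discussion aid, a ScaledStorage resource might look something like the sketch below. No such CRD exists in KEDA: the API group, every field name, and the target kind are invented here for illustration, and only the `triggers` block is modeled on how ScaledObject triggers look today.

```yaml
# HYPOTHETICAL: this kind and all of its fields are made up for discussion.
apiVersion: keda.example/v1alpha1
kind: ScaledStorage
metadata:
  name: my-db-storage
spec:
  storageTargetRef:
    apiVersion: databases.example.com/v1   # hypothetical custom resource
    kind: DatabaseCluster
    name: my-db
    # Hypothetical: path of the size field inside the target resource.
    sizePath: .spec.volumeSpec.persistentVolumeClaim.resources.requests.storage
  increment: 7Gi        # hypothetical: how much to grow per scaling step
  maxSize: 100Gi        # hypothetical: upper bound to cap cost
  allowShrink: false    # hypothetical: encodes the "never reduce the size" constraint
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090   # assumed address
        query: 'max(kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes)'
        threshold: "0.8"
```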

zroubalik commented 11 months ago

Yeah, let's start with a solid design. We can either try to do just ScaledStorage, or maybe something a little more generic that can scale workloads similar to storage (are there any?).

spron-in commented 11 months ago

Hey @zroubalik ! How can we proceed here? Shall we have a short call to discuss and brainstorm? Are there any regular calls that KEDA hosts?

tomkerkhove commented 11 months ago

We have bi-weekly standups: https://keda.sh/community/#get-involved

I think for this one it might make sense to discuss it there (if you can), then write a design document in Google Docs (which we can create under KEDA), and then iterate on that.

zroubalik commented 11 months ago

Yeah, let's proceed like this. The next standup is right before the holidays, so I'm not sure whether we should move this to next year.

spron-in commented 11 months ago

@tomkerkhove @zroubalik thank you. We will join next call :)

tomkerkhove commented 11 months ago

@zroubalik Let's keep it for next week so we can let it sink in over the holidays

maximveksler commented 10 months ago

Kindly also consider the effect of the ability to scale storage classes and IOPS.

tomkerkhove commented 10 months ago

If I recall correctly, @spron-in was planning on writing a design proposal in Google Docs; is my memory correct? If so, any update on this?

stale[bot] commented 8 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

zroubalik commented 8 months ago

@spron-in do we have any updates on this?