thoth-station / thoth-application

Thoth-Station ArgoCD Applications

GNU General Public License v3.0

User API autoscaler #111

Open fridex opened 4 years ago

fridex commented 4 years ago

Is your feature request related to a problem? Please describe.

As originally reported in https://github.com/thoth-station/user-api/issues/739, we could configure an autoscaler for the user API.

Describe the solution you'd like

Configure an autoscaler so that the user API is automatically scaled up when too many requests arrive at the same time. As a start, we can scale on CPU utilization and observe how the API behaves. Later, we can use other metrics, such as the number of connections, to scale the replicas up or down.
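A CPU-based starting point could look roughly like the sketch below. It builds a HorizontalPodAutoscaler manifest as a Python dict and prints it as JSON, which `kubectl apply -f -` or `oc apply -f -` can read. It assumes the user API runs as a Deployment named `user-api` and that the cluster serves the `autoscaling/v2` API; the replica bounds and the 80% target are placeholder values, not settings taken from this repository. On an older cluster, or with a DeploymentConfig, the `apiVersion` and `scaleTargetRef` would differ.

```python
import json


def cpu_hpa_manifest(name="user-api", min_replicas=2, max_replicas=6, cpu_percent=80):
    """Build an autoscaling/v2 HPA manifest that targets average CPU utilization.

    All default values are illustrative assumptions, not project settings.
    """
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            # Assumed target: a Deployment with the same name as the HPA.
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": name,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [
                {
                    "type": "Resource",
                    "resource": {
                        "name": "cpu",
                        "target": {
                            "type": "Utilization",
                            "averageUtilization": cpu_percent,
                        },
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(cpu_hpa_manifest(), indent=2))
```

Utilization targets are computed relative to the CPU requests of the pods, so the user API containers need CPU requests set for this to work.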

Describe alternatives you've considered

Run a fixed number of user API replicas at all times, but that does not scale automatically with load.

Additional context

https://docs.openshift.com/container-platform/3.9/dev_guide/pod_autoscaling.html

https://github.com/thoth-station/user-api/issues/739

sesheta commented 3 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

/close

sesheta commented 3 years ago

@sesheta: Closing this issue.

In response to [this](https://github.com/thoth-station/thoth-application/issues/111#issuecomment-880705042):

> Rotten issues close after 30d of inactivity.
> Reopen the issue with `/reopen`.
> Mark the issue as fresh with `/remove-lifecycle rotten`.
>
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

sesheta commented 3 years ago

@fridex: This issue is currently awaiting triage. One of the @thoth-station/devsops will take care of the issue, and will accept the issue by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

sesheta commented 3 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

fridex commented 3 years ago

/triage accepted
/remove-lifecycle rotten

sesheta commented 3 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

/close

sesheta commented 3 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

sesheta commented 3 years ago

@sesheta: Closing this issue.

In response to [this](https://github.com/thoth-station/thoth-application/issues/111#issuecomment-926674390):

> Rotten issues close after 30d of inactivity.
> Reopen the issue with `/reopen`.
> Mark the issue as fresh with `/remove-lifecycle rotten`.
>
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

harshad16 commented 3 years ago

/lifecycle frozen

VannTen commented 2 years ago

In my experience, scaling web-server-like workloads on CPU usage is not very reliable. Scaling on request rate is probably a better idea, and that metric is probably already exposed by whichever Python web server implementation is in use.
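To make the request-rate idea concrete, here is a small sketch of the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler, `desired = ceil(current * currentValue / targetValue)`, applied to a per-pod request rate. The 10% tolerance matches the documented default; the target of 50 requests per second per pod and the sample rates are made-up numbers, and the clamping to min/max replicas that a real autoscaler applies is left out.

```python
import math


def desired_replicas(current_replicas, current_value, target_value, tolerance=0.1):
    """Replica count following the documented HPA scaling rule.

    current_value and target_value are per-pod averages of the same metric,
    for example requests per second per pod.
    """
    ratio = current_value / target_value
    # Within the tolerance band the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    # Hypothetical setup: target of 50 requests/s per pod, 3 replicas running.
    for rate in (20, 52, 120):
        print(rate, desired_replicas(3, rate, 50))
```

Because the request rate per pod drops roughly in proportion as replicas are added, this kind of metric tends to settle near the target, whereas CPU usage of a mostly I/O-bound API may barely move under load.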

harshad16 commented 2 years ago

Acceptance criteria

[ ] Check whether a custom metrics adapter is available on the cluster.
[ ] Use the HTTP metrics exposed by user-api and plug them into the horizontal autoscaler.
[ ] Create a panel in the Grafana dashboard for user-api instances.
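
The first item could be checked along the lines of the sketch below. It looks through the output of `kubectl get apiservices -o json` (or the `oc` equivalent) for a registered and available `custom.metrics.k8s.io` API. It assumes the adapter registers under that standard API group; the function names and the sample data are hypothetical and only serve as illustration.

```python
import json
import subprocess


def custom_metrics_available(apiservices):
    """Return True if a custom.metrics.k8s.io APIService reports Available=True.

    `apiservices` is the parsed JSON of `kubectl get apiservices -o json`.
    """
    for item in apiservices.get("items", []):
        if item.get("spec", {}).get("group") != "custom.metrics.k8s.io":
            continue
        for cond in item.get("status", {}).get("conditions", []):
            if cond.get("type") == "Available" and cond.get("status") == "True":
                return True
    return False


def fetch_apiservices():
    """Query the cluster; needs kubectl and a kubeconfig for the target cluster."""
    result = subprocess.run(
        ["kubectl", "get", "apiservices", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Fabricated sample so the sketch runs without cluster access;
    # replace it with fetch_apiservices() against a real cluster.
    sample = {
        "items": [
            {
                "spec": {"group": "custom.metrics.k8s.io"},
                "status": {"conditions": [{"type": "Available", "status": "True"}]},
            }
        ]
    }
    print(custom_metrics_available(sample))
```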

Resource

story points: 3pt