openshiftio / openshift.io

Red Hat OpenShift.io is an end-to-end development environment for planning, building and deploying modern applications.
https://openshift.io
Error in OpenShift Online deployment not reported to user in OpenShift.io if quota exceeded #2629

Closed ldimaggi closed 5 years ago

ldimaggi commented 6 years ago

Steps to recreate:

screenshot 20

screenshot 19

Can we add some notification to the user at the point in time when the attempt to deploy to stage or run fails? The error is available in the events for the endpoint's pod in OSO:

screenshot 21 1

joshuawilson commented 6 years ago

This is due to the first 2 apps using up all the quota and not being shut down. We might need to tell users that their quota is full and that they need to turn off the apps they are not using.

This gets to be very bad if they "remove" the app from OSIO but the children are not removed from OSO. They continue to run but now the user can't see them in OSIO.

joshuawilson commented 6 years ago

Workaround is to reset env on profile page.

joshuawilson commented 6 years ago

@catrobson once this is confirmed by SD team, we will need some direction on what to tell the user and how.

catrobson commented 6 years ago

@joshuawilson we have a design to warn the user and not allow them to add more deployments (applications) here: https://redhat.invisionapp.com/share/PMDCE3G94

This was not carried into the launch wizard design since the functionality wasn't prioritized for implementation, but we could easily do the same thing in the new visual design.

catrobson commented 6 years ago

I think this also relates to deleting a space, since we leave stuff in OSO but the user might not realize that and then question why they are hitting limits.

maxandersen commented 6 years ago

Is there a way in the launcher flow to check for available resources and warn the user if it's getting "crowded"?

catrobson commented 6 years ago

@maxandersen We had a solution (linked above), but we removed it from the design process for launcher since there was no backend support for this capability. If we can support it, we can add that element back into the design for launcher. Please let me know and I'll get designs updated accordingly.

joshuawilson commented 6 years ago

I think we need to prioritize this. If it weren't for the Env Reset this might be a SEV1.

catrobson commented 6 years ago

After speaking with @joshuawilson we decided this was bigger than just the launcher - this is about informing users throughout the system about quota limitations. I have created the following UXD story to capture the work needed around this: https://github.com/fabric8-ui/fabric8-ux/issues/932

joshuawilson commented 6 years ago

The Deployment API could or should provide quota information to inform any component that needs to know if the quota has been maxed out.

jiekang commented 6 years ago

The Deployments API can be queried for quotas:

- per environment (run, stage)
- per deployment (instances of an application in stage/run)
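As a rough illustration of how a consumer might use per-environment quota data, the sketch below computes the remaining headroom for one environment. The response shape is taken from the sample `run` environment response posted further down in this issue, and only the fields visible there are modeled. `Headroom` and `computeHeadroom` are hypothetical names, not part of the API.

```typescript
// Shape of the environment response as it appears in the sample in this issue.
interface EnvironmentResponse {
  data: {
    id: string;
    type: string;
    attributes: {
      name: string;
      quota: {
        cpucores: { quota: number; used: number };
        memory: { quota: number; used: number; units: string };
      };
    };
  };
}

// Hypothetical helper type: what is left before the quota is maxed out.
interface Headroom {
  cpuCoresFree: number;
  memoryFree: number; // in the units reported by the API
  memoryUnits: string;
  memoryUsedPercent: number;
}

function computeHeadroom(env: EnvironmentResponse): Headroom {
  const { cpucores, memory } = env.data.attributes.quota;
  return {
    cpuCoresFree: cpucores.quota - cpucores.used,
    memoryFree: memory.quota - memory.used,
    memoryUnits: memory.units,
    memoryUsedPercent: (memory.used / memory.quota) * 100,
  };
}

// The sample response body from this issue, parsed as-is.
const sample: EnvironmentResponse = JSON.parse(
  '{"data":{"attributes":{"name":"run","quota":{"cpucores":{"quota":2,"used":0.488},' +
    '"memory":{"quota":1073741824,"units":"bytes","used":262144000}}},' +
    '"id":"run","type":"environment"}}'
);

const headroom = computeHeadroom(sample);
console.log(`${headroom.memoryFree} ${headroom.memoryUnits} of memory still free`);
```

A component that only needs to know whether the quota has been maxed out could compare `memoryFree` and `cpuCoresFree` against zero.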

andrewazores commented 6 years ago

^ for ease of reference:

https://github.com/fabric8-ui/fabric8-ui/blob/af70017347c0da12f185a9920bf047e90e57e32b/src/app/space/create/deployments/services/deployments.service.ts#L384

This links to the function that returns the used vs. available (quota) memory for an environment. The other relevant functions are adjacent, within that same class.

Worth mentioning that this service was designed with a poll cycle and a push model for subscribers, so Observables returned by those functions do not necessarily issue an immediate HTTP request. If this is going to be reused for other parts of the UI then we may need to make some modifications or expose another function to get access to the same data with an on-demand pulling sort of model, depending on what kind of new UX is going to consume this outside of the Deployments page.

catrobson commented 6 years ago

From March 19 platform meeting, we agreed that:

ebaron commented 6 years ago

If you want to use the backend API directly to get environment quota/usage, here's an example:

$ curl -H "Authorization: Bearer $MY_TOKEN" \
https://api.openshift.io/api/deployments/environments/run
{"data":{"attributes":{"name":"run","quota":{"cpucores":{"quota":2,"used":0.488},
"memory":{"quota":1073741824,"units":"bytes","used":262144000}}},
"id":"run","type":"environment"}}

ldimaggi commented 6 years ago

I missed the meeting due to a conflict. A question on this point: '...will add Delete API which actually deletes all components of a space in OSIO...'

Will this delete API remove the corresponding resources in OSO (build configs, deployment configs, etc)? Thx!

catrobson commented 6 years ago

@ldimaggi

> Will this delete API remove the corresponding resources in OSO (build configs, deployment configs, etc)?

Yes, this is the intent.

ldimaggi commented 6 years ago

Perfect! We will want to make use of that in automated tests.

ldimaggi commented 6 years ago

I wanted to add a couple of additional data points to this issue as the patterns in which quota-related errors are seen are consistent:

builderror

The error is also reported in the Jenkins build log and in the Jenkins pod:

Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at:
 https://kubernetes.default/api/v1/namespaces/ldimaggi-stage/services. Message:
 Forbidden!Configured service account doesn't have access. Service account may have been revoked.
 services "march20f" is forbidden: exceeded quota: object-counts, requested: services=1, used:
 services=5, limited: services=5.
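Since the message has a regular shape, a client could in principle extract the quota details from it. Below is a minimal TypeScript sketch, assuming the message always follows the pattern seen in the log above. The pattern is inferred from this one sample only, and `parseQuotaExceeded`, `parsePairs`, and `QuotaExceeded` are hypothetical names.

```typescript
// Hypothetical result type for a parsed "exceeded quota" message.
interface QuotaExceeded {
  quotaName: string;
  requested: Record<string, string>;
  used: Record<string, string>;
  limited: Record<string, string>;
}

// Turns "services=1" (or a comma-separated list of such pairs) into an object.
function parsePairs(text: string): Record<string, string> {
  const result: Record<string, string> = {};
  for (const pair of text.split(",")) {
    const [key, value] = pair.trim().split("=");
    if (key && value !== undefined) {
      result[key] = value;
    }
  }
  return result;
}

// Returns null when the text does not contain an "exceeded quota" message.
function parseQuotaExceeded(message: string): QuotaExceeded | null {
  // The log wraps the message across lines, so collapse whitespace first.
  const normalized = message.replace(/\s+/g, " ").trim();
  const pattern =
    /exceeded quota: ([^,]+), requested: (.+?), used: (.+?), limited: (.+?)\.?$/;
  const match = pattern.exec(normalized);
  if (match === null) {
    return null;
  }
  const [, quotaName = "", requested = "", used = "", limited = ""] = match;
  return {
    quotaName: quotaName.trim(),
    requested: parsePairs(requested),
    used: parsePairs(used),
    limited: parsePairs(limited),
  };
}

// The relevant part of the Jenkins log message, with its original line breaks.
const sampleLog = [
  "Forbidden!Configured service account doesn't have access. Service account may have been revoked.",
  ' services "march20f" is forbidden: exceeded quota: object-counts, requested: services=1, used:',
  " services=5, limited: services=5.",
].join("\n");

const quotaError = parseQuotaExceeded(sampleLog);
if (quotaError !== null) {
  console.log(`Quota "${quotaError.quotaName}" exceeded`, quotaError.limited);
}
```

Parsing free-text error messages is fragile, so this would only be a stopgap until the backend exposes these object-count quotas directly.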

mceledonia commented 6 years ago

@ldimaggi @maxandersen @joshuawilson @aslakknutsen @dgutride @jiekang

I'm looking for some thoughts/guidance on my requirements doc for the resource quota limits story (link). I have the beginning of a design plan mapped out and want to make sure I'm not missing anything. Please check it out when you get a chance and let me know how it looks!

https://docs.google.com/document/d/1zbLVYu5D2_v8jr46_NHW47ILM7e1E9GTulLYlcW2f5c/edit?usp=sharing

joshuawilson commented 6 years ago

What is the status of this?

catrobson commented 6 years ago

@joshuawilson I think we need feedback from engineering on the link Michael shared above to make sure our design covers the situations needed. If so, this could move into UI development.

andrewazores commented 6 years ago

My comment above regarding the DeploymentsService functions and its design intent to support only the Deployments area is now out of date. We now export a DeploymentApiService at the application root module level, which looks like this: https://github.com/fabric8-ui/fabric8-ui/blob/master/src/app/space/create/deployments/services/deployment-api.service.ts . This service is intended to be consumed by non-Deployments components and makes no assumptions about how the consumer wants the data formatted or how the consumer wants the data pushed.

We have a separate user story (https://github.com/openshiftio/openshift.io/issues/2380) that we were using to track the upcoming work related to this issue, specifically for the Deployments page context. The same open problem applies here: Websocket support for pushed notifications from the backend when certain events occur, such as a deployment failing due to insufficient resources, would be required for some of the cases shown in Michael's design.

From the looks of it, the resource usage slide-out panel should be able to move forward using only the DeploymentApiService. FWIW the new "Settings > Resources" page has been implemented using the DeploymentApiService, and the Deployments area is also built on top of it. However, not all of the resources listed are available as metrics from the backend, which currently supports only CPU and Memory quotas, not services, persistent volume claims, replication controllers, etc. The app creation flow can probably also be implemented using only DeploymentApiService, since the current usage can be queried and the expected usage after a single pod deployment can be inferred from there.
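To make the last point concrete, here is a minimal sketch of projecting usage after one more pod. It does not use the real DeploymentApiService. The types and `projectAfterOnePod` are hypothetical, and the per-pod footprint is an assumed value that a real implementation would have to obtain or estimate somehow.

```typescript
// Quota and current usage for one resource in an environment.
interface ResourceQuota {
  quota: number;
  used: number;
}

interface EnvironmentQuota {
  cpucores: ResourceQuota;
  memory: ResourceQuota; // bytes
}

// Hypothetical: the resources one new pod is expected to consume.
interface PodEstimate {
  cpucores: number;
  memory: number; // bytes
}

interface Projection {
  cpucores: number;
  memory: number;
  fits: boolean;
}

// Adds the assumed footprint of one pod to current usage and checks it against quota.
function projectAfterOnePod(env: EnvironmentQuota, pod: PodEstimate): Projection {
  const cpucores = env.cpucores.used + pod.cpucores;
  const memory = env.memory.used + pod.memory;
  return {
    cpucores,
    memory,
    fits: cpucores <= env.cpucores.quota && memory <= env.memory.quota,
  };
}

// Current usage taken from the sample "run" environment response earlier in this issue.
const runEnv: EnvironmentQuota = {
  cpucores: { quota: 2, used: 0.488 },
  memory: { quota: 1073741824, used: 262144000 },
};

// Assumed footprint of a single pod: 0.5 cores and 256 MiB.
const onePod: PodEstimate = { cpucores: 0.5, memory: 256 * 1024 * 1024 };

const projection = projectAfterOnePod(runEnv, onePod);
console.log(projection.fits ? "fits within quota" : "would exceed quota");
```

The app creation flow could run a check like this before deploying and warn the user when `fits` is false.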

andrewazores commented 6 years ago

We had a design review meeting with @mceledonia earlier today to go over his proposed design document. Not all of what is in that document can be implemented quite yet, in particular the ability to receive background event notifications from OSO about deployment failures, so we agreed to start small and incrementally iterate from there. The starting point will be a check for available quota on actions which increase usage (Launcher, Deployments scale up), and if there is no available quota in the environment, block the user action and display some message that the action cannot be performed due to lack of resources. This will prevent the user from entering a "failure due to exceeded quota" state to begin with.
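A minimal sketch of that starting point follows, under the assumption that "no available quota" means used is at or above quota for CPU or memory. `guardQuotaIncreasingAction`, the types, and the message wording are all hypothetical and not taken from the design.

```typescript
interface ResourceQuota {
  quota: number;
  used: number;
}

interface EnvironmentQuota {
  name: string;
  cpucores: ResourceQuota;
  memory: ResourceQuota;
}

// Hypothetical result handed to the UI: either proceed, or block and show the message.
interface GuardResult {
  allowed: boolean;
  message?: string;
}

// Blocks an action that would increase usage when a resource is already exhausted.
function guardQuotaIncreasingAction(env: EnvironmentQuota): GuardResult {
  const exhausted: string[] = [];
  if (env.cpucores.used >= env.cpucores.quota) {
    exhausted.push("CPU");
  }
  if (env.memory.used >= env.memory.quota) {
    exhausted.push("memory");
  }
  if (exhausted.length === 0) {
    return { allowed: true };
  }
  return {
    allowed: false,
    message:
      `This action cannot be performed: the "${env.name}" environment ` +
      `has no available ${exhausted.join(" or ")} quota.`,
  };
}

// Example: a stage environment whose memory quota is fully used.
const fullStage: EnvironmentQuota = {
  name: "stage",
  cpucores: { quota: 2, used: 0.5 },
  memory: { quota: 1073741824, used: 1073741824 },
};

const guard = guardQuotaIncreasingAction(fullStage);
if (guard.allowed === false) {
  console.log(guard.message);
}
```

Both the Launcher and the Deployments scale-up control could call the same guard before sending the request, so the user never reaches the "failure due to exceeded quota" state.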

rohanKanojia commented 6 years ago

@ldimaggi @joshuawilson : Is there anything required from build-team in this task? If not, could you please remove build-team label from this.

joshuawilson commented 6 years ago

I'm hopeful that between the Launch and Platform teams we can fix this. cc @animuk

andrewazores commented 6 years ago

@joshua @mceledonia has the designs for the new starting points we discussed for this issue:

https://redhat.invisionapp.com/share/CTGJ6F7V4K9

Can we split this out into a story with two tasks (Launcher and Deployments), and mark that new story as P0 and target it for before Summit? Then the remainder of the work in the original design, with the slide-out resource usage panel and notifications system, can become future work items for after Summit with reduced severity. Those future work items will not be implementable in the next week because they require a fair amount of UI work as well as significant backend support that does not yet exist.

joshuawilson commented 6 years ago

@andrewazores yes, do it. Please.

andrewazores commented 6 years ago

I have created a new user story at #3344 to track the work targeted for before Summit and marked that as P0, removing the P0 on this issue.

alexeykazakov commented 5 years ago

Not a bug. Rather missing functionality.