Intermittent issue - application is not available on run environment

ljelinkova commented 6 years ago

The E2E tests fail intermittently because the application is not available in the run environment.

http://artifacts.ci.centos.org/devtools/e2e/devtools-test-e2e-openshift.io-smoketest-us-east-2a-released/322/07-02-run.html

The E2E workflow is:

1. Create a space
2. Create Vert.x with REST API and the Rollout to Run strategy
3. Wait until the pipeline is finished and promote to Run
4. Check the stage application
5. Check the run application - this step fails

This is the Jenkins log http://artifacts.ci.centos.org/devtools/e2e/devtools-test-e2e-openshift.io-smoketest-us-east-2a-released/322/05-01-jenkins-log.html

This is the output of a script that lists Jenkins pods. We can add any oc command you might need to debug this issue:

http://artifacts.ci.centos.org/devtools/e2e/devtools-test-e2e-openshift.io-smoketest-us-east-2a-released/322/oc-logs-output.txt

ldimaggi commented 6 years ago

Is there any possibility that this is a timing issue? I have seen random instances of the deployment to run requiring a long time to complete.

Or, might this be a resources issue? Do you see anything in the logs related to a quota being reached? Maybe the env reset is not removing any existing deployments to run?

hrishin commented 6 years ago

@ljelinkova did you check the resource quota? Sometimes the quota for the target namespace fills up. OpenShift accepts the new DeploymentConfig (DC) request successfully but is not able to deploy it if the quota has been reached.
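
To rule the quota in or out, one option is to compare `status.used` with `status.hard` on the namespace's ResourceQuota object (for example, one item from `oc get resourcequota -o json`). A minimal sketch, assuming that JSON shape; the helper names and the sample data are hypothetical, and only a subset of quantity suffixes is handled:

```python
from decimal import Decimal

# Subset of Kubernetes quantity suffixes; other suffixes fail to parse.
_SUFFIXES = {
    "Ki": Decimal(2) ** 10, "Mi": Decimal(2) ** 20, "Gi": Decimal(2) ** 30,
    "m": Decimal("0.001"), "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6, "G": Decimal(10) ** 9,
}


def parse_quantity(text):
    """Convert a quantity string such as '500m', '2Gi' or '10' to a Decimal."""
    # Try two-letter suffixes first so that 'Mi' is not mistaken for 'M'.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(text)


def exhausted_resources(quota):
    """Return the sorted names of resources whose usage has reached the hard limit."""
    status = quota.get("status", {})
    hard = status.get("hard", {})
    used = status.get("used", {})
    return sorted(
        name for name, limit in hard.items()
        if name in used and parse_quantity(used[name]) >= parse_quantity(limit)
    )


# Hypothetical sample data, not taken from the failing run.
sample_quota = {
    "status": {
        "hard": {"pods": "10", "limits.cpu": "2", "limits.memory": "1Gi"},
        "used": {"pods": "4", "limits.cpu": "2", "limits.memory": "512Mi"},
    }
}
print(exhausted_resources(sample_quota))  # prints ['limits.cpu']
```

An empty result would suggest that the quota is not what blocks the deployment.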

ljelinkova commented 6 years ago

@hrishin The resource quota should not be the problem since we reset the whole environment after each test.

@ldimaggi It might be a timing issue; maybe the app would start if the tests waited for some time.

But the main question is - how does a random user know that the deployment failed? Or that the deployment is finished? Shouldn't this be part of the "Pipeline"? I simply assumed that once the Pipeline is finished, all is set and ready and I can start using the application.

ppitonak commented 6 years ago

I agree that it should be part of the pipeline, i.e. "Rollout to Run" step should be marked as successful only when the app was deployed successfully. WDYT @openshiftio/uxd-team @catrobson

catrobson commented 6 years ago

@ppitonak Agree we would only mark that step successful when the app was deployed successfully.

kwk commented 6 years ago

I consider this to be a P1 issue because if we ignore this failure we cannot push to prod.

ppitonak commented 6 years ago

We implemented a workaround in e2e tests (https://github.com/fabric8io/fabric8-test/pull/949).

I had a chat with @aslakknutsen @bartoszmajsak @ljelinkova @jiekang @joshuawilson ... The result of the discussion is that we cannot guarantee that the application is deployed and working at any point in time after the pipeline has finished. The dev team is against adding the readiness probe to the pipeline.

@fabric8-ui/uxd I still think that we should signal to the users either

- that their application was working at some point after the pipeline finished (e.g. another step in the pipeline), or
- that their application is not working at the moment (e.g. a status icon next to the link to the run env or a link to the deployments page)

alexeykazakov commented 6 years ago

While I see the point of separating pipelines and service/pod readiness, I also believe there is a UX issue. We have received many reports of users being confused when they saw finished pipelines with an unavailable app. For many users it looks like a bug.

bartoszmajsak commented 6 years ago

> The dev team is against adding the readiness probe to the pipeline.

@ppitonak I might have missed that part of the long discussion - can you shed some more light on why the dev team is against that?

ljelinkova commented 6 years ago

There are other scenarios where the pipeline is confusing. As @rhopp suggested, imagine this scenario.

1. You have successfully deployed the application in version 1.0.1 to both stage and run
2. You trigger a new build, which finishes the Rollout to Stage step for version 1.0.2
3. You are asked for promotion to the Run environment
4. You click the link to stage on the Pipelines screen to see the application on stage

And now the question: What version is on the stage? Is it possible that you're still looking into version 1.0.1? Or is there already 1.0.2?

ppitonak commented 6 years ago

@bartoszmajsak you are right that nobody explicitly said that they would be against it, but nobody supported my suggestion. @aslakknutsen argued that we cannot guarantee that the application works at any point in time... While I agree with that, I think that adding a readiness probe to the pipeline itself would reduce users' confusion.

We would still need to solve the problems described by Lucia and Aslak.

jiekang commented 6 years ago

To restate my perspective from the discussion on Mattermost on Aug 13 2018:

"I do agree that it would be worthwhile making sure we can display probe information if available. I think if the application has a probe, the OSO console sees that and is more clear. Our OSIO pages don't do anything with probes as far as I'm aware."

bartoszmajsak commented 6 years ago

> And now the question: What version is on the stage? Is it possible that you're still looking into version 1.0.1? Or is there already 1.0.2?

@ljelinkova you can see that in the OpenShift deployment object. There is a version label which can tell you that. Is that user-facing information? No. Can you test it to see if your assumption is valid? Yes.

Of course, the application itself could also expose this information, but that's up to the application to do or not.
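
For a test, that check could look roughly like the following. This is a minimal sketch, assuming the deployment object has been fetched as JSON (for example with `oc get dc <name> -o json`) and that the version is stored under a label named `version`. The label key, the function name and the sample object are assumptions for illustration:

```python
def deployed_version(deployment, label="version"):
    """Return the version label of a deployment object, or None if it is absent."""
    labels = deployment.get("metadata", {}).get("labels") or {}
    if label in labels:
        return labels[label]
    # Fall back to the labels on the pod template.
    template = deployment.get("spec", {}).get("template", {})
    return (template.get("metadata", {}).get("labels") or {}).get(label)


# Hypothetical object, trimmed to the fields used here.
sample_dc = {
    "kind": "DeploymentConfig",
    "metadata": {
        "name": "vertx-app",
        "labels": {"app": "vertx-app", "version": "1.0.2"},
    },
}
print(deployed_version(sample_dc))  # prints 1.0.2
```

A test could compare this value with the version the build just produced before opening the stage link.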

ljelinkova commented 6 years ago

One of my colleagues from a different team tried OSIO and was also confused by the fact that the application was not available after the pipeline finished.

This also seems like a usability issue, so I am adding the UX team label too.

ljelinkova commented 6 years ago

@serenamarie125 Could somebody from the UX team have a look at this?

The issue here is that some users expect the application to be deployed and ready when the pipeline is finished, and that is not true. The end of the pipeline means that the deployment was triggered, but the application might not be available for several minutes. However, the link to the deployed application is clickable and the user gets the "Application is not available" page. While this behavior is technically correct, it might be quite confusing.

hrishin commented 6 years ago

The new OSIO-pipeline library has a verify deployment check, which fails the job if the deployment is not up and running.

@sthaha @rupalibehera
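
For illustration only, a check of this kind might compare desired and available replicas in the deployment status. This is not the osio-pipeline implementation; it is a sketch that assumes a DeploymentConfig-style JSON object with `spec.replicas` and `status.availableReplicas`, and the function and exception names are hypothetical:

```python
class DeploymentNotReady(Exception):
    """Raised when a deployment has fewer available replicas than desired."""


def verify_deployment(deployment):
    """Return True if enough replicas are available, otherwise raise."""
    desired = deployment.get("spec", {}).get("replicas", 1)
    available = deployment.get("status", {}).get("availableReplicas", 0)
    if available < desired:
        name = deployment.get("metadata", {}).get("name", "<unknown>")
        raise DeploymentNotReady(
            f"{name}: {available}/{desired} replicas available"
        )
    return True
```

Raising an exception is what would make the surrounding job fail instead of finishing green.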

sthaha commented 6 years ago

From the pipeline side of things, I think it is better to spend effort on supporting maven builds using the new osio-pipeline than on fixing it in the current f-p-l.

I am treating this as a "won't fix" since the new pipeline solves it already.

But the UI/UX side of it can still improve, i.e. wait until the application comes up (get a 2XX status when you GET the application URL).
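
The wait described above can be sketched as a simple polling loop. This is a minimal sketch using only the Python standard library; the function names, the number of attempts and the delay are assumptions, not values from OSIO:

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code for a GET on url, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # error responses (4XX/5XX) are raised as HTTPError
    except (urllib.error.URLError, OSError):
        return None


def wait_until_available(url, attempts=30, delay=10.0,
                         fetch=http_status, sleep=time.sleep):
    """Poll url until a 2XX status is returned; give up after `attempts` tries."""
    for attempt in range(attempts):
        status = fetch(url)
        if status is not None and 200 <= status < 300:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

`fetch` and `sleep` are parameters so that the loop can be exercised without a network or real waiting.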

piyush-garg commented 6 years ago

As @sthaha and @hrishin mentioned, this issue has been fixed in the new pipeline, where the Jenkins job fails if the deployment fails. In this train we are integrating the Java boosters into the new pipeline, and with that, this will be resolved. I think nothing apart from this needs to be done from the build-team side.

ppitonak commented 6 years ago

Cool, please let us know when it gets to prod-preview.

joshuawilson commented 6 years ago

Do we need a new "stage" for lack of a better term on the pipelines UI for "deployed"? Or maybe we can just use the feedback to activate the link to the deployed app.

The first is a UX change. The latter is just a code update to the new API.

Please let us know when it is available.

serenamarie125 commented 6 years ago

@muruGanesan will take an action item to hold a BlueJeans conversation with the stakeholders involved in this conversation (and record it for those of us who cannot attend).

ppitonak commented 6 years ago

@muruGanesan following up on today's conversation, the pipeline could show "in progress" and no link to the app until the application is ready.

[Screenshot: osio_pipeline_2]
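
One way to read this proposal is as a small display rule: the step stays "in progress", with no link, until the application answers. A minimal sketch of that rule; the state names, the function and the `StepView` type are hypothetical and are not taken from the OSIO UI code:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StepView:
    status: str           # "in progress" or "done"
    link: Optional[str]   # URL shown to the user, or None to hide the link


def rollout_step_view(rollout_finished: bool, app_ready: bool, app_url: str) -> StepView:
    """Show the link only once the rollout has finished and the app is ready."""
    if rollout_finished and app_ready:
        return StepView("done", app_url)
    return StepView("in progress", None)
```

With this rule, a finished rollout whose application is still starting renders the same way as a rollout that is still running.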

ldimaggi commented 6 years ago

So - the UI would show "in progress" until the app endpoint was available, at which point, the checkbox/arrow icon would be displayed, yes?

What about the scenario that Pavel mentioned where a new version of an app is deployed? When do we disable the link to the previously deployed version of the app? When the user starts the build for the new version of the app?

muruGanesan commented 6 years ago

@ppitonak, thanks for the screenshot.

ppitonak commented 6 years ago

> What about the scenario that Pavel mentioned where a new version of an app is deployed? When do we disable the link to the previously deployed version of the app? When the user starts the build for the new version of the app?

When Build 2 starts, Build 1 is hidden, so the issue doesn't exist until the link is displayed. In other words, if the first run of the pipeline is implemented correctly, there is no issue with the second run of the pipeline.

joshuawilson commented 6 years ago

One of the problems is that if it goes green and the link is still inactive and they have the page open, they will just go and refresh the page. If the pipeline is not green till it is ready then we are giving the user a clue that they should not try.

muruGanesan commented 6 years ago

@ldimaggi, @ppitonak, @joshuawilson,

Please find the first draft version and provide your feedback: https://redhat.invisionapp.com/share/DZOLKNI95YP#/325796400_Pipeline_Appln_readiness

joshuawilson commented 6 years ago

lgtm

muruGanesan commented 6 years ago

@ldimaggi, @ppitonak, @alexeykazakov, @hrishin, @bartoszmajsak, @kwk, @ljelinkova, @sthaha, @piyush-garg
Please look at the UX recommendation: https://redhat.invisionapp.com/share/DZOLKNI95YP#/325796400_Pipeline_Appln_readiness

Note: @alexeykazakov provided his feedback and I responded with details in the above 'Invision' file. Feel free to add your review comments. If everyone is fine with the UX recommendation, please give a 'thumbs-up'.

muruGanesan commented 6 years ago

<Iteration-3> @ldimaggi, @ppitonak, @alexeykazakov, @hrishin, @bartoszmajsak, @kwk, @ljelinkova, @sthaha, @piyush-garg I discussed the following use cases with @sthaha:

1) case-1: Only one application (1 URL)
2) case-2: There is no application (0 URL), e.g. bot deployment
3) case-3: Multi-clusters (> 1 URLs) - this is a future requirement

I covered case-1 and case-2 and updated the flow. Please review it and provide your feedback, if any: https://redhat.invisionapp.com/share/DZOLKNI95YP#/325796400_Pipeline_Appln_readiness

NOTE: @alexeykazakov, please verify 'use case-2' and provide your inputs.

ljelinkova commented 6 years ago

@muruGanesan I like the proposal.

muruGanesan commented 6 years ago

@ljelinkova, Thanks. If the stakeholders are fine with the UX recommendation, I request that @sthaha and @joshuawilson assign this to an appropriate team.

joshuawilson commented 6 years ago

When @sthaha confirms the backend supports it and there is a decision on the design, the UI team can pick up the changes to the pipeline page.

muruGanesan commented 6 years ago

@ljelinkova, @joshuawilson, @sthaha, Since the UXD proposal is accepted, I am removing the 'UX' label and my name from the assignee list. Please include me if anyone needs any clarification from the UX side.

CC: @serenamarie125

serenamarie125 commented 6 years ago

@muruGanesan should we also remove area/ux label?

muruGanesan commented 6 years ago

@serenamarie125, No, we don't have to remove the 'area/UX' label, because the issue touches some portion of UX. In addition, the UX team is responsible when the label is 'team/ux', which I have already removed.

joshuawilson commented 5 years ago

@sthaha when will the new OSIO-pipelines be ready (with the verify deployment check)?

piyush-garg commented 5 years ago

@joshuawilson This has already been done in the new pipeline: https://github.com/fabric8io/osio-pipeline/blob/ef2345a2edf5ff978b3cee64c94ab8887e81e087/vars/deploy.groovy#L43

ppitonak commented 5 years ago

@piyush-garg is it in prod already?

piyush-garg commented 5 years ago

@ppitonak The new pipeline library is not in prod. We are working on moving the Java booster to the new pipeline. Apart from that, there are 2-3 other things that need to be resolved to get it into production, like new pipeline support for analytics and updating the upstream boosters' application.yaml.

ppitonak commented 5 years ago

Does it make sense to deploy it to prod as an experimental feature and improve it step-by-step instead of doing a big-bang release?

piyush-garg commented 5 years ago

There are still missing parts which needs to be done before deploying to prod like

christianvogt commented 5 years ago

When the updates are available in prod, please assign to UI team.

openshiftio / openshift.io

Intermittent issue - application is not available on run environment #4009