Open ljelinkova opened 6 years ago
Is there any possibility that this is a timing issue? I have seen random instances of the deployment to run requiring a long time to complete.
Or, might this be a resources issue? Do you see anything in the logs related to a quota being reached? Maybe the env reset is not removing any existing deployments to run?
@ljelinkova did you check the resource quota? Sometimes the quota for the target namespace gets full. OpenShift accepts the new DC request successfully but cannot deploy it once the quota is reached.
@hrishin The resource quota should not be the problem since we reset the whole environment after each test.
@ldimaggi It might be the timing issue, maybe if the tests waited for some time the app would start.
But the main question is: how does a random user know that the deployment failed? Or that the deployment is finished? Shouldn't this be part of the "Pipeline"? I simply assumed that once the Pipeline is finished, everything is set up and ready and I can start using the application.
I agree that it should be part of the pipeline, i.e. "Rollout to Run" step should be marked as successful only when the app was deployed successfully. WDYT @openshiftio/uxd-team @catrobson
@ppitonak Agree we would only mark that step successful when the app was deployed successfully.
I consider this to be a P1 issue because if we ignore this failure we cannot push to prod.
We implemented a workaround in e2e tests (https://github.com/fabric8io/fabric8-test/pull/949).
I had a chat with @aslakknutsen @bartoszmajsak @ljelinkova @jiekang @joshuawilson ... the result of the discussion is that we are not able to guarantee that the application is deployed and working at any point in time after the pipeline has finished. The dev team is against adding the readiness probe to the pipeline.
@fabric8-ui/uxd I still think that we should signal to the users
While I see the point of separating pipelines and service/pod readiness, I also believe there is a UX issue. We got many reports of users being confused when they saw finished pipelines with an unavailable app. For many users it looks like a bug.
The dev team is against adding the readiness probe to the pipeline.
@ppitonak I might have missed that part of the long discussion - can you shed some more light on why dev team is against that?
There are other scenarios where the pipeline is confusing. As @rhopp suggested, imagine this scenario.
And now the question: What version is on the stage? Is it possible that you're still looking into version 1.0.1? Or is there already 1.0.2?
@bartoszmajsak you are right that nobody explicitly said that they would be against, but nobody supported my suggestion. @aslakknutsen argued that we cannot guarantee that the application works at any point in time... while I agree with that I think that adding readiness probe to the pipeline itself will reduce user's confusion.
We would still need to solve problems described by Lucia and Aslak.
To restate my perspective from the discussion on Mattermost on Aug 13, 2018:
"I do agree that it would be worthwhile making sure we can display probe information if available. I think if the application has a probe, the OSO console sees that and is more clear. Our OSIO pages don't do anything with probes as far as I'm aware."
And now the question: What version is on the stage? Is it possible that you're still looking into version 1.0.1? Or is there already 1.0.2?
@ljelinkova you can see that in the openshift deployment object. There is a version label which can tell you that. Is that user-facing information? No. Can you test it to see if your assumption is valid? Yes.
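A minimal sketch of the check described above, assuming the deployment object has been exported as JSON (for example with `oc get dc <name> -o json`) and that the label key is literally `version`. The helper name `deployed_version`, the label key, and the sample object are assumptions for illustration, not something confirmed in this thread.

```python
import json
from typing import Optional


def deployed_version(dc: dict, label: str = "version") -> Optional[str]:
    """Return the version label of a DeploymentConfig-like object, or None.

    The label key "version" is an assumption; adjust it to whatever the
    pipeline actually writes.
    """
    # Prefer the pod template labels, then fall back to the object's own labels.
    template_labels = (
        dc.get("spec", {}).get("template", {}).get("metadata", {}).get("labels") or {}
    )
    object_labels = dc.get("metadata", {}).get("labels") or {}
    return template_labels.get(label) or object_labels.get(label)


# Hypothetical, trimmed-down deployment object used only for illustration.
sample = json.loads(
    '{"metadata": {"name": "my-app", "labels": {"version": "1.0.2"}}}'
)
print(deployed_version(sample))
```

A test could compare the returned value with the version the pipeline just built to decide whether stage still serves the previous release.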
Of course, the application itself could also expose this information, but that's up to the application to do or not.
One of my colleagues from a different team tried OSIO and was also confused by the fact that the application was not available after the pipeline finished.
This seems also like a usability issue, so I am adding UX team label too.
@serenamarie125 Could somebody from UX team have a look at this?
The issue here is that some users expect the application to be deployed and ready when the pipeline is finished and that is not true. The end of the pipeline means that the deployment was triggered but the application might not be available for several minutes. However, the link to the deployed application is clickable and user gets the Application is not available page. While this behavior is technically correct, it might be quite confusing.
The new OSIO-pipeline library has a verify deployment check which fails the job if the deployment is not up and running.
@sthaha @rupalibehera
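To illustrate what a "verify deployment" check can look like, here is a rough sketch that inspects a DeploymentConfig exported as JSON and raises an error when the rollout is not complete. It is not the osio-pipeline implementation (that one is written in Groovy and may apply different criteria). The function names are hypothetical, and treating a deployment scaled to zero replicas as "not up" is a choice made for this sketch.

```python
def is_rolled_out(dc: dict) -> bool:
    """Return True when all desired replicas are updated and available."""
    desired = dc.get("spec", {}).get("replicas", 0)
    status = dc.get("status", {})
    updated = status.get("updatedReplicas", 0)
    available = status.get("availableReplicas", 0)
    # A deployment scaled to zero is treated as "not up" in this sketch.
    return desired > 0 and updated >= desired and available >= desired


def verify_deployment(dc: dict) -> None:
    """Fail loudly, as a CI job would, when the deployment is not up."""
    if not is_rolled_out(dc):
        name = dc.get("metadata", {}).get("name", "<unknown>")
        raise RuntimeError(f"deployment {name} is not up and running")
```

In a real job this check would normally be retried for a bounded time before failing, since a fresh rollout needs a while to become available.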
From the pipeline side of things, I think it is better to spend effort on supporting maven builds using the new osio-pipeline than fixing it in the current f-p-l
I am treating this as a "won't fix" since the new pipeline solves it already
But the UI/UX side of it can still improve, i.e. wait until the application comes up (get a 2XX status on a GET request to the application URL).
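The "wait until the application answers with 2XX" idea can be sketched as a bounded polling loop. The function names, the number of attempts, and the interval are assumptions for illustration. The status fetcher and the sleep function are injectable so that the loop can be exercised without network access.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status of a GET request, or 0 if the request failed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Non-2XX answers (for example 503) arrive here with a status code.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout and similar errors.
        return 0


def wait_until_ready(
    url: str,
    fetch_status: Callable[[str], int] = http_status,
    attempts: int = 30,
    interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the URL until it answers with a 2XX status or attempts run out."""
    for attempt in range(attempts):
        if 200 <= fetch_status(url) < 300:
            return True
        if attempt < attempts - 1:
            sleep(interval)
    return False
```

A caller would enable the link to the application only after `wait_until_ready(app_url)` returns `True`, and show an error or a "still starting" hint otherwise.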
As @sthaha and @hrishin mentioned, this issue has been fixed in the new pipeline, where the Jenkins job fails if the deployment fails. In this train we are integrating Java boosters into the new pipeline, and with that this will get resolved. I think nothing apart from this needs to be done from the build-team side.
Cool, please let us know when it gets to prod-preview.
Do we need a new "stage" for lack of a better term on the pipelines UI for "deployed"? Or maybe we can just use the feedback to activate the link to the deployed app.
The first is a UX change. The latter is just a code update to the new API.
Please let us know when it is available.
@muruGanesan will take an action item to hold a BlueJeans conversation with the stakeholders involved in this conversation ( and record for those of us who cannot attend )
@muruGanesan following up on today's conversation, the pipeline could show "in progress" and no link to the app until the application is ready.
So - the UI would show "in progress" until the app endpoint was available, at which point, the checkbox/arrow icon would be displayed, yes?
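The behaviour discussed here can be written down as a small state function. This is only a sketch of the rule as stated in the thread ("in progress" and no link until the application is ready). The type and function names are hypothetical and do not correspond to existing fabric8-ui code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RolloutView:
    label: str           # text shown for the rollout step
    link: Optional[str]  # URL of the deployed app, or None while hidden


def rollout_view(pipeline_finished: bool, app_ready: bool, app_url: str) -> RolloutView:
    """Show the link only once the pipeline is done AND the app responds."""
    if pipeline_finished and app_ready:
        return RolloutView(label="done", link=app_url)
    # Either the pipeline is still running or the endpoint is not up yet.
    return RolloutView(label="in progress", link=None)
```

With this rule a finished pipeline whose application is still starting keeps showing "in progress", which is the clue to the user not to try the link yet.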
What about the scenario that Pavel mentioned where a new version of an app is deployed? When do we disable the link to the previously deployed version of the app? When the user starts the build for the new version of the app?
@ppitonak, thanks for the screenshot.
What about the scenario that Pavel mentioned where a new version of an app is deployed? When do we disable the link to the previously deployed version of the app? When the user starts the build for the new version of the app?
When Build 2 starts, Build 1 is hidden so the issue doesn't exist until the link is displayed. In other words, if first run of pipeline is implemented correctly, there is no issue with second run of pipeline.
One of the problems is that if it goes green and the link is still inactive and they have the page open, they will just go and refresh the page. If the pipeline is not green till it is ready then we are giving the user a clue that they should not try.
@ldimaggi , @ppitonak, @joshuawilson,
Please find the first draft version and provide your feedback: https://redhat.invisionapp.com/share/DZOLKNI95YP#/325796400_Pipeline_Appln_readiness
lgtm
@ldimaggi , @ppitonak, @alexeykazakov, @hrishin, @bartoszmajsak,@kwk, @ljelinkova, @sthaha, @piyush-garg
Please look at the UX recommendation:
https://redhat.invisionapp.com/share/DZOLKNI95YP#/325796400_Pipeline_Appln_readiness
Note: @alexeykazakov provided his feedback and I responded back with details in the above 'Invision' file. Feel free to add your review comments. If everyone is fine with the UX recommendation please provide 'thumbs-up'.
< Iteration -3> @ldimaggi , @ppitonak, @alexeykazakov, @hrishin, @bartoszmajsak,@kwk, @ljelinkova, @sthaha, @piyush-garg I discussed the following use cases with @sthaha:
1) case-1: Only one application (1 URL)
2) case-2: There is no application (0 URLs), e.g. bot deployment
3) case-3: Multi-clusters (> 1 URLs) - this is a future requirement
I covered the "case-1 and 2" and updated the flow. Please review the same and provide your feedback if any. https://redhat.invisionapp.com/share/DZOLKNI95YP#/325796400_Pipeline_Appln_readiness
NOTE: @alexeykazakov, please verify 'use case-2' and provide your inputs.
@muruGanesan I like the proposal.
@ljelinkova, Thanks. If the stakeholders are fine with UX recommendation, I request @sthaha, @joshuawilson to assign to an appropriate team.
When @sthaha confirms the backend supports it and there is a decision on the design, the UI team can pick up the changes to the pipeline page.
@ljelinkova, @joshuawilson, @sthaha, Since UXD proposal is accepted, I am removing 'UX label' and my name from the assignee list. Please add me/include me if anyone needs any clarification from the UX side.
CC: @serenamarie125
@muruGanesan should we also remove area/ux label?
@serenamarie125, No, we don't have to remove the 'area/UX' label because the issue touches some portion of UX. In addition, the UX team is responsible only when the label is 'team/ux', which I have already removed.
@sthaha when will the new OSIO-pipelines be ready (with the verify deployment check)?
@joshuawilson This has already been done in new pipeline https://github.com/fabric8io/osio-pipeline/blob/ef2345a2edf5ff978b3cee64c94ab8887e81e087/vars/deploy.groovy#L43
@piyush-garg is it in prod already?
@ppitonak The new pipeline library is not in prod. We are working on moving the Java boosters to the new pipeline. Apart from that, there are 2-3 other things that need to be resolved to get it into production, like new pipeline support for analytics and updating the upstream boosters' application.yaml
Does it make sense to deploy it to prod as an experimental feature and improve it step-by-step instead of doing a big-bang release?
There are still missing parts which need to be done before deploying to prod, like
When the updates are available in prod, please assign to UI team.
The E2E tests fail intermittently because the application is not available on run environment.
http://artifacts.ci.centos.org/devtools/e2e/devtools-test-e2e-openshift.io-smoketest-us-east-2a-released/322/07-02-run.html
The E2E workflow is:
This is the Jenkins log http://artifacts.ci.centos.org/devtools/e2e/devtools-test-e2e-openshift.io-smoketest-us-east-2a-released/322/05-01-jenkins-log.html
This is the output of a script that lists Jenkins pods; we can add any oc command you might need to debug this issue
http://artifacts.ci.centos.org/devtools/e2e/devtools-test-e2e-openshift.io-smoketest-us-east-2a-released/322/oc-logs-output.txt