ruanyl opened 1 month ago
This is the same issue I described here:
There are multiple things happening here:
- With `incremental` enabled, a previous successful copy of the artifacts is pulled from S3.
- `continue-on-error` was co-enabled with `incremental`, so that if a plugin build failed, the pipeline would move on to the next plugin without failing as a whole.
- A good copy is on disk due to `incremental`, which caused the build recorder to record it into the build manifest; the build recorder is in action because the pipeline is not failing, thanks to `continue-on-error`.
- The stale zip restored by `incremental` is then treated as a success in the build recorder and the assemble workflow.

Involving @zelinh again to see if there is any better way to solve this. Probably remove the zips that are not in the input manifest and the zips that are meant to be rebuilt, to avoid the cache polluting the new builds.
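As a rough illustration of that cleanup idea, a pruning step could look like the sketch below. All names here (`prune_stale_zips`, the cache layout, the zip naming convention) are hypothetical; this is not the actual workflow code in this repo.

```python
import os

def prune_stale_zips(cache_dir: str, manifest_components: set, rebuild_components: set) -> None:
    """Hypothetical pruning step for the incremental cache pulled from S3.

    Removes zips that are not in the current input manifest, plus zips for
    components scheduled to be rebuilt, so a failed rebuild cannot silently
    fall back to a stale copy.
    """
    for name in os.listdir(cache_dir):
        if not name.endswith(".zip"):
            continue
        # Assumed naming convention: "<component>-<version>.zip".
        component = name.rsplit("-", 1)[0]
        if component not in manifest_components or component in rebuild_components:
            os.remove(os.path.join(cache_dir, name))
            print(f"pruned stale artifact: {name}")
```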
Thanks.
I believe it was by design to include the previously built component (using incremental) if the new commit build for that plugin fails. We could still have a complete bundle using the previous commit, which is very much a nightly-build artifact trait. Logging of the failure needs to be better so users get an idea of what is being installed. If SA failed to build and a previous copy is being installed, that is expected and should be okay, but the user needs to be informed. Incremental and continue-on-error can go hand in hand. We do not want to fail the entire workflow for a single component, but we do want to install a previous copy if one exists.
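A minimal sketch of what that logging could look like, assuming a hypothetical recorder hook (`record_component` and its parameters are made up for illustration, not the repo's actual recorder code):

```python
import logging

logger = logging.getLogger("build-recorder")

def record_component(component: str, build_failed: bool, cached_copy_exists: bool) -> None:
    """Hypothetical hook: make the fallback explicit instead of recording a
    silent success when a cached artifact is substituted for a failed build."""
    if build_failed and cached_copy_exists:
        logger.warning(
            "%s failed to build; reusing the previously built artifact from the "
            "incremental cache. This bundle does NOT contain a fresh build of %s.",
            component,
            component,
        )
```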
Also adding @dblock to get suggestions on what the better approach should be.
@peterzhuamazon Thanks!
Probably remove the zips that are not in the input manifest and the zips that are meant to be rebuilt, to avoid the cache polluting the new builds.
https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/3.0.0/10377/linux/x64
I guess this is where the zips are stored? If this is true, how does it resolve to an earlier zip when the current build fails?

Some more discussion related to the same topic is part of this issue: https://github.com/opensearch-project/opensearch-build-libraries/issues/455.
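My rough mental model of how it might resolve to an earlier zip, purely a guess: walk the build numbers from newest to oldest and take the first one that actually has the artifact. The bucket layout below mirrors the public URL above but is not verified, and `resolve_plugin_zip` is made up:

```python
import urllib.request
from typing import Optional

def artifact_exists(url: str) -> bool:
    """Probe the URL; treat any HTTP or network error as missing."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request) as response:
            return response.status == 200
    except Exception:
        return False

def resolve_plugin_zip(version: str, build_numbers: list, component_zip: str) -> Optional[str]:
    """Made-up resolver: return the newest build that has a zip for this component."""
    base = "https://ci.opensearch.org/ci/dbc/distribution-build-opensearch"
    for build in sorted(build_numbers, reverse=True):
        # The exact path below is a guess modeled on the public CI URL above.
        url = f"{base}/{version}/{build}/linux/x64/tar/builds/opensearch/plugins/{component_zip}"
        if artifact_exists(url):
            return url
    return None
```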
Hi @gaiksaya, thanks!
I believe it was by design to include the previously built component (using incremental) if the new commit build for that plugin is failing.
I kind of get the point of doing this, as "We could still have a complete bundle". But I just feel a broken Docker image is worse than missing certain features. Perhaps in most cases using a previously built component won't result in a runtime error, and this is why it's by design? Btw, shall we also publish a Docker image tag with the build number? That way people can easily revert when encountering issues.
[Triage]
Previous discussion of this behavior: https://github.com/opensearch-project/opensearch-build-libraries/issues/455#issuecomment-2286891453
Nightly artifacts are expected to be unstable/broken. That's how we catch issues and raise them with component teams. We are working on adding smoke tests at the distribution level that would detect whether a given artifact is valid or not. A long-term plan could be to put those artifacts under something like /valid per version.
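If that direction is taken, the promotion gate could look roughly like this sketch (`promote_if_valid` and both injected callables are hypothetical; the real smoke-testing framework is the work in progress mentioned below):

```python
from typing import Callable, Optional

def promote_if_valid(
    artifact_url: str,
    version: str,
    run_smoke_tests: Callable[[str], bool],
    copy_artifact: Callable[[str, str], None],
) -> Optional[str]:
    """Hypothetical promotion gate: an artifact is copied under a /valid
    prefix for its version only after the distribution-level smoke tests
    pass, so consumers can always pull a known-good nightly."""
    if not run_smoke_tests(artifact_url):
        return None
    valid_url = f"/valid/{version}/" + artifact_url.rsplit("/", 1)[-1]
    copy_artifact(artifact_url, valid_url)
    return valid_url
```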
Adding @zelinh who is working on smoke testing framework.
Nightly artifacts are expected to be unstable/broken.
Thanks @gaiksaya, that's a fair point. When pushing the Docker image tag, does it make sense to also push a tag with the build number? That would help with reverting to a previous valid version. Or any suggestion on how to revert now?
What is the use-case here? Where are the docker images being used?
@gaiksaya I'm using the Docker image from https://hub.docker.com/r/opensearchstaging/opensearch; we use 3.0.0 (main) or the current 2.18.0 (2.x) to set up clusters for development/testing/demo environments for OSD features on the main/2.x branches.
I would recommend using the validation workflow present in this repo to make sure the artifacts you are deploying are valid. We are using a similar one in the nightly playgrounds workflow. However, I recently encountered a bug related to OSD: https://github.com/opensearch-project/opensearch-build/issues/5117
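I don't have the exact workflow invocation handy, but conceptually the check boils down to a health probe like this sketch (assumes a demo cluster with the security plugin disabled; with security enabled you would need HTTPS and credentials):

```python
import json
import urllib.request

def cluster_is_healthy(endpoint: str = "http://localhost:9200") -> bool:
    """Minimal smoke probe against a running OpenSearch container: the
    cluster health endpoint should answer and not report a red status."""
    try:
        with urllib.request.urlopen(f"{endpoint}/_cluster/health") as response:
            health = json.load(response)
        return health.get("status") in ("green", "yellow")
    except Exception:
        return False
```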
Related, https://github.com/opensearch-project/opensearch-build/issues/5130.
I think, as a consumer of any Docker staging build, I'd like to know:
Describe the bug
Checking this pipeline at the build.sh step: https://build.ci.opensearch.org/blue/organizations/jenkins/distribution-build-opensearch/detail/distribution-build-opensearch/10377/pipeline/151

The build of security-analytics failed. However, the plugin was still installed at the assemble.sh step: https://build.ci.opensearch.org/blue/organizations/jenkins/distribution-build-opensearch/detail/distribution-build-opensearch/10377/pipeline/963

Shouldn't the plugin be excluded if it failed to build?
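What I expected is roughly the following guard at assemble time (a hypothetical sketch; the `build_result` field name is made up, not the repo's actual manifest schema):

```python
def components_to_assemble(build_manifest: dict) -> list:
    """Hypothetical guard: only components whose build step succeeded in this
    run make it into the bundle; anything that failed and exists only as a
    cached copy from an earlier build is skipped."""
    return [
        component["name"]
        for component in build_manifest.get("components", [])
        if component.get("build_result") == "success"  # field name is made up
    ]
```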
I'm having a runtime issue now when running the 2.18.0 and 3.0.0 Docker images, which looks related:
To reproduce
Run the 2.18.0 and 3.0.0 OpenSearch Docker images
Expected behavior
No response
Screenshots
If applicable, add screenshots to help explain your problem.
Host / Environment
No response
Additional context
No response
Relevant log output
No response