Add lock argument along with the actual building process

zelinh commented 2 years ago

Is your feature request related to a problem? Please describe

Problem statement

For our build dashboards, we are displaying the running status of each distribution build along with the head commit for each plugins included in the workflow. The way we getting the commit ID right now is by checking out into each plugin's repo on certain branch and getting the head commit within the build workflow process.

There might be a discrepancy for the commit ID if people pushing any commit when the build workflow is actively running. For example, our dashboard would display the current head commit has been updated with the latest push but the building process didn't actually pick that change.

We were thinking to use the commit IDs from the build manifest. When the distribution build workflow succeeds, it will create two different manifest: build manifest & bundle/distribution manifest, both of them will contain the commit ID for each components which shows the HEAD commits this current build job pick for every components in the manifest.

However, when the job fails, these manifests won't be created so people wouldn't know the head commit of each components current job is picking on and which caused the job failing. These commit IDs could be useful for maintainer to keep track of the build status and help them debug when error occurs, but currently we don't have a mechanism to display the commit IDs workflow is picking when distribution build Jenkins workflow ends up failure.

Existing feature

Currently we have lock option in our building script by running ./build.sh --lock $MANIFEST https://github.com/opensearch-project/opensearch-build/tree/main/src/build_workflow#avoiding-rebuilds

The function of this option right now is to generate a $MANIFEST.lock file with stable git references (commit IDs) and build options including its architecture, platform and etc.. Basically it replaces the ref in the input manifest from branch to commit id and return; https://github.com/opensearch-project/opensearch-build/blob/7b61ab48515cf8a1e55c63bbe2971685708e3fbf/src/run_build.py#L28-L39 However the lock wouldn't trigger the actual building process and we won't know about the results status (success or failure) of the building process with these certain commits.

Describe the solution you'd like

We could elaborate more on the lock option. We could add this lock into our actual building process instead of just generating the lock file.

In this way, when the lock argument is enabled, the build process will always go on and, at the same time the lock file will also be generated showing the correct stable git references current building process is on.

Describe alternatives you've considered

No response

Additional context

No response

dblock commented 2 years ago

There's a bit of misunderstanding above re:the lock option. It was designed to solve a different problem.

We have a parallel build process that builds x64 and amd64 in parallel. Let's say the input manifest ref for common-utils is main. The x64 build may check out common-utils main at rev1, then someone makes a commit to common-utils with rev2, and the amd64 build that runs a little bit behind will check out main at rev2. The same build number of OpenSearch now has different code between x64 and amd64. In practice it hasn't been a big problem as we freeze code towards the end, but it's a problem in daily builds.

The --lock mechanism was introduced to avoid rebuilding the same code without changes, and can be used to ensure that these parallel steps check out the same version(s) of each component. To fix the above problem:

Introduce a common Jenkins workflow step that generates the stable manifest with --lock. Stash the output.
Unstash the output in the x64 and amd64 build steps, use that as an input manifest.

This also makes https://github.com/opensearch-project/opensearch-build/pull/2473 a lot simpler, since one of the parts of that workflow checks out every component and tries to find its HEAD commit ID. That will be already known from the manifest lock above.

gaiksaya commented 2 months ago

@zelinh Is this issue still valid? Looks like we already have build and bundle manifest with exact commits for all builds (incremental and otherwise). With continue-on-error option, the build never fails except for mandatory components.

zelinh commented 2 months ago

I think this request should still be valid. When the build failed, we don't have a mechanism or platform to showcase the commit ID that build job is failing at when the input manifest using branch as reference.

opensearch-project / opensearch-build