googleapis / java-shared-config

Shared Maven build configuration for Google Cloud Java client libraries.
Apache License 2.0

The release job waits indefinitely for a new image to be available #833

Closed diegomarquezp closed 1 month ago

diegomarquezp commented 1 month ago

Context: I was trying to release the repo by merging https://github.com/googleapis/java-shared-config/pull/810 but no release showed up after 2 hours. I ran the job again with no luck: it kept waiting for the image for more than 1 hour.

This is the loop where the job waits for the image to become available.

Other release jobs normally take 15-25 minutes. For this release I ran the job twice, and both runs hung for more than 1 hour waiting for the image.

suztomo commented 1 month ago

Could you collect your observations about the image publication?

mpeddada1 commented 1 month ago

Pasting the error from the logs. The image appears to be failing when it is being built:

Step #0 - "graalvm-a-build": https://yum.oracle.com/repo/OracleLinux/OL7/developer/x86_64/repodata/5abe05b1c70fbbffbe343c73250b383ef57c4845-filelists.sqlite.bz2: [Errno 14] curl#18 - "transfer closed with 127733296 bytes remaining to read"
Step #0 - "graalvm-a-build": Trying other mirror.
Step #0 - "graalvm-a-build": 
Step #0 - "graalvm-a-build": 
Step #0 - "graalvm-a-build":  One of the configured repositories failed (Oracle Linux 7Server Development Packages (x86_64)),
Step #0 - "graalvm-a-build":  and yum doesn't have enough cached data to continue. At this point the only
Step #0 - "graalvm-a-build":  safe thing yum can do is fail. There are a few ways to work "fix" this:
Step #0 - "graalvm-a-build": 
Step #0 - "graalvm-a-build":      1. Contact the upstream for the repository and get them to fix the problem.
Step #0 - "graalvm-a-build": 
Step #0 - "graalvm-a-build":      2. Reconfigure the baseurl/etc. for the repository, to point to a working
Step #0 - "graalvm-a-build":         upstream. This is most often useful if you are using a newer
Step #0 - "graalvm-a-build":         distribution release than is supported by the repository (and the
Step #0 - "graalvm-a-build":         packages for the previous distribution release still work).
Step #0 - "graalvm-a-build": 
Step #0 - "graalvm-a-build":      3. Run the command with the repository temporarily disabled
Step #0 - "graalvm-a-build":             yum --disablerepo=ol7_developer ...
Step #0 - "graalvm-a-build": 
Step #0 - "graalvm-a-build":      4. Disable the repository permanently, so yum won't use it by default. Yum
Step #0 - "graalvm-a-build":         will then just ignore the repository until you permanently enable it
Step #0 - "graalvm-a-build":         again or use --enablerepo for temporary usage:
Step #0 - "graalvm-a-build": 
Step #0 - "graalvm-a-build":             yum-config-manager --disable ol7_developer
Step #0 - "graalvm-a-build":         or
Step #0 - "graalvm-a-build":             subscription-manager repos --disable=ol7_developer
Step #0 - "graalvm-a-build": 
Step #0 - "graalvm-a-build":      5. Configure the failing repository to be skipped, if it is unavailable.
Step #0 - "graalvm-a-build":         Note that yum will try to contact the repo. when it runs most commands,
Step #0 - "graalvm-a-build":         so will have to try and fail each time (and thus. yum will be be much
Step #0 - "graalvm-a-build":         slower). If it is a very temporary problem though, this is often a nice
Step #0 - "graalvm-a-build":         compromise:
Step #0 - "graalvm-a-build": 
Step #0 - "graalvm-a-build":             yum-config-manager --save --setopt=ol7_developer.skip_if_unavailable=true
Step #0 - "graalvm-a-build": 
Step #0 - "graalvm-a-build": failure: repodata/5abe05b1c70fbbffbe343c73250b383ef57c4845-filelists.sqlite.bz2 from ol7_developer: [Errno 256] No more mirrors to try.

mpeddada1 commented 1 month ago

Running gcloud builds submit --config=.cloudbuild/cloudbuild-test-a.yaml . on the latest main results in a successful image build:

Step #0 - "graalvm-a-build": Dependency Installed:
Step #0 - "graalvm-a-build":   libjq1.x86_64 0:1.5-1.0.1.el7          oniguruma.x86_64 0:5.9.5-3.el7         
Step #0 - "graalvm-a-build": 
Step #0 - "graalvm-a-build": Complete!
Step #0 - "graalvm-a-build": Removing intermediate container 35acd00515d9
Step #0 - "graalvm-a-build":  ---> dcd0559ece04
Step #0 - "graalvm-a-build": Step 11/11 : WORKDIR /workspace
Step #0 - "graalvm-a-build":  ---> Running in b062c517dd3a
Step #0 - "graalvm-a-build": Removing intermediate container b062c517dd3a
Step #0 - "graalvm-a-build":  ---> 7bc37763ba02
Step #0 - "graalvm-a-build": Successfully built 7bc37763ba02
Step #0 - "graalvm-a-build": Successfully tagged gcr.io/cloud-devrel-public-resources/graalvm_a:1.8.0
Finished Step #0 - "graalvm-a-build"

suztomo commented 1 month ago

@diegomarquezp Can you try running the release job again?

diegomarquezp commented 1 month ago

Triggered the build manually from fusion (https://github.com/googleapis/java-shared-config/pull/810#issuecomment-2140680170). It is still waiting for the image at the moment.

Discussion thread

diegomarquezp commented 1 month ago

More context from discussion with @suztomo @mpeddada1 and @JoeWang1127

The stage.sh Kokoro job depends on a Cloud Build job. This is the trigger. The trigger waits for a new tag to be published, then builds the shared config image and publishes it to cloud-devrel-public-resources.

The stage.sh job has a loop that waits for the image to be available. It does not build the image.
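One way to keep such a wait from hanging indefinitely is to bound it. The following is a minimal sketch, not the actual stage.sh code: the function name wait_for and its attempt-count and interval parameters are hypothetical, and the gcloud command in the usage comment is only an assumption about how image availability might be checked.

```shell
# Poll a check command until it succeeds, giving up after a fixed number of tries.
# Usage: wait_for <max_attempts> <interval_seconds> <command> [args...]
# (hypothetical helper; names and parameters are illustrative only)
wait_for() {
  max_attempts="$1"
  interval="$2"
  shift 2
  attempt=1
  # Run the check command; its output is discarded, only the exit code matters.
  until "$@" >/dev/null 2>&1; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Gave up after ${max_attempts} attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$interval"
  done
}

# Example (assumed check command; the tag value is illustrative):
#   wait_for 60 30 gcloud container images describe \
#     "gcr.io/cloud-devrel-public-resources/graalvm_a:1.8.0"
```

With a bound like this, a failed image build would surface as a failed release job after a known delay, rather than as a job that waits until someone notices.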

@mpeddada1 will enhance the logs of the Kokoro job to link to the Cloud Build trigger, for easier investigation in the future.

For the current release, we re-created the tag to trigger the image-build job. Then we will manually trigger stage.sh once the image is ready.
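For reference, re-creating a tag from the command line could look roughly like the sketch below. This is an assumption about one possible way to do it, not a record of the commands actually used here. The function name recreate_tag and its arguments are hypothetical, and the tag name in the usage comment is a placeholder.

```shell
# Delete a tag locally and on the remote, then create and push it again,
# so that a tag-push trigger sees a newly published tag.
# Usage: recreate_tag <tag> <commit-ish> [remote]
# (hypothetical helper; illustrative only)
recreate_tag() {
  tag="$1"
  commit="$2"
  remote="${3:-origin}"
  # Resolve the target commit first, in case <commit-ish> is the tag itself.
  sha=$(git rev-parse --verify "${commit}^{commit}") || return 1
  # Remove the local tag if it exists.
  git tag -d "$tag" >/dev/null 2>&1 || true
  # Remove the remote tag, then recreate it at the same commit and push it.
  git push "$remote" ":refs/tags/$tag" &&
    git tag "$tag" "$sha" &&
    git push "$remote" "refs/tags/$tag"
}

# Example (placeholder tag name):
#   recreate_tag vX.Y.Z vX.Y.Z origin
```

Deleting and re-pushing, rather than force-updating, is chosen here because the trigger is described as expecting a new tag to be published.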

Thanks all!