packit / packit-service

Packit provided as a service
https://packit.dev
MIT License
37 stars 48 forks source link

Downstream KojiBuild state not handled correctly #2503

Closed xsuchy closed 1 month ago

xsuchy commented 2 months ago

What happened? What is the problem?

I discovered that sometimes BodhiUpdate is not filed by Packit, despite being correctly configured.

I did and upstream release of fedora-upgrade and submitted it to Fedora using tito release fedora-git. And I forgot to add --no-build. This uploaded src.rpm to all branches in dist-git and triggered koji build. The builds were submitted by both Packit and Tito. As you can see in https://koji.fedoraproject.org/koji/packageinfo?packageID=15136 Packit was first for F40 and Tito everywhere else. Who is second is denied to proceed. As you can see in F39 build https://koji.fedoraproject.org/koji/taskinfo?taskID=122080964 submitted by Packit. Although it failed Packit reports in dashboard that the build is still pending https://dashboard.packit.dev/results/koji-builds/6850

And what is important is that F39 bodhi update was not filled. Despite the condition was meant - there is successful build https://koji.fedoraproject.org/koji/buildinfo?buildID=2531586

What did you expect to happen?

I expected that Bodhi update for F39 is created.

Example URL(s)

No response

Steps to reproduce

1. fedpkg import foo.src.rpm
2. fedpkg build (be quicker than Packit)
3. Bodhi update is not created.

What is the impacted category (job)?

Fedora release automation

Workaround

Participation

nforro commented 2 months ago

And what is important is that F39 bodhi update was not filled. Despite the condition was meant - there is successful build https://koji.fedoraproject.org/koji/buildinfo?buildID=2531586

There are however no allowed builders configured: https://src.fedoraproject.org/rpms/fedora-upgrade/blob/2e7791f16bbea3aedf43c81dfb8db3f417a43f02/f/packit.yaml#_18 So the only account whose build can trigger the update is packit.

lbarcziova commented 2 months ago

During stand-up meeting, we agreed that this is an expected behaviour. However, the build failure should be correctly handled/ reflected in DB (and shown on dashboard). Besides that, we may think about a way how to notify users about the failures (e.g. via mail), a bit related to #2404 .

lbarcziova commented 1 month ago

From an initial investigation, it looks like this happens for all builds duplicated by us, i.e. where the build triggered by us fails because the build is already in progress/completed (triggered by someone else). Koji doesn't seem to emit messages on the message bus for those (which we rely on for updating the status).

nforro commented 1 month ago

This is actually quite similar to https://github.com/packit/packit/issues/2427, if there is a way with Koji API to determine whether a non-scratch build for a NVR is in progress, we could skip/fail before submitting the build.

lbarcziova commented 1 month ago

Yes, sounds good, my only concern especially for this issue would be the race conditions.

nforro commented 1 month ago

if there is a way with Koji API to determine whether a non-scratch build for a NVR is in progress

We have KojiHelper.get_build_info() that accepts a NVR.

my only concern especially for this issue would be the race conditions

Hm, true, I'm not sure if there even is a way to avoid them.

lbarcziova commented 1 month ago

For this particular issue, I was thinking we could have a "babysit" task to just handle the forever-pending builds (which might be overkill), but for packit/packit#2427 there is probably no way.

I would start here by adding the check of KojiHelper.get_build_info() and see how often we will still hit a race condition.

nforro commented 1 month ago

Do we want to fix both issues at the same time? The only difference should be checking if KojiBuildState(KojiHelper().get_build_info(nvr)["state"]) returns KojiBuildState.building or KojiBuildState.complete.

lbarcziova commented 1 month ago

We agreed on implementing the solution from packit/packit#2427 here which should solve both issues.