Open jithine opened 6 months ago
FYI @kumada626 @y-oksaku
This behavior occurs irrespective of the remote trigger. When the `first` build is restarted, the `z` build does not wait for the `y-third` build. At the moment `x-second` triggers `z`, the latest `y-third` build is the one from the parent event.
jobs:
  first:
    requires: [ ~commit ]
    steps:
      - echo: echo 'no wait'
  x-second:
    requires: [ first ]
    steps:
      - echo: echo 'no wait'
  y-second:
    requires: [ first ]
    steps:
      - wait: sleep 30
  y-third:
    requires: [ y-second ]
    steps:
      - echo: echo 'no wait'
  z:
    requires: [ x-second, y-third ]
    steps:
      - echo: echo 'no wait'
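In your case, at the moment the `second` build was triggered, the downstream builds did not yet exist.
Upstream builds:
* [first](https://cd.screwdriver.cd/pipelines/12907/builds/917364/steps/sd-teardown-screwdriver-cache-bookend) completed at 23:31:36
* [second](https://cd.screwdriver.cd/pipelines/12907/builds/917365/steps/sd-setup-init) created at 23:31:37
Downstream builds:
* [l2-1](https://cd.screwdriver.cd/pipelines/12908/builds/917366/steps/sd-setup-init) created at 23:31:42
* [l2-2](https://cd.screwdriver.cd/pipelines/12909/builds/917367/steps/sd-setup-init) created at 23:31:46
* [l2-3](https://cd.screwdriver.cd/pipelines/12910/builds/917368/steps/sd-setup-init) created at 23:31:50
Therefore, when evaluating whether to trigger the `second` build, the downstream build statuses from the parent event were used. Based on this, it was determined that the conditions were satisfied.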
That said, this behavior seems logical to me.
Triggering a restart only after all 'join' builds have completed would require calculating which builds are scheduled to execute in the future. This logic could become excessively complex, particularly when there are multiple restart events or when users stop or disable some builds while events are running.
In my opinion, the logic that employs the latest build statuses at a particular moment, much like a snapshot, is simple and elegant.
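To make the snapshot rule concrete, here is a minimal sketch of it applied to the example pipeline above. It is a simplified model, not Screwdriver's actual implementation: the `REQUIRES` table, the `latest_status` and `join_ready_snapshot` helpers, and the status strings are all hypothetical names chosen for illustration.

```python
# Hypothetical model of the snapshot rule; not Screwdriver's actual code.
# Upstream jobs of each job, taken from the example pipeline.
REQUIRES = {
    "first": [],
    "x-second": ["first"],
    "y-second": ["first"],
    "y-third": ["y-second"],
    "z": ["x-second", "y-third"],
}


def latest_status(job, current_event, parent_event):
    """Newest known status of `job`: the current event wins, otherwise
    fall back to the parent event (None if the job never ran)."""
    if job in current_event:
        return current_event[job]
    return parent_event.get(job)


def join_ready_snapshot(job, current_event, parent_event):
    """Snapshot rule: the join is ready when the latest known build of
    every upstream job is SUCCESS at this moment."""
    return all(
        latest_status(upstream, current_event, parent_event) == "SUCCESS"
        for upstream in REQUIRES[job]
    )


# In the parent event every job succeeded.
parent_event = {job: "SUCCESS" for job in REQUIRES}

# Restarted event at the moment x-second finishes: y-second is still
# sleeping, so y-third has no build in this event yet.
current_event = {
    "first": "SUCCESS",
    "x-second": "SUCCESS",
    "y-second": "RUNNING",
}

snapshot_result = join_ready_snapshot("z", current_event, parent_event)
```

Under these assumptions `snapshot_result` is `True`: `y-third` has no build in the restarted event, so its parent-event status is used and `z` starts without waiting.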
@y-oksaku I believe the current logic is buggy.
In most cases we've seen, when a user restarts an upstream job, their intention is for all downstream jobs to run anew. But what's happening here is that, among the downstream jobs, a `join` job behaves differently: it runs using statuses from previous events instead of waiting for upstream builds in the current event.
A restarted build should look back at the status of its upstream jobs' builds from previous events only if that upstream job doesn't get executed in the current event.
I agree with your analysis that this logic would be a bit complex; however, without it we are breaking user expectations.
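The rule described above could be sketched roughly as follows. This is only an illustration under simplifying assumptions: "will be executed in the current event" is approximated as "is the restarted job or is reachable from it through `requires` edges", and stopped, disabled, or failed builds are ignored. The `REQUIRES` table and both helper functions are hypothetical, not part of Screwdriver.

```python
# Hypothetical model of the proposed rule; not Screwdriver's actual code.
REQUIRES = {
    "first": [],
    "x-second": ["first"],
    "y-second": ["first"],
    "y-third": ["y-second"],
    "z": ["x-second", "y-third"],
}


def will_run_in_event(job, start_job):
    """Approximation: `job` runs in the restarted event if it is the
    restarted job itself or is downstream of it via `requires` edges."""
    if job == start_job:
        return True
    return any(will_run_in_event(up, start_job) for up in REQUIRES[job])


def join_ready_proposed(job, start_job, current_event, parent_event):
    """Look back at the parent event only for upstream jobs that will
    not run in the current event; otherwise wait for the current build."""
    for upstream in REQUIRES[job]:
        if will_run_in_event(upstream, start_job):
            status = current_event.get(upstream)  # None means: not built yet
        else:
            status = parent_event.get(upstream)
        if status != "SUCCESS":
            return False
    return True


parent_event = {job: "SUCCESS" for job in REQUIRES}

# Restart from `first`; x-second is done, y-second is still running.
in_progress = {"first": "SUCCESS", "x-second": "SUCCESS", "y-second": "RUNNING"}
waits = not join_ready_proposed("z", "first", in_progress, parent_event)

# Restart from `x-second`; y-third is not in this event's path, so the
# parent event's y-third build is reused.
only_x = {"x-second": "SUCCESS"}
reuses_parent = join_ready_proposed("z", "x-second", only_x, parent_event)
```

With a restart from `first`, `z` waits until `y-third` succeeds in the new event; with a restart from `x-second`, `y-third` is outside the restarted path, so its status from the previous event is still used.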
@jithine Thanks for your opinion.
Indeed, as you've noted, I believe that the behavior you've described would align with user expectations.
But given that the logic is expected to become complex, I think it would be good to make corrections after conducting a detailed examination of the behavior and design, along with other trigger issues.
@y-oksaku Here are our thoughts on restart scenarios.
When a build has been restarted from the workflowgraph, it creates a new event with the same `groupEventId` as its parent. Now if there is a join build in the sub-workflowgraph for this execution, then computing its execution readiness should depend on the following conditions.
* […] `requires`) in the current event's workflowgraph, then this join build shouldn't be started until that path completes and produces a successful build for the upstream job.
* […] `groupEventId` then let's continue that behavior.
* […] `groupEventId`
@jithine Thank you for your detailed examination. I also think this approach seems good. However, there might be some corner cases, so we will proceed with corrections cautiously.
What happened:
When an upstream job of the remote-join job is restarted, the remote-join job runs immediately, reusing build status from the previous event.
https://cd.screwdriver.cd/pipelines/12907/events/784132
Job `first` was restarted from the previous event, where the remote join in job `second` worked as expected. However, in the new event, job `second` ran immediately without waiting for any of the downstream jobs to be triggered or executed.
What you expected to happen:
Job `second` in the restarted event should depend on the status of the newly triggered remote downstream jobs.
How to reproduce it: