timja opened 11 years ago
Example Use Case:
Let's examine the simple use case of a diamond-shaped dependency between 4 jobs, as mentioned in the description. You might first start by trying the Join plugin, configuring JobD as a downstream dependency of JobA, and Jobs B and C as the "joining" jobs. Once complete, you may consult a dependency graph (i.e., via the Dependency Graph plugin) to confirm that the relation shows the correct orientation - which it does. You then set each job to block when upstream and downstream dependencies are running, so that, for example, JobA won't run while JobD is running.
But then you try to see how well the 4 jobs orchestrate with one another. Let's make JobC fail, for example. If you run JobA and it in turn triggers Jobs B and C, C will fail, which prevents JobD from triggering. However, if you afterwards force-build JobD, it happily goes about its business even though the dependent job C is broken. Next, let's try forcing JobB. Given that both Jobs B and C are upstream dependencies of JobD, you would intuitively expect JobD to trigger once JobB completes... however, it does not.
So, for obvious reasons, the Join plugin falls short of robust dependency management. Let's try the Build Flow plugin instead. You create your 4 jobs as discussed, but instead of using the Join plugin you use the DSL to script a "join" operation, something like:
build ("jobA") parallel { build("jobB") build("jobC") } build ("jobD")
Once again, you run the build flow and all looks fine and dandy, until you try to orchestrate each job in isolation. Running JobB directly has no effect on JobD. In fact, the Build Flow plugin doesn't seem to interface with the Jenkins job dependency system at all, because you can manually force all 4 jobs to run in parallel if you trigger them directly. So the Build Flow plugin, again, doesn't provide the necessary results for this simple use case.
Needless to say I have yet to find a solution for this use case.
Example Use Case:
Suppose you have two jobs: A and B. Suppose you want job A to trigger job B but only under certain conditions. Maybe job B is some lengthy unit testing functionality that you only want to run at night, and job A is the compilation operation that builds the code and unit tests to be run - so job B clearly "depends" on job A.
You quickly realize again that you need a plugin for this. You may start with the Conditional Build Step plugin and have JobA run JobB as a build step. Since these two jobs depend on one another, you may set the build operation in JobA to block while JobB is executing. This works fine when running JobA. However, once again, since this plugin does not respect the dependencies between jobs, Jenkins will happily allow JobB to be manually triggered while JobA is broken, and it will be equally happy to run JobA even if JobB has been triggered in some other way (e.g., from an SCM commit). There are other dependency management problems with this solution as well that I won't get into here - I think you get the point.
Needless to say, this option is out. So next you try to find some kind of post-build trigger to do this - but unfortunately there is none. The only workaround to this limitation that I've found is to use yet another plugin, Flexible Publish, and combine it with the Conditional Build Step. However, once you try this solution it quickly becomes apparent that it suffers from a similar set of ailments. For example, if you have the Dependency Graph plugin installed, the generated graph doesn't even show JobB as a downstream of JobA, let alone have Jenkins correctly respect the dependency between the two jobs. Jenkins will still happily allow concurrent execution of both jobs regardless of the "blocking" upstream/downstream settings.
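For reference, the desired behavior is trivial to state as a script. Here is a hedged sketch in the Build Flow DSL style shown earlier (the job names and the definition of "night" are assumptions); note that, as described above, scripting the flow this way still would not make Jenkins enforce the dependency when the jobs are triggered directly:

```groovy
// Illustrative only: job names and the "night" window (22:00-06:00) are
// assumptions, not anything the plugins above actually provide.
def hour = new Date().format('HH').toInteger()

build("jobA")            // compile the code and the unit tests
if (hour >= 22 || hour < 6) {
    build("jobB")        // run the lengthy tests only at night
}
```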
Again, yet another trivial dependency management pattern that I have yet to find a solution for.
Example Use Case:
Suppose you have three jobs: A->B->C, where C depends on B, which in turn depends on A, and each job has the 'blocking' options set for upstream and downstream jobs. Suppose C is building, during which time someone commits a change to projects A and B at the same time, as a single change. Now, with SCM polling enabled, it is possible that job B picks up the commit to its source project before job A does. In this case you quickly realize that Jenkins has a FIFO scheduling policy, so job B gets scheduled to run before job A. This is true even if both jobs are triggered and queued while job C is running.
What results is that once job C is complete, job B will execute and fail because the associated change to job A has not yet been picked up. Then job A runs and builds successfully, after which job B is triggered a second time, and this time it succeeds because it now has the required output from job A.
So, back to the plugins we go. There are several plugins that purport to circumvent this limitation, including Dependency Queue and Priority Sorter. Luckily these plugins do work pretty well; however, they tend to be fragile. For example, the Priority Sorter plugin relies on having an effective priority numbering scheme in place and applied consistently across all jobs. This is easy to manage with 10 jobs, but not so much with 500. Conversely, the Dependency Queue plugin seems to rely on the triggering relations between jobs, so if you are forced to use a plugin to relate jobs that doesn't respect this trigger relationship, it too will fail to schedule the jobs properly based on dependency.
My basic point here is that dependency management is hard, and perhaps the original Jenkins authors knew this and as a consequence have largely left it up to plugin makers to fill the void. But I strongly believe that dependency management is a core requirement of a good CI system, and I cannot see how one could effectively outsource such features to third parties. To make matters worse, you need to employ a dozen or more different plugins to achieve any semblance of a complex dependency management system, and thus each of those plugins must inter-operate with the others in certain key ways for the dependencies to work correctly.
Even still, there are quite a few problematic issues that I have yet to find workarounds for, like the "blocking when a dependent build is broken" issue described here. In my opinion the core Jenkins developers need to strongly consider adding a robust dependency management framework to the Jenkins core, perhaps providing a pluggable API that plugin developers can then use to enhance these features, focusing on specific subsets of dependency management and isolated use cases.
I completely agree. I started using Jenkins Friday. On Friday my goal was to get Jenkins working with 1 project. That one project has 2 dependencies on other projects, one of which is dependent on the other. So, a simple linear relationship. It's shocking to me that on the first day of using Jenkins I already needed to extend its functionality just to have one project depend on another.
Navigating plugins can be a daunting task, especially when you're brand new to a tool. Which ones are good? Which ones are waay too complicated? Which one might break Jenkins? Who knows?
At the very least, Jenkins needs something out of the box to support dependencies. Some base functionality, since I can't imagine that organizations won't, at some point, need a project that has dependencies on other projects. I wound up installing Copy Artifact, and that works fine. Why can't it, or some other plugin made for dependencies, be included in Jenkins?
The OP is right though. There are far more complex dependency use cases that need to be addressed. Dependencies are simply a widespread need that should come in the box.
Have you tried implementing these kinds of use cases using the Workflow system? It does nothing magical in terms of investigating dependencies; it just runs the things you told it to, in the order you told it, under the conditions you specified. Nonetheless it has some aspects which allow it to model the kinds of scenarios you discuss better than the Build Flow DSL plugin. In particular:
In common with Build Flow:
We had tried the Build Flow plugin, which does meet some of our needs, but one thing in particular it didn't seem to handle was the ability to run subsets of the dependency tree depending on which job has changes made to it. For example, if job B depends on job A and someone commits a change to job B, we don't want job A to be built. This is necessary to improve build efficiency in our current infrastructure because building (and, by extension, testing) each module in our dependency tree is very time consuming, even for "no-op" builds.
When you mention the Jenkins "workflow system", I assume you are referring to this plugin project I found on GitHub (to which I believe you are a contributor or maintainer). I have to say I hadn't heard of this plugin until you mentioned it, but it does sound promising. If it supports the sort of partial-build operation I just mentioned, I may be interested in trying it. Specifically, if two dependent jobs can be configured with separate / independent SCM URLs so they may be triggered independently, while still allowing build orchestration based on the job dependencies, let me know. Even better, if you can point me to an example of how this may be done using that plugin, I will let you know whether it works for our needs.
There is no built-in notion of a “partial build”. Probably something like that could be created as an operator if there is sufficient demand. There would still be one job with a sequence of builds, but some of these would be skipping build steps based on an inspection of the SCM changelog and a determination that nothing has changed in this area since the previous build.
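To sketch what such an operator's logic might look like (an assumption-laden illustration, not an existing feature: it presumes a Workflow-style script where the build's changelog is exposed, here via the currentBuild.changeSets API of today's Pipeline, and a directory-per-module source layout):

```groovy
// Rough sketch only. Helper name, build command and layout are assumptions.
@NonCPS
boolean touches(def changeSets, String module) {
    // Walk every commit in this build's changelog and check whether any
    // affected file lives under the module's top-level directory.
    for (changeSet in changeSets) {
        for (entry in changeSet.items) {
            for (path in entry.affectedPaths) {
                if (path.startsWith(module + '/')) {
                    return true
                }
            }
        }
    }
    return false
}

node {
    checkout scm   // assumes the job's SCM is configured
    if (touches(currentBuild.changeSets, 'framework')) {
        sh './build-and-test.sh framework'   // placeholder command
    } else {
        echo 'No changes under framework/; skipping its build steps.'
    }
}
```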
That is more-or-less how I envisioned the tool working actually.
Just to be clear, the rationale for that particular requirement is the size of our codebase and the sheer time required to build and test all interdependent modules. For example, we have 20+ interdependent modules in our codebase, each of which requires its own separate compile and test phase. In many cases the test phases take much longer to complete than the compilation phase, so we try to avoid running unit tests when they are not necessary. We do this currently by only running tests when a compile is performed, and compiles are only performed when there are commits to a module or one of its dependencies.
So, if Module B depends on Module A and someone commits a change to Module A we want to build and test Module A then Module B. However, if a change is made only to Module B we don't want to re-build or re-test Module A. We manage this at the moment using commit triggers in Jenkins. When changes are made to Module B we only trigger Module B's compile and test, which will then in turn trigger any downstream modules that depend on it.
To give more practical numbers for reference, consider 26 interdependent modules, Modules A-Z. Further, suppose each module has an associated test phase that requires 5 minutes to complete. If someone commits a change to Module Z and all modules in the pipeline get rebuilt and tested, even if the build/rebuild process for each were instant - which it typically isn't - it would still require 2h+ (26 x 5 = 130 minutes) to run through the pipeline. However, if we can rebuild and retest only Module Z, we can complete the process in 5 minutes (plus the time to recompile, of course), which provides significantly faster feedback to developers.
Granted, our particular needs are exacerbated by the sheer size of our codebase, but I would be surprised if even smaller shops didn't see an obvious benefit to this. Even if you only have 2 or 3 interdependent modules that exhibit the same behavior, you would see noticeable improvements in turnaround time.
I have this same use case. We have an application with thousands of modules that all depend on one another in various ways. It takes 10 hours to build the whole thing. This makes it hard to respond quickly if something goes wrong. A partial build operator would be great.
Where it gets complicated is if you want your pipeline to behave so that if someone commits to module K, then of course K gets rebuilt & retested, but also L–Z get rebuilt and retested - or only those among them that are transitive downstream dependents of K. Presuming you do not want to manually code these dependencies, you need some kind of tool which can inspect your sources and figure them out. Traditionally this was make, but if you want to spread builds across multiple Jenkins slaves, then you would need to reimplement that kind of analysis in a Workflow script using some library built for the purpose (say, by scanning Maven POMs). It is a potentially large topic. A rough sketch of the core of such a library follows below.
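To make the "manually code these dependencies" option concrete, here is a hedged sketch (all job names illustrative; it assumes an acyclic graph) of a hand-maintained dependency map plus the two operations such a library would need: computing the transitive dependents of a changed module, and running the affected jobs in dependency order:

```groovy
// Hand-written dependency map standing in for scanned Maven POMs.
def upstreamOf = [A: [], B: ['A'], C: ['A'], D: ['B', 'C']]

// Everything that must be rebuilt when `changed` changes: the module
// itself plus its transitive downstream dependents (fixed-point iteration).
def dependentsOf = { String changed ->
    def result = [changed] as Set
    def grew = true
    while (grew) {
        grew = false
        upstreamOf.each { job, ups ->
            if (!result.contains(job) && ups.any { result.contains(it) }) {
                result << job
                grew = true
            }
        }
    }
    result
}

// Rebuild the affected jobs in dependency order: a job runs once every
// affected upstream of it has already run. Assumes no cycles.
def affected = dependentsOf('B')              // here: [B, D]
def done = [] as Set
while (!done.containsAll(affected)) {
    affected.each { job ->
        def ready = upstreamOf[job].every { !affected.contains(it) || done.contains(it) }
        if (!done.contains(job) && ready) {
            build(job)                        // Build Flow / Workflow build step
            done << job
        }
    }
}
```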
Presuming you do not want to manually code these dependencies, you need some kind of tool which can inspect your sources and figure them out.
Correct. Unfortunately we haven't found a tool capable of doing this inspection / analysis automatically - at least not for C++ / Visual Studio projects. Thus we have little choice other than manually coding / recoding the dependencies in Jenkins. It is annoying and fragile, I know, but we have no reasonable alternative.
The problem I have is that, even when I want to do this manual configuration myself I currently need to make use of at least a dozen different plugins to accomplish the task, and since each plugin is developed by different individuals they vary in quality and compatibility. On more than one occasion I've had to take extensive measures to work around compatibility issues between plugins where some don't work quite properly with others.
My hope is that if this basic job-dependency management logic were incorporated into the core Jenkins architecture the results of these efforts would be more fruitful.
NOTE: If it would help move this initiative forward, I could provide some very specific examples of the kinds of problems / behaviors we'd be looking to avoid / achieve so as to help clarify the expectations of this (self-admittedly) vague improvement request.
To get the ball rolling, here are some use cases that are difficult if not impossible to achieve at the moment, which I would hope would be solved by this improvement:
Suppose a project has a low-level 'framework' module and a higher-level 'application' module which depends on it. It stands to reason that after building the 'framework' module successfully we should automatically run a build of the 'application' module so as to incorporate those framework changes. In this situation the latter, which we'll refer to as "Job A", depends on the former, which we'll refer to as "Job F". The expected behavior would be as follows:
In this scenario, the first 2 bullet points are more-or-less built into the Jenkins core via 'triggers', with the exception of item 2.2. So far as I can tell there is no way to prevent dependent projects from building when their upstreams are broken - via plugins or otherwise. Items 3 and 4 are supported by the Jenkins core but for some reason they are disabled by default. IMO if Jenkins implemented correct dependency management these options would be enabled by default. Item 5 is achievable with a fair amount of work through the use of plugins and an assortment of 'tweaks' to the Jenkins configuration.
Now, let's extend our example to include a server module, which we'll refer to as "Job S". This module will again depend on Job F - the common framework - but will be completely independent of Job A - the application. So this new job would need to exhibit the same behavior as described for "Job A" above, which has the following implications:
So again, the first 3 items are supported by the Jenkins core. Item 4 is partially supported, with the exception that there is no mechanism preventing Job A and Job S from building via independent triggers when Job F has finished building unsuccessfully. Items 5 and 6 are inherently supported by the Jenkins core. Finally, item 7 is supported via plugins, with some monkeying with the configurations and triggers.
Finally, let's extend our example to include a dreaded diamond dependency. Suppose we now add a fourth job to our configuration which 'packages' the artifacts of all three jobs (i.e., creates an installer for them). Let's call this Job I. In this situation Job I requires all other jobs to have built successfully for it to work correctly. This has the following implications:
This scenario is where things start to break down horribly for Jenkins. For example, we have the Join plugin at our disposal, which at a glance looks like it may be designed for just such a scenario. However, it breaks down in several situations, like items 3.1, 4 and 5 above (if memory serves). Then we have plugins like Build Flow which, again, appear to handle several of these cases correctly - item 3.1, for example, if memory serves. However, it too falls apart on items 4 and 5. The limitations of these plugins get exacerbated even further because many of them don't play nicely with other plugins that are required to get this whole use case working. For example, neither of the aforementioned plugins works with the Priority Sorter plugin, which is also needed to achieve the results described above. So you end up having to trade off one bit of functionality for another (i.e., if plugins A and B won't work correctly together, you have to decide which is more valuable to your workflow and ditch the other).
It is my hope that if job dependencies were modeled in the core Jenkins architecture, not only would many of these plugins become obsolete, but the resulting behavior of the tool as a whole would become more robust. Put another way, trying to "extend" Jenkins to "add" support for dependency management seems like trying to "extend" a motorcycle into a transport truck by "adding" a few wheels to its axles. At best you end up with a Frankenstein-like thing that doesn't quite satisfy the user's expectations.
even when I want to do this manual configuration myself I currently need to make use of at least a dozen different plugins to accomplish the task
Not when using Workflow. One job with one Groovy script which can work however you like, with no additional plugins.
In the case of your diamond dependency scenario, I think all that is really missing today is the aforementioned operator to skip a stage when there are no SCM changes in that area. If you do have dozens of dependencies, it would be convenient to have a library to implement this model based on a simple configuration DSL.
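For reference, the diamond itself (Job F feeding Jobs A and S, which feed Job I) is straightforward to express as a single script. The following sketch uses today's scripted syntax, with placeholder stage names and shell commands:

```groovy
// The F -> (A, S) -> I diamond as one script; commands are placeholders.
node {
    stage('Framework') {
        sh './build-and-test.sh framework'
    }
}
// Application and server build in parallel once the framework succeeds.
parallel(
    application: { node { sh './build-and-test.sh application' } },
    server:      { node { sh './build-and-test.sh server' } }
)
node {
    stage('Installer') {
        // Only reached if everything above succeeded, since a failure
        // anywhere earlier aborts the whole flow.
        sh './build-installer.sh'
    }
}
```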
I do not think anything like this will or should become part of “basic Jenkins infrastructure”. The dependency & triggering logic built into Jenkins core is already far too complex. That is why we created Workflow—the existing system did not scale up to more sophisticated scenarios.
I just wanted to make one further clarification in response to Jesse's comment above. Correct me if I'm mistaken, but I think your comment may be confusing the concept of "code dependencies" or "build dependencies" with "operational dependencies". What I mean to say is that, while it is true that as developers we probably tend to model Jenkins' job dependencies on our code modules' dependencies, there is not always a 1:1 relationship in that mapping. Oftentimes we'll need to have operations executed as part of the automation process that have nothing to do with compilation but are still required by business processes.
For example, if the code for Module A depends on the code of Module B, and we use two separate Jenkins jobs to compile each of these, then it's pretty clear what the dependencies must be (i.e., this is where make and other such tools come into play). However, suppose we have 3 "phases" of a build process - compile, test, package - each managed by a separate job. Who is to say how those three jobs should relate to / depend on one another? Should packaging depend on tests? That depends on the context and the policies that govern your release process. This, IMO, is where tools like Jenkins really need to shine. Sure, the code dependencies need to be taken into account, and granted most of my earlier examples favor them, but they are by no means the only source of dependencies your system needs to model.
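To illustrate with the scripted approach discussed above (stage names and commands are placeholders): the relationship between such phases is an explicit policy choice in the script, not something derivable from code dependencies:

```groovy
// Policy A: packaging is gated on tests passing (sequential stages).
node {
    stage('Compile') { sh './compile.sh' }
    stage('Test')    { sh './run-tests.sh' }
    stage('Package') { sh './package.sh' }   // runs only if tests pass
}

// Policy B: packaging depends only on the compile; tests run alongside it.
node { stage('Compile') { sh './compile.sh' } }
parallel(
    test:    { node { sh './run-tests.sh' } },
    packager: { node { sh './package.sh' } }
)
```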
Consequently, I don't think you can ever get away from having to manually model your release process in the tool of your choice, Jenkins or otherwise. At best you can extract parts of that model from tools and build scripts, but you'll never quite get everything you need from there - at least not when you work at scale.
Not when using Workflow. One job with one Groovy script which can work however you like, with no additional plugins.
This does sound promising. From what I understand this plugin isn't yet available on the Jenkins plugin 'store', correct? Do you have any thoughts as to when it will be "production ready"? I definitely would like to give it a try when it is deemed "stable".
The biggest issue I'd have with it is having to take the time to learn Groovy and the plugin, and then write a script to handle some of these seemingly trivial use cases. But it's one of those things where, if there are no other options at our disposal, we may have no choice.
The other thing we'd need to do is test the plugin in-house to make sure that adopting a new plugin such as this wouldn't have an adverse effect on the dozens of other plugins we're currently using. As I mentioned before, we have experienced numerous inter-relationship problems between plugins in the past.
I do not think anything like this will or should become part of “basic Jenkins infrastructure”.
Obviously I'm not familiar with the internals of the tool itself, nor am I a maintainer or even an active contributor to the project (yet), so I can't speak to whether such features will be incorporated into the core. However, given the importance and benefits of supporting correct dependency management, I think it is pretty clear that these features should be incorporated into the core. The architecture would need to model the concept of dependencies from the bottom up in order for it to be robustly supported - across plugins, across configurations, etc.
The dependency & triggering logic built into Jenkins core is already far too complex. That is why we created Workflow—the existing system did not scale up to more sophisticated scenarios.
I suspected this was the case. It sounds like some of that code needs to be refactored to compensate for the added complexity.
I probably should say that I do understand that what I am proposing here would likely be invasive and would require a lot of work, but I believe that doing so would be a game changer for Jenkins which would further encourage its adoption in the corporate world, where such things are of critical importance. For example, if the core architecture supported dependency management and these features were exposed on the Jenkins UI via easy-to-understand interfaces, then even non-developers could get involved with automated process management. Exposing this feature via a scripting environment, while very flexible and powerful I'm sure, does preclude / discourage non-developers from using it.
I don't think you can ever get away from having to manually model your release process in the tool of your choice, Jenkins or otherwise. At best you can extract parts of that model from tools and build scripts, but you'll never quite get everything you need from there
Agreed, and I was not suggesting otherwise. Just saying that there are cases where you have a large number of modules with a completely consistent, homogeneous model—each has a static dependency list on other modules, and each accepts a predefined "build & test" command which is considered a prerequisite for building downstream modules. For this scenario, it is helpful to have some kind of tool which either automatically scans for dependencies, or accepts a DSL with a manually managed yet concise description of dependencies, and implements the minimal build sequence (with topological sorting, or automatic parallelization, etc.). For example, if you are using Maven with a reactor build (one big repository with lots of submodules with SNAPSHOT dependencies), and can determine which modules have changes according to the file list in the changelog of the current build, you can pass that module list as --projects B,K,P,Q --also-make-dependents --threads 2.0C and get an easily parallelized, minimal build.
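A minimal sketch of that glue, assuming a Workflow-style script where the changelog is exposed (as currentBuild.changeSets in today's Pipeline) and one top-level directory per module; all names here are illustrative:

```groovy
// Hypothetical glue for the reactor approach above: derive the changed
// module list from the changelog, then hand it to Maven.
@NonCPS
def changedModules(def changeSets) {
    def modules = [] as Set
    for (changeSet in changeSets) {
        for (entry in changeSet.items) {
            for (path in entry.affectedPaths) {
                modules << path.tokenize('/')[0]   // top directory = module
            }
        }
    }
    modules
}

node {
    checkout scm
    def mods = changedModules(currentBuild.changeSets)
    if (mods) {
        sh "mvn --projects ${mods.join(',')} --also-make-dependents --threads 2.0C install"
    } else {
        echo 'No module changes detected; nothing to rebuild.'
    }
}
```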
There are of course other scenarios where every dependency is idiosyncratic enough that you have to model the whole behavior from scratch. And there is often some kind of setup stage and/or final deployment stage that falls outside a fixed dependency model. Neither poses any problem for Workflow.
From what I understand this plugin isn't yet available on the Jenkins plugin 'store', correct?
There are beta releases available on the experimental update center. Please see its project page for details, or use jenkinsci-dev for questions.
Do you have any thoughts as to when it will be "production ready"?
1.0 is expected very soon, for what that’s worth. I cannot promise this would solve all of your requirements (even if and when a changelog operator is implemented), but it is the only plausible candidate, which is why Kohsuke & I have been spending so much time on it. Your use cases are exactly in line with what we envisioned as the interesting problems to be solved—the things that just could not be done as separate jobs with triggers without losing your mind.
Thanks for clarifying.
Ironically, I have been reading a lot about Maven lately, since our company does have a small Java development team that uses it, and I'm trying to evaluate whether any of those tools may be usable by our native development teams. So far it's not looking good. I do have to say that on some level I am, as a mainly native C++ developer, jealous of the tools available to Java developers, most notably Maven. They provide a lot of features that are sadly missing from, or at best very difficult to find in, native toolsets.
the things that just could not be done as separate jobs with triggers without losing your mind.
Very well put! Unfortunately I think I lost my mind on this stuff at least a year ago or more.
Kevin, have you looked at Gradle for your C++ dependency management?
I have heard of Gradle but I have never given it much attention. Since you mentioned it I looked into it more and it does look promising. I will definitely give it a test run to see how well it holds up. I'm not convinced just now that it would preclude the need for Jenkins to support robust dependency management as well, but perhaps it could help bridge the gap at least.
Thanks for the suggestion.
jwal:
I have a graph like this on my dashboard:
It is based on each job having three types of dependency:
If any of these change then the job is considered to be pending and scheduled (in dependency order) to run. I also re-run any job that is:
The only time I configure Jenkins to trigger a job is if I want it to run on a timer such as hourly or daily.
3 years later, Pipeline seems to be the future of Jenkins. Is there any update on this issue? Most use cases brought up by leedega seem as relevant as ever in the Pipeline world, and I don't see an easy solution besides building our own dependency resolver, which feels like reinventing the wheel.
[Originally related to: JENKINS-29913]
[Originally related to: JENKINS-19727]
It seems the basic architecture of Jenkins is such that jobs are individual units of work, meant to be related to other jobs in very trivial ways for the mere purpose of synchronizing their execution.
What seems to be sorely lacking is a robust dependency management scheme that allows jobs to be treated as actual inter-related entities with specific functional and behavioral requirements. Because of this limitation there are numerous plugins and extensions that attempt to work around this issue, such as the Join, Conditional Build Step, Build Blocker and Locks and Latches plugins.
Even the "Advanced" job options for "blocking" jobs when upstream and downstream jobs are running is further indication of this lack of dependency management. If JobA depends on JobB and thus triggers the downstream job upon completion, I can't imagine ever wanting the two to run at the same time - ever. The fact that this behavior is optional is quite illuminating.
This limitation becomes even more apparent when you have large, complex job sequences to orchestrate, with non-linear interdependencies between them. There are countless questions on forums and sites discussing workarounds, often leveraging the features of several related plugins hooked together to "partially" solve these dependency issues, when it seems the problem would be best solved in the Jenkins core functionality.
The one underlying issue that cuts across all of these topics, and affects nearly all of the plugins I've tried as workarounds for this limitation, is that jobs which are inter-related in different ways are treated as independent from one another by default, rather than having dependency enforcement be mandatory.
Take for example the Join plugin. It provides a very basic ability to define non-linear relationships between jobs, allowing a diamond-pattern relationship between them. So JobA can trigger Jobs B and C, and then once these two jobs complete successfully, JobD gets triggered. Sounds fine and dandy until you realize that you can quite easily trigger JobB to run and, once complete, it will happily trigger JobD even if Jobs A and C are broken. Similarly, even if all 4 jobs have the "block" when upstream and downstream jobs "advanced" options set, JobD can still be executed in parallel with Jobs B and C.
Now, some may say that these bugs probably lie not with the Jenkins core but rather with these plugins, and at first glance I would tend to agree. However, these limitations are so common and pervasive across nearly all the job-management related plugins I have tried that it is hard to deny there is some core feature missing from this tool.
Maybe there is some magic bullet that helps resolve these issues that I'm missing, but I have been administering a Jenkins build farm with 10 PCs and nearly 500 jobs for several months now, and I've tried dozens if not hundreds of plugins to orchestrate non-trivial dependency management between jobs - which, at best, results in a complex sequencing of many such plugins, and at worst has met with utter failure.
Thoughts:
Perhaps an easy solution would be to provide some kind of "global" option in the Manage Jenkins section that forces all jobs that trigger other jobs to act as actual dependencies of one another rather than just dumb triggers. Then upstream jobs that are running or failing would prevent downstream jobs from running, even when these dependencies follow a complex, non-linear relationship, and regardless of which plugins are used to orchestrate these relationships.
Alternatively, maybe what we need is a new job type, call it "component job" or something. When instantiated it would have options that allow complex dependency management between jobs to be handled automatically.
Whatever the solution, I strongly feel that this is a very important feature that is badly needed in Jenkins and would help make the tool much more practical for large scale CI needs.
Originally reported by leedega, imported from: Much needed dependency management between jobs