timja / jenkins-gh-issues-poc-06-18

[JENKINS-9104] Visual studio builds started by Jenkins fail with "Fatal error C1090" because mspdbsrv.exe gets killed #6408

Closed timja closed 6 years ago

timja commented 13 years ago

I run into errors when using a customized build system which uses Visual Studio's devenv.exe under the hood to compile Visual Studio 2005 projects (with the VC++ compiler). When starting two parallel builds with Jenkins (on different code bases), the second job always fails with "Fatal error C1090: PDB API call failed, error code '23' : '(" in exactly the same second the first job finishes processing. Running both jobs outside Jenkins does not produce the error.
This has also been reported for builds executed by MSBuild on the Jenkins user mailing list [1].

I analysed this issue thoroughly and can track the problem down to the usage of mspdbsrv.exe. This program is automatically spawned when building a Visual Studio project. All Visual Studio instances normally share one common PDB server, which shuts itself down after an idle period (the default is 10 minutes). "It ensures access to .pdb files is properly serialized in parallel builds when multiple instances of the compiler try to access the same .pdb file" [2].
I assume that Jenkins cleans up its build environment when an automatically started job finishes (as described at http://wiki.jenkins-ci.org/display/JENKINS/Aborting+a+build). I checked mspdbsrv.exe with Process Explorer, and the process indeed has a variable JENKINS_COOKIE/HUDSON_COOKIE set in its environment if started through Jenkins. Killing mspdbsrv.exe while projects are still connected will break compilation.

Jenkins mustn't kill mspdbsrv.exe to be able to build more than one Visual Studio project at the same time.


[1] http://jenkins.361315.n4.nabble.com/MSBuild-fatal-errors-when-build-triggered-by-timer-td385181.html
[2] http://social.msdn.microsoft.com/Forums/en-US/vcgeneral/thread/b1d1bceb-06b6-47ef-a0ea-23ea752e0c4f/


Originally reported by gordin, imported from: Visual studio builds started by Jenkins fail with "Fatal error C1090" because mspdbsrv.exe gets killed
  • assignee: danielweber
  • status: Resolved
  • priority: Major
  • resolution: Fixed
  • resolved: 2018-05-07T15:04:36+00:00
  • imported: 2022/01/10
timja commented 9 years ago

kerrhome:

How are you trying to run this, Del? At first, I didn't have success getting it going, but now I seem to have it working fine. The BUILD_ID does seem to be an effective solution (I do worry about the memory leak though). I'm using the simple batch solution in comment 6, not the python solution. You just have to make sure that the mspdbsrv file is in your path and it should work fine. We use a batch wrapper, which is under version control, for our builds and I added code that says "If this is a Jenkins build, execute this block". To decide if this is a Jenkins build, I just check to see if JENKINS_URL is defined. Since I added that, we've not seen this issue return. Let me know if I can help in some way.
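
A minimal sketch of the wrapper logic described above, written in Python rather than batch. It assumes that mspdbsrv.exe is on PATH, that it is launched with `-start -spawn` (the arguments reported later in this thread), and that `dontKillMe` is an arbitrary placeholder value. The function names are hypothetical, and this is not the batch block from comment 6.

```python
import os
import subprocess


def pdbsrv_environment(environ):
    """Return the environment for a pre-started mspdbsrv, or None outside Jenkins."""
    # Assumption: a Jenkins build is detected by JENKINS_URL being defined.
    if "JENKINS_URL" not in environ:
        return None
    env = dict(environ)
    # Any value that differs from the job's own BUILD_ID will do.
    env["BUILD_ID"] = "dontKillMe"
    return env


def start_pdbsrv_if_jenkins(environ=None):
    """Start mspdbsrv with the altered environment, but only under Jenkins."""
    env = pdbsrv_environment(os.environ if environ is None else environ)
    if env is None:
        return None
    # Assumption: mspdbsrv.exe is on PATH and accepts -start -spawn.
    return subprocess.Popen(["mspdbsrv.exe", "-start", "-spawn"], env=env)
```

Only the copy of the environment handed to mspdbsrv is changed, so the rest of the build still sees the BUILD_ID that Jenkins assigned.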

timja commented 9 years ago

delboyjay:

I've added the block from above into the Jenkins command for the job at the moment but yesterday I got this error and there was only one build running so it is likely a different issue.

33>X509Helper.h(118): fatal error C1090: PDB API call failed, error code '23' : '(

I've even tried setting BUILD_ID=dontKillMe under the node configuration in Environment variables but I have been getting the original problem with that setting also. I even tried restarting the jenkins client service on the build server just in case it was needed for the env variable to be set for all child processes but it's not helping it seems. If this is working for yourself (@Shannon) I have to be doing something stupid.

It seems that a BUILD_ID put under the node settings is overridden when the build runs, which sets BUILD_ID back to the build time. That rules out having a global setting that would let me turn this off.

timja commented 9 years ago

leedega:

One thing I felt needed to be expressed here is that the fact that this defect arose in an update to the LTS edition at all worries me. Combined with the fact that this defect has been opened and under active discussion for months now without any 'real' resolution - other than some hacks and workarounds - is even more concerning. According to the Jenkins website LTS editions should "...change(s) less often and only for important bug fixes...". This policy seems to have been completely negated here. Given the severity / impact of this change I would have expected whatever "improvement" was made that caused this problem would have been reserved for the "latest" release, or at the very least reverted from the LTS edition after this problem was discovered.

Perhaps someone with more knowledge about the cause of this error could elaborate on why neither of these approaches has been taken here.

timja commented 9 years ago

kerrhome:

@Del, Yes, you cannot set BUILD_ID for a slave setting. It is set by Jenkins on a per build basis. You'd either have to set it in the batch section of the job itself (we did this for our most frequently used builds) or if you call a batch script or some other script, you can put it there.
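
A small sketch of the precedence described here. It assumes, as this thread reports, that the per-build value Jenkins sets replaces anything configured on the node, while an assignment made inside the job's own script step wins because it happens last. The function and the sample values are hypothetical.

```python
def effective_env(node_env, build_env, step_overrides=None):
    """Layer environment sources in the order they are applied."""
    env = dict(node_env)
    # Jenkins sets BUILD_ID per build, replacing any node-level value.
    env.update(build_env)
    # Assignments made inside the job's batch or shell step come last.
    env.update(step_overrides or {})
    return env


node = {"BUILD_ID": "dontKillMe"}
build = {"BUILD_ID": "2015-06-01_10-00-00"}

print(effective_env(node, build)["BUILD_ID"])
print(effective_env(node, build, {"BUILD_ID": "dontKillMe"})["BUILD_ID"])
```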

timja commented 9 years ago

ki82:

200$ is up for grabs for solving this issue at: https://freedomsponsors.org/issue/596/visual-studio-builds-started-by-jenkins-fail-with-fatal-error-c1090-because-mspdbsrvexe-gets-killed

timja commented 9 years ago

danielweber:

I implemented a whitelist solution, see pull request: https://github.com/jenkinsci/jenkins/pull/1562

timja commented 9 years ago

sweavo:

Nice work Daniel. Will be interesting to see whether that solves the problem.

For the good of the thread, I'm going to try to summarize this from the top down as there's a lot of talk on here that seems to miss the key points.

1) BUILD_ID is an environment variable, set by Jenkins when it starts a job.

2) Environment variables are inherited when processes start other processes, except when overwritten. For example, in bash scripts you can go

MYVAR=myvalue myscript.sh

and myscript.sh will run with MYVAR set to myvalue.

3) Therefore, all processes started by a jenkins job have the same BUILD_ID. This is recursive.

4) Jenkins, in order to catch rogue processes at job end (i.e. those that have broken ties with their parent process) scans the whole process space for those with the particular BUILD_ID in their environment, and kills them.

This is correct and good behavior by Jenkins.

5) When you start an MSBUILD job, pdbsrv is started, which catches requests from parallel compilations and serializes them to write pdb files. When started from Jenkins, that pdbsrv process inherits BUILD_ID from the job.

6) If you run two MSBUILD builds at once, then they share the same pdbsrv process.

7) When the first job ends, it kills the pdbsrv process – because its BUILD_ID matches the first job's build id. The second job then fails.

8) Solution 1: start pdbsrv with a BUILD_ID that doesn't match the build jobs. Then pdbsrv will not be killed at the end of the job.

9) Solution 2: use Daniel's whitelist feature to not kill pdbsrv at the end of the job.

Casual readers stop here.
=========================

10) The problem with Solutions 1 and 2 are this: pdbsrv still has a timeout, so you will get sporadic failures when the server goes away.

11) My "heavyweight" python fix is trying to deal with that. Basically wrapping pdbsrv with a proper timeout and reference counting so that pdbsrv is present exactly when needed.

12) pdbsrv's timeout doesn't get a new lease every time you use pdbsrv. I regard this as a bug in pdbsrv.

13) You can't leave pdbsrv running forever because it (allegedly) has memory leaks. I regard this as a bug in pdbsrv.

I really think to roll back Jenkins' ProcessTreeKiller is NOT a solution. The use of BUILD_ID brings the Jenkins machine under better control against rogue processes, and the workaround (for well-behaved servers) is easy, set BUILD_ID before starting the server, or use Daniel's whitelist.

14) Solution 3: start pdbsrv periodically, e.g. every day with a day-long timeout. That will mitigate against the memory leaks. If you use some concurrency control, e.g. Job Weight plugin, you can make sure this "kill and restart pdbsrv" job does not fire during a build.

=========================

Solution 0: Finally, it would be remiss of me not to mention again my python workaround, which has been happily keeping parallel builds working for 54 weeks now without trouble.
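
Points 1 to 8 of the summary above can be played through with a toy model. This is a sketch, not Jenkins code: the process table, the `Proc` class and the job IDs are hypothetical, and the only behavior modeled is "children inherit the environment" and "cleanup kills everything whose BUILD_ID matches".

```python
from dataclasses import dataclass, field


@dataclass
class Proc:
    name: str
    env: dict = field(default_factory=dict)
    alive: bool = True


def spawn(table, name, parent_env, **overrides):
    """Start a process that inherits its parent's environment (points 2 and 3)."""
    proc = Proc(name, {**parent_env, **overrides})
    table.append(proc)
    return proc


def kill_by_build_id(table, build_id):
    """End-of-job cleanup: kill every live process carrying this BUILD_ID (point 4)."""
    for proc in table:
        if proc.alive and proc.env.get("BUILD_ID") == build_id:
            proc.alive = False


table = []
job_a = {"BUILD_ID": "job-a-1"}
job_b = {"BUILD_ID": "job-b-1"}

# Job A starts first, so the shared pdbsrv inherits job A's BUILD_ID (point 5).
compiler_a = spawn(table, "cl.exe", job_a)
pdbsrv = spawn(table, "mspdbsrv.exe", job_a)
# Job B reuses the running pdbsrv instead of starting its own (point 6).
compiler_b = spawn(table, "cl.exe", job_b)

# Job A finishes: pdbsrv dies while job B is still compiling (point 7).
kill_by_build_id(table, "job-a-1")
print(pdbsrv.alive, compiler_b.alive)

# Solution 1: start pdbsrv with a BUILD_ID that matches no job (point 8).
table2 = []
pdbsrv2 = spawn(table2, "mspdbsrv.exe", job_a, BUILD_ID="dontKillMe")
kill_by_build_id(table2, "job-a-1")
print(pdbsrv2.alive)
```

In the first run the shared server is gone although job B still needs it; in the second it survives because its BUILD_ID no longer matches the finishing job.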

timja commented 9 years ago

sweavo:

*penny drops* Just seen how whitelisting differs from the BUILD_ID solution. Subtle, but it might just work...

timja commented 9 years ago

leedega:

Just a quick ping-back on this issue. Outstanding for like 4 years, no comments for months now, and all for a debilitating, crippling problem in the system! I did notice the pull request Daniel Weber created, which does seem to have some more recent activity on it, but there is still no complete resolution to the issue even in the latest LTS release.

Are there plans for finishing this work any time soon? We are still stuck on an LTS version from like a year or two ago because we can not accept this bug into our production environment. If there is any way to get this fix in sooner rather than later I know I'd appreciate it and I'm sure many others would as well.

timja commented 9 years ago

leedega:

@steve carter
First, let me thank you for summarizing the earlier comment threads. That does help bring everything into focus.

> 4) Jenkins, in order to catch rogue processes at job end (i.e. those that have broken ties with their parent process) scans the whole process space for those with the particular BUILD_ID in their environment, and kills them. This is correct and good behavior by Jenkins.

Agreed. This is a perfectly valid and useful enhancement for the majority of cases. However, I take issue with the debilitating effect it has on this specific use case, combined with the fact that the change was included in an LTS release, which is expected to be kept as stable as possible. I see this problem as a bug, albeit a difficult to detect bug and admittedly a bug that is really caused by some questionable behavior of the Microsoft build tools, but a bug nonetheless. Critical, production-halting bugs like this should be fixed immediately or reverted until an appropriate fix can be made. Doing otherwise reduces users' confidence in the stability of the tool. There is a reason shops like ours choose to use LTS editions for production work: to avoid problems like this that may be found on the latest, cutting edge versions.

> 8) Solution 1: start pdbsrv with a BUILD_ID that doesn't match the build jobs. Then pdbsrv will not be killed at the end of the job.

This should be called a workaround or hack rather than a solution. That point aside, this workaround again won't work for our particular build environment. We use the BUILD_ID throughout our build processes to embed metadata in the binary files we generate. If we reset that environment variable as part of our build this metadata will essentially get corrupted. Changing our tooling to use an alternative environment variable would require significant effort as well, having to be propagated out to dozens of products across several release branches each.

> 9) Solution 2: use Daniel's whitelist feature to not kill pdbsrv at the end of the job.

Based on my review of his pull request, Daniel's feature has not yet been completed nor has it been included in any actual LTS release. I do believe this would be a reasonable and appropriate solution to this defect though, so hopefully this work can be completed sooner rather than later.

> 10) The problem with Solutions 1 and 2 are this: pdbsrv still has a timeout, so you will get sporadic failures when the server goes away.

I know some earlier posters did indicate that this was an issue for them, but I have not been able to reproduce the problem as described. When a compile begins and this process is running, it makes use of the existing process, and if the process is not already running, it starts it. I have never had a compile running and seen the mspdbsrv process terminate mid-compile without any other background process or system event occurring. Also, I work with many development teams including many dozens of developers and have never once had a report of this bug outside of the reproducible use cases I've stated before.

Conversely, I have shown the problem is reproducible outside of Jenkins in very hard to detect ways which I suspect may appear to some to be an intermittent timeout. For example, if you are logged in to a system which is performing a compile in a background process which is also running under the same user profile as your local session, by simply logging out of the system the service terminates. The reason for this is the pdbsrv process is shared by the background process and your local user session and when you log out from the local session all processes in that memory space are terminated, including pdbsrv. This was a very difficult use case to isolate and not very obvious to users of the target systems and even went undiagnosed at my place of work for months under the assumption that the failure was unpredictable and intermittent.

I know that my argument doesn't prove that this particular problem couldn't ever happen but I am extremely skeptical to say the least. If someone does believe that this problem does in fact exist I would greatly appreciate a detailed description on how to reproduce the problem. Maybe we're using a slightly older or slightly newer version of the compiler that doesn't exhibit the problem or something. Either way, if these individuals were willing to compare notes maybe we can help further isolate the root of this discrepancy.

> 12) pdbsrv's timeout doesn't get a new lease every time you use pdbsrv. I regard this as a bug in pdbsrv.

As I've stated in earlier posts, my team manages a build farm with close to a dozen agents now, running over 1000 build jobs, and never once have I had this error occur on any of those systems, nor have any of the development teams we support reported this problem on any of their local development machines. I would have to say that if this were in fact a core issue with the Microsoft toolset we would have discovered it by now. Again, if anyone can give me a reproducible use case that proves otherwise I would be happy to hear from them. Maybe we are doing something they aren't, or vice versa.

> 13) You can't leave pdbsrv running forever because it (allegedly) has memory leaks. I regard this as a bug in pdbsrv.

Again, this is something we have not been able to reproduce. For example, I have watched some of our agents that are under the most considerable load with respect to build operations (machines which essentially run 24/7, compiling one or more projects in parallel nearly all the time), and these systems continue to run stably day after day, week after week, without requiring any outside intervention from me or my team. The pdbsrv process is nearly always active, the memory consumption increases and decreases with the load on the machines, and it never causes any fatal errors in our build processes.

If anyone can provide specific, reproducible criteria for this problem I would be interested to hear it. If there is something we have overlooked that may be causing us grief elsewhere that we have not yet considered I would definitely want to know about it.

> I really think to roll back Jenkins' ProcessTreeKiller is NOT a solution.

Agreed. I don't think 'just' rolling back this change is the best solution. I think fixing this bug is the best solution. However, in the absence of an appropriate fix for this bug, combined with the severity of its impact, I think that rolling back the change until an appropriate fix was put in place would have been a better solution than stranding users of your tool on an old, out of date release as we have been.

Just my 2 cents.

> The use of BUILD_ID brings the Jenkins machine under better control against rogue processes...

Totally agree that the improvement is well worth the effort. My concern is that the change includes a relatively significant bug.

> ...and the workaround (for well-behaved servers) is easy, set BUILD_ID before starting the server, or use Daniel's whitelist.

Again, 'easy' workaround is a relative term. As just mentioned, we would need to rework our build tools, roll that change out to many teams for many products, and backport those changes to many branches for this to work, after which we'd need to go through all 1000+ jobs on our farm and update them with the hack to the environment variable. Obviously significant effort in our case. Also the whitelist solution has yet to be completed from what I can tell, so that is not a usable solution yet.

> 14) Solution 3: start pdbsrv periodically, e.g. every day with a day-long timeout. That will mitigate against the memory leaks. If you use some concurrency control, e.g. Job Weight plugin, you can make sure this "kill and restart pdbsrv" job does not fire during a build.

Again, just to be clear this is clearly a workaround and not a solution.

This hack may work for us in the interim until an appropriate fix can be made. I will test it out as soon as I can and report back. In our case we'll likely just setup a scheduled task that runs on boot and forces the service to start, and stay running indefinitely as there is no need for it to shut down ever that we have seen.

However, for those individuals who claim that the service does need periodic resetting, a solution like this would likely be more complex. Assuming they too need to ensure the utmost stability of their build farm as we do, they would need to ensure the pdbsrv service gets started before any compilation operation runs, including after reboots, power outages, crashes and the like. I don't believe there is any way to achieve this using a Jenkins operation. This means an external process would be needed, like the Scheduled Task idea I just mentioned. But then the external process would be running independently from the Jenkins agent, making it even more difficult to coordinate the two. For example, I suspect it would be difficult at best to make sure the scheduled task restarts the service at an opportune moment when no compilation operations are happening on the agent. Just something else for those users to keep in mind.

timja commented 9 years ago

leedega:

PS: Sorry for the rant. My team and I have been aggravated for some time now, hoping this bug would be fixed so we can move off the old version of Jenkins we're currently stuck on and thus able to pick up some new bug fixes both in the core as well as in numerous plugins which only support newer versions. Hopefully I don't come across as overly adversarial.

timja commented 9 years ago

laro:

Maybe there is a way to shut down mspdbsrv.exe softly, so it stops only after all active requests (by parallel builds) are done. Then it should simply restart on the next request.

Another solution would be to allow the user to give a list of process names not to kill (or maybe hardcode not to kill mspdbsrv.exe).

timja commented 9 years ago

s7726:

Stopping after a timeout period once all active requests are done, and continuing to run when it gets a new request, is the way mspdbsrv runs normally when something doesn't go around killing it (à la Jenkins).

I believe the correct solution is a whitelist.

timja commented 9 years ago

leedega:

Update
So, it turns out setting up some kind of background process to spawn a copy of the pdbsrv process isn't going to work as expected. From what I can tell Windows seems to be able to tell when a process has been launched from a system service and it will prevent those sub-processes from using other processes that are spawned elsewhere. The particulars of my test case are as follows:

  1. Setup a small Python script that launches a copy of mspdbsrv.exe when called
  2. Setup a scheduled task in Windows to run the python script on boot
  3. Reboot the agent - confirm the mspdbsrv.exe process is running
  4. trigger a compilation operation via the Jenkins dashboard
  5. A new, secondary copy of mspdbsrv.exe is spawned to serve the Jenkins agent. This sub-process is then terminated as per usual once the Jenkins build is complete.

I have confirmed that both the service that runs the Jenkins agent and the scheduled task use the same user profile and credentials and that both environments are using the same version of mspdbsrv.exe with the same set of command line parameters (ie: -start -spawn).

Looks like I have to head back to the drawing board.

timja commented 9 years ago

leedega:

Update
As a quick sanity check I decided to throw together an ad-hoc test configuration whereby I override the BUILD_ID in the environment for one of my compilation jobs, just to see if one of the hacks proposed earlier will potentially work. Unfortunately it looks like this is not a robust solution either. I have confirmed in the trivial case that the solution does work, as in:

  1. Setup a job with a single shell operation as a build step, configured as follows:
    • override the BUILD_ID env var with some arbitrary value
    • call into MSBuild to perform the compilation
  2. run a build of the given job
  3. upon completion, confirm that the mspdbsrv.exe process is still running - TEST SUCCESSFUL

However, unfortunately I've found another case where this solution doesn't work. Apparently if you manually kill the build while it is running Jenkins still somehow manages to locate the orphaned pdbsrv process and kill it, despite the changes described above. So, to put it more clearly:

  1. Setup a job with a single shell operation as a build step, configured as follows:
    • override the BUILD_ID env var with some arbitrary value
    • call into MSBuild to perform the compilation
  2. run a build of the given job
  3. while the compilation operation is running, and you have confirmed the mspdbsrv.exe process has been launched, manually force the running build to terminate (ie: by clicking on the X icon next to the running build on the Jenkins dashboard)
  4. FAILURE - Jenkins still terminates the pdbsrv process

I have confirmed that the pdbsrv process does correctly inherit the overridden BUILD_ID, yet Jenkins is somehow still able to locate and terminate the process in this case. I suspect that, at the point at which I manually kill the build, Jenkins is still running one or more Visual Studio operations which have a direct link to the mspdbsrv.exe process, and thus it finds and kills that process by recursively descending the process tree, killing all running processes that are tied to the agent at the time.

Either way, this example shows that even this 'hack' of overriding the BUILD_ID is fragile at best. It looks like we may have no choice but to wait for a fix for that 'whitelist' solution before we can consider upgrading our Jenkins instance.
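
The suspicion above can be illustrated with a toy model. It is a sketch under the assumption that an abort kills the build step's process and everything descended from it, regardless of environment variables. The class and function names are hypothetical and this is not Jenkins code.

```python
from dataclasses import dataclass, field


@dataclass
class Proc:
    name: str
    build_id: str
    children: list = field(default_factory=list)
    alive: bool = True

    def spawn(self, name, build_id=None):
        # A child inherits BUILD_ID unless it is explicitly overridden.
        child = Proc(name, self.build_id if build_id is None else build_id)
        self.children.append(child)
        return child


def kill_tree(proc):
    """Kill a process and, recursively, all of its descendants."""
    for child in proc.children:
        kill_tree(child)
    proc.alive = False


# The build step overrides BUILD_ID before calling MSBuild.
step = Proc("cmd.exe", build_id="dontKillMe")
msbuild = step.spawn("MSBuild.exe")
pdbsrv = msbuild.spawn("mspdbsrv.exe")

# Aborting mid-build: pdbsrv is still reachable from the build step.
kill_tree(step)
print(pdbsrv.build_id, pdbsrv.alive)
```

Under that assumption the overridden BUILD_ID is irrelevant on abort, because the server is found through its parent rather than through its environment.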

timja commented 9 years ago

leedega:

Update
While reporting the issue in my last comment I had the idea for a slight variation of the configuration described there which does appear to work in both use cases. The main modification that I made was to separate the build operation into two separate build operations:

Theoretically even this solution "could" fall prey to the same problem I described in my previous comment, however the execution time of this initial build step is negligible and is highly unlikely to be exploited in practice (ie: a user would need to hit the kill button on the build at just that small fraction of a second it takes Jenkins to launch mspdbsrv.exe).

I'm not sure how easy this hack will be for us to roll out into production at the scale we need, but just in case others find this tidbit of information helpful I thought I'd provide it here.

timja commented 9 years ago

scm_issue_link:

Code changed in jenkins
User: Daniel Weber
Path:
core/src/main/java/hudson/util/ProcessKillingVeto.java
core/src/main/java/hudson/util/ProcessTree.java
test/src/test/java/hudson/util/ProcessTreeKillerTest.java
http://jenkins-ci.org/commit/jenkins/a220431770cfe716e4f69fd76a4a59bbb27aa045
Log:
JENKINS-9104 Add ProcessKillingVeto extension point

This allows extensions to veto killing of certain processes.

Issue 9104 is not yet solved by this, it is only part of the solution. The
rest should be taken care of in plugins.

timja commented 9 years ago

scm_issue_link:

Code changed in jenkins
User: Daniel Beck
Path:
core/src/main/java/hudson/util/ProcessKillingVeto.java
core/src/main/java/hudson/util/ProcessTree.java
test/src/test/java/hudson/util/ProcessTreeKillerTest.java
http://jenkins-ci.org/commit/jenkins/9a047acd4b5a4e805cee7260f3d091405dc7b930
Log:
Merge pull request #1684 from DanielWeber/JENKINS-9104

JENKINS-9104 Add extension point that allows extensions to veto killing...

Compare: https://github.com/jenkinsci/jenkins/compare/3c785d5af0ad...9a047acd4b5a

timja commented 9 years ago

dogfood:

Integrated in jenkins_main_trunk #4205
JENKINS-9104 Add ProcessKillingVeto extension point (Revision a220431770cfe716e4f69fd76a4a59bbb27aa045)

Result = UNSTABLE
daniel.weber.dev : a220431770cfe716e4f69fd76a4a59bbb27aa045

timja commented 9 years ago

mifoe:

When you use the commandline switch /Z7 the debug info is stored in the object and no server process is needed. This should also solve the problem.

timja commented 9 years ago

s7726:

How does the /Z7 flag affect performance? My impression is that the point of mspdbsrv.exe is to keep the data around for other builds to use, thus decreasing build times for subsequent builds.

timja commented 9 years ago

mifoe:

It does not affect performance, only the size of the object files. With this option the debug information is stored in each object file instead of in one PDB. At link time, the debug information is written to a PDB file.
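
As a sketch of what switching to /Z7 means for a compiler command line: /Zi and /ZI are the cl.exe options that write debug information to a PDB during compilation, and /Z7 embeds it in the object file instead. The helper below is hypothetical and only rewrites an argument list; how the flags reach the compiler in a given build system is not covered.

```python
def use_embedded_debug_info(cl_args):
    """Replace /Zi or /ZI with /Z7 in a list of cl.exe arguments."""
    # cl.exe options are case-sensitive, so /Zi and /ZI are distinct flags.
    pdb_flags = {"/Zi", "/ZI", "-Zi", "-ZI"}
    result = []
    for arg in cl_args:
        if arg in pdb_flags:
            arg = "/Z7"
        # Avoid emitting /Z7 twice if several debug flags were present.
        if arg == "/Z7" and "/Z7" in result:
            continue
        result.append(arg)
    return result


print(use_embedded_debug_info(["/c", "/Zi", "/O2", "main.cpp"]))
```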

timja commented 8 years ago

solstice333:

Just wanted to note that this also occurs on my slave nodes and each slave node only has one executor. So at first glance, since I'm not running concurrent builds on any individual slave node, it seems like this error occurring on my slave nodes doesn't make any sense.

timja commented 8 years ago

scm_issue_link:

Code changed in jenkins
User: Daniel Weber
Path:
pom.xml
src/main/java/hudson/plugins/msbuild/MsBuildKillingVeto.java
src/test/java/hudson/plugins/msbuild/MsBuildKillingVetoTest.java
http://jenkins-ci.org/commit/msbuild-plugin/855a84479b64f32ceb30f73433858dfe2efb5e9f
Log:
[FIXED JENKINS-9104] Veto killing mspdbsrv.exe

Making use of the newly introduced ProcessKillingVeto extension point,
we now make sure that mspdbsrv.exe survives process killing during build
cleanup.

This requires a Jenkins version >= 1.625, the new extension point was
added there. I marked the extension as optional, so that the msbuild
plugin should still work with older Jenkins releases.
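
To illustrate the idea in this commit message, here is a language-neutral sketch in Python. The real extension point is the Java class hudson.util.ProcessKillingVeto named in the earlier core commit; the function names and the exact matching rule below are assumptions for illustration, not the plugin's code.

```python
def veto_mspdbsrv(process_name):
    """Return a reason string to veto the kill, or None to allow it."""
    # Assumption: match on the executable name, ignoring case,
    # since Windows file names are case-insensitive.
    if process_name.lower().endswith("mspdbsrv.exe"):
        return "mspdbsrv.exe is shared between builds"
    return None


def cleanup(process_names, vetoes):
    """Return the processes that cleanup would kill after consulting all vetoes."""
    killed = []
    for name in process_names:
        if any(veto(name) is not None for veto in vetoes):
            continue  # at least one extension objected
        killed.append(name)
    return killed


leftovers = ["cl.exe", r"C:\VS\Common7\IDE\mspdbsrv.exe", "link.exe"]
print(cleanup(leftovers, [veto_mspdbsrv]))
```

The difference from the BUILD_ID workaround is that nothing in the build's environment changes: the cleanup still finds the process but declines to kill it.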

timja commented 8 years ago

scm_issue_link:

Code changed in jenkins
User: Gregory Boissinot
Path:
pom.xml
src/main/java/hudson/plugins/msbuild/MsBuildKillingVeto.java
src/test/java/hudson/plugins/msbuild/MsBuildKillingVetoTest.java
http://jenkins-ci.org/commit/msbuild-plugin/48084be76d434195c9e8b2ddc66f1fb5255a78de
Log:
Merge pull request #19 from DanielWeber/master

[FIXED JENKINS-9104] Veto killing mspdbsrv.exe

Compare: https://github.com/jenkinsci/msbuild-plugin/compare/98f71956d897...48084be76d43

timja commented 8 years ago

scm_issue_link:

Code changed in jenkins
User: Gregory Boissinot
Path:
pom.xml
src/main/java/hudson/plugins/msbuild/MsBuildKillingVeto.java
src/test/java/hudson/plugins/msbuild/MsBuildKillingVetoTest.java
http://jenkins-ci.org/commit/msbuild-plugin/b9a5b02117e0ee097aaf030ab2574daa3dcd217d
Log:
Revert "[FIXED JENKINS-9104] Veto killing mspdbsrv.exe"

timja commented 8 years ago

scm_issue_link:

Code changed in jenkins
User: Gregory Boissinot
Path:
pom.xml
src/main/java/hudson/plugins/msbuild/MsBuildKillingVeto.java
src/test/java/hudson/plugins/msbuild/MsBuildKillingVetoTest.java
http://jenkins-ci.org/commit/msbuild-plugin/031a05982b16e42cba5544c4ba9511515941c62f
Log:
Merge pull request #20 from jenkinsci/revert-19-master

Revert "[FIXED JENKINS-9104] Veto killing mspdbsrv.exe"

Compare: https://github.com/jenkinsci/msbuild-plugin/compare/48084be76d43...031a05982b16

timja commented 8 years ago

damiandixon:

> Revert "[FIXED JENKINS-9104] Veto killing mspdbsrv.exe"

I'm confused: why has the code fix been reverted?

The reason I am looking at this again is that the BUILD_ID work around is no longer working for me.

Neither is the 1.25 msbuild plugin which is meant to have the fix in.

I upgraded from 1.595 to 1.645.

timja commented 8 years ago

danielbeck:

damiandixon https://github.com/jenkinsci/msbuild-plugin/pull/20

timja commented 8 years ago

danielweber:

damiandixon: My changes have been reverted by accident, the msbuild plugin release 1.25 does not contain the change required to fix this issue.
There is a new PR reverting the revert: https://github.com/jenkinsci/msbuild-plugin/pull/21

timja commented 8 years ago

danielweber:

This is still not resolved. We need an update of the msbuild-plugin, see PR https://github.com/jenkinsci/msbuild-plugin/pull/21

timja commented 8 years ago

danielbeck:

danielweber This issue is filed against the core component, and that change has been included a long time ago.

timja commented 8 years ago

akb:

Is there a plan for Visual Studio builds not started by the msbuild-plugin, please?

I'm asking because our job configurations use a "Execute Windows batch command" build step rather than "Build a Visual Studio project or solution using MSBuild" build step (and our batch process is non-trivial).

timja commented 8 years ago

danielbeck:

akb The proposed MSBuild Plugin change only requires the plugin to be installed to be effective (assuming mspdbsrv.exe is what you don't want killed).

timja commented 8 years ago

akb:

That's great - thank you very much for clarifying this, and for your efforts to fix the wider issue - I'm looking forward to having more projects and configurations built automatically in a timely fashion through judicious use of parallelization

timja commented 8 years ago

danielbeck:

akb Forwarding the praise to my (first)namesake danielweber who did all the work

timja commented 8 years ago

danielweber:

danielbeck: Well, the core stuff is done. But from a user's perspective the issue still exists.

How can I get someone to merge the pending PR and create a release of the msbuild plugin?

timja commented 8 years ago

peteboyrocket:

What's happened to this fix? It sounds like it's ready to go. How can we get a new release of the plugin?

timja commented 8 years ago

ykamezac:

I tried parallel builds with MSBuild plugin 1.25 on top of Jenkins 1.580.1 but unfortunately I still get this error (fatal error C1090: PDB API call failed, error code '23'). Did I miss something ?

timja commented 8 years ago

ostojan:

When will you publish a new version of the plugin with the fix? It's been a month since you released the version with(out) the fix...

timja commented 8 years ago

jxramos:

I'm in need of a fix for this too, it's consistently failing numerous jobs for me. Is there an old version of Jenkins to revert to that avoids this particular problem? I'm willing to go that route as a workaround.
So far this has caused a pretty bad first impression for a team I set up a CI build for, who had never seen Jenkins before.
I'm using VS2010 devenv.exe to build the solution files.

timja commented 8 years ago

olexandr_maltsev:

Hello Jaime,
I found a solution.
I think it is a workaround, but it works for me.
I set an additional String parameter for every project.
Go to the Jenkins Project and set "This build is parameterized", “Name” – “BUILD_ID”, “Default Value” – “DoNotKillMe”.

timja commented 8 years ago

gl1koz3:

Stumbled upon this issue immediately after trying parallel builds. Been open for 5 years now, so I guess you can simply check for 'mspdbsrv.exe' and leave it alone? Please free us of our pain.

timja commented 8 years ago

zzayats:

Somebody, publish the new version please. Apparently, the fix is already in the source code on GitHub. Can someone else (other than the maintainer) release the new version?

timja commented 8 years ago

teljj001:

FWIW, we implemented a workaround to this issue that doesn't involve wiping out the BUILD_ID variable (as we need to use it). Having a release with the Veto would be better, but this avoids random crashes in the meantime.

Instead of allowing the MSBuild process to start the daemon itself, you cause the daemon to start using an environment that you choose. MSBuild then just uses the instance you started rather than starting its own.

The Powershell we use is as follows. Use the Powershell plugin to run this as a step before the MSBuild plugin step (could be translated to Windows batch too if you like).

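# https://wiki.jenkins-ci.org/display/JENKINS/ProcessTreeKiller

# Remember the job's real BUILD_ID so it can be restored afterwards.
$originalBuildID = $Env:BUILD_ID
# Start the daemon with a BUILD_ID that Jenkins will not match at cleanup.
$Env:BUILD_ID = "DoNotKillMe"
try
{
    start mspdbsrv -argumentlist '-start','-spawn' -NoNewWindow
}
catch {}
# Restore BUILD_ID for the rest of the build.
$Env:BUILD_ID = $originalBuildID
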
timja commented 8 years ago

danielbeck:

msbuild-1.26 should contain the fix. Can we finally resolve this, or is something missing?

timja commented 8 years ago

teljj001:

IMO, as soon as 1.26 is released.

timja commented 8 years ago

danielbeck:

*sigh*

1.26 is tagged in GitHub but no artifacts are uploaded. Looks like a failed release. Sorry about that.

Note that MSBuild Plugin is almost certainly not currently maintained, as Gregory stopped working on his plugins, so if someone here wants to take over (danielweber perhaps?) that should be possible.

timja commented 8 years ago

teljj001:

danielbeck no need to apologise, I appreciate you looking at it.