timja / jenkins-gh-issues-poc-06-18


[JENKINS-9104] Visual studio builds started by Jenkins fail with "Fatal error C1090" because mspdbsrv.exe gets killed #6408

Closed timja closed 6 years ago

timja commented 13 years ago

I run into errors when using a customized build system that uses Visual Studio's devenv.exe under the hood to compile Visual Studio 2005 projects (with the VC++ compiler). When two parallel builds are started with Jenkins (on different code bases), the second job always fails with "Fatal error C1090: PDB API call failed, error code '23' : '(" in exactly the same second the first job finishes processing. Running both jobs outside Jenkins does not produce the error.
This has also been reported for builds executed by MSBuild on the Jenkins user mailing list [1].

I analysed this issue thoroughly and can track the problem down to the usage of mspdbsrv.exe. This program is automatically spawned when building a Visual Studio project. All Visual Studio instances normally share one common PDB server, which shuts itself down after an idle period (the default is 10 minutes). "It ensures access to .pdb files is properly serialized in parallel builds when multiple instances of the compiler try to access the same .pdb file" [2].
I assume that Jenkins cleans up its build environment when an automatically started job finishes (as described at http://wiki.jenkins-ci.org/display/JENKINS/Aborting+a+build). I checked mspdbsrv.exe with Process Explorer, and the process indeed has a variable JENKINS_COOKIE/HUDSON_COOKIE set in its environment if started through Jenkins. Killing mspdbsrv.exe while projects are still connected will break compilation.

Jenkins must not kill mspdbsrv.exe, so that more than one Visual Studio project can be built at the same time.


[1] http://jenkins.361315.n4.nabble.com/MSBuild-fatal-errors-when-build-triggered-by-timer-td385181.html
[2] http://social.msdn.microsoft.com/Forums/en-US/vcgeneral/thread/b1d1bceb-06b6-47ef-a0ea-23ea752e0c4f/


Originally reported by gordin, imported from: Visual studio builds started by Jenkins fail with "Fatal error C1090" because mspdbsrv.exe gets killed
  • assignee: danielweber
  • status: Resolved
  • priority: Major
  • resolution: Fixed
  • resolved: 2018-05-07T15:04:36+00:00
  • imported: 2022/01/10
timja commented 8 years ago

josch:

As a workaround I have created a Jenkins job that executes a Windows batch command on the Jenkins node where Visual Studio is installed.
The Jenkins job triggers the batch command once a day and has worked in my environment for several years now.
The batch command looks like this:

set MSPDBSRV_EXE=mspdbsrv.exe
set MSPDBSRV_PATH=C:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\IDE

set PATH=%MSPDBSRV_PATH%;%PATH%
set ORIG_BUILD_ID=%BUILD_ID%
set BUILD_ID=DoNotKillMe

echo stop mspdbsrv.exe
%MSPDBSRV_EXE% -stop

echo wait 7 sec
%windir%\system32\ping.exe -n 7 localhost> nul

echo restart mspdbsrv.exe with a shutdowntime of 25 hours
start /b %MSPDBSRV_EXE% -start -spawn -shutdowntime 90000

set BUILD_ID=%ORIG_BUILD_ID%
set ORIG_BUILD_ID=
exit 0

What the batch command does is:
stop mspdbsrv.exe to free up resources
start mspdbsrv.exe with BUILD_ID=DoNotKillMe and a shutdown time of 25 hours, so that the mspdbsrv process outlives the job without being killed and keeps running for 25 hours, letting other build jobs use the already running process

You may have to change the path to mspdbsrv: set MSPDBSRV_PATH=C:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\IDE

timja commented 8 years ago

mbrock:

Updating the msbuild plugin won't work in our situation. We run into this issue, but we don't have the plugin installed. Rather the issue comes for us in the Final Builder scripts we run via Jenkins that call msbuild.

timja commented 8 years ago

danielbeck:

Then install it. MSBuild will veto all mspdbsrv killing.

timja commented 7 years ago

mwinter69:

Set the environment variable
_MSPDBSRV_ENDPOINT_=$JENKINS_COOKIE
(The variable starts and ends with a single '_')
This will lead to a separate instance of mspdbsrv being started.

timja commented 7 years ago

grillba:

mwinter69, thanks for the pointer.

We couldn't get it working with $JENKINS_COOKIE but managed to correct it by adding the following property via EnvInject prior to kicking off the build

_MSPDBSRV_ENDPOINT_=$BUILD_TAG

This resulted in a separate process being initiated for each build and no conflicts/error.

Edit: Correction due to formatting. Refer below

timja commented 7 years ago

hidminds:

It is

_MSPDBSRV_ENDPOINT_

(with underlines) not MSPDBSRV_ENDPOINT.

Just realized it myself that it's a formatting issue. If you enclose the word in underlines it will get italicised and the underlines disappear.

timja commented 7 years ago

grillba:

Apologies, yes an underscore at each end.

timja commented 7 years ago

andne:

We recently re-encountered this on our build network and I did some investigation, here's what I found:

It appears that the veto logic doesn't execute on the slave nodes. Is there something special that has to be done in order for it to be detected and executed there? I don't understand enough about how the remoting logic in Jenkins operates to know the answer to this.

Most of the other work-arounds for this are ones that we cannot easily deploy in our environment. If this is truly the issue, does anyone have an idea what it would take to fix it and how long that would take to carry out?

timja commented 7 years ago

andne:

I spent some more time chasing code and I have a suspicion as to the cause of the issue. In ProcessTree.java, there are two different functions that appear to need information from the master and yet operate in different manners.

I think that getVeto() needs to have part of it implemented more like getKillers(), so that it will go to the master for the list. It may also be that the accessor belongs in ProcessTree instead, so that it caches the data and doesn't go back to the master quite as much. Then, I think the veto logic should work properly on both a master and a slave. Unfortunately, this means a change to Jenkins core and upgrading the full instance to fix the issue instead of just a fix to the plugin itself.

timja commented 7 years ago

walteste:

Is there any workaround to this issue, because it completely breaks our usage of Jenkins?

timja commented 7 years ago

grillba:

Hi Stefan, refer to my comments above. This fixed it for us. Cheers

timja commented 7 years ago

walteste:

Hi grillba, thanks a lot for your suggestion. It seems that this solved our issues.

timja commented 7 years ago

ext3h:

Little side note: It might not be sufficient to just specify the _MSPDBSRV_ENDPOINT_ env variable in order to avoid conflicts. I recommend additionally setting TMP, TEMP and TEMPDIR to an isolated folder if you plan on invoking MSBUILD in parallel, as various plugins for MSBUILD as well as MSBUILD itself will place files there.

A further catch of using _MSPDBSRV_ENDPOINT_ is that serialization of parallel builds in the same working directory will now break in return, unless you have made sure that the temporary files for the different architectures (e.g. the temporary program database created with the individual object files, and commonly named just e.g. "Debug\vc120.pdb", notice the lack of a prefix for the architecture) are completely isolated as well. Otherwise the different mspdbsrv instances will now collide when accessing the same file.
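
The two precautions above (a unique _MSPDBSRV_ENDPOINT_ plus an isolated temp folder per build) can be sketched as a small Python wrapper. This is only a minimal sketch under assumptions: the helper name isolated_build_env, the build-tmp-<tag> folder naming, and the choice to override just TMP and TEMP by default are illustrative and not taken from the thread; pass further variable names if your tools read them.

```python
import os
import tempfile
from pathlib import Path


def isolated_build_env(build_tag, base_env=None, temp_root=None,
                       temp_vars=("TMP", "TEMP")):
    """Return a copy of the environment with a per-build mspdbsrv
    endpoint and a private temp directory (hypothetical helper)."""
    env = dict(os.environ if base_env is None else base_env)

    # One private temp folder per build, created if it does not exist yet.
    root = Path(temp_root) if temp_root else Path(tempfile.gettempdir())
    build_tmp = root / f"build-tmp-{build_tag}"
    build_tmp.mkdir(parents=True, exist_ok=True)

    # A unique endpoint name makes the compiler use its own mspdbsrv instance.
    env["_MSPDBSRV_ENDPOINT_"] = build_tag
    for name in temp_vars:
        env[name] = str(build_tmp)
    return env
```

The returned dictionary could then be passed as env= to subprocess.run when launching the build, with the Jenkins BUILD_TAG value as the tag.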

timja commented 7 years ago

billhoo:

grillba, walteste: Hi there, we've hit this issue too, and we followed your suggestions and configured the master Jenkins node like this:

Configure system > Environment variables > Add a new key-value pair as below:

KEY: _MSPDBSRV_ENDPOINT_

VALUE: $BUILD_TAG

But it had no effect; the error still occurred on the Windows slave. Could you please explain the solution in detail? Should we set this key-value pair on the slave node? Thanks in advance.

timja commented 7 years ago

grillba:

@billhoo,

You need to do it at the job level, not the system level. Use EnvInject to add the environment variable.

Have a look here for how to use EnvInject: https://wiki.jenkins.io/display/JENKINS/EnvInject+Plugin

Make sure you follow the "Inject variables as a build step" topic

Regards

Mark

timja commented 7 years ago

billhoo:

grillba,

Thanks for the timely reply. We followed your guide and found that there were already 3 separate mspdbsrv.exe processes running in the background (for test purposes, we ran 3 jobs concurrently on one Windows slave), so it seemed to work. Unfortunately, one of our jobs still failed with the C1090 error.

This is the screenshot of EnvInject in the configuration page of each of our 3 Pipeline jobs,

I don't think there's anything wrong here; am I missing something?

Thanks,

Bill.

timja commented 7 years ago

adam1book:

Just in case this helps anyone, I was able to fix all problems mentioned so far in this issue and comments by following the recommendations on this blog post:
http://blog.peter-b.co.uk/2017/02/stop-mspdbsrv-from-breaking-ci-build.html

The solution involves
1. Installing the MSBuild plugin ver. 1.26 or higher in Jenkins. Configuring it for use on the server is optional; it only needs to be installed. This stops Jenkins from killing the mspdbsrv process automatically.

2. Using the _MSPDBSRV_ENDPOINT_ environment variable as done in the comment above.

3. Spawning and killing a new specific mspdbsrv instance of the right Visual Studio version at the beginning and end of each job which uses it.

Powershell implementation of the Python solution in the blog (change VS140COMNTOOLS to the version of Visual Studio being used):

# Manually start mspdbsrv so a parallel job's instance isn't used, works because _MSPDBSRV_ENDPOINT_ is set to a unique value
# (otherwise results in "Fatal error C1090: PDB API call failed, error code '23'" when one of the builds completes).
$mspdbsrv_proc = Start-Process -FilePath "${env:VS140COMNTOOLS}\..\IDE\mspdbsrv.exe" -ArgumentList ('-start','-shutdowntime','-1') -passthru

.\{PowershellBuildScriptName}.ps1

# Manually kill mspdbsrv once the build completes using the previously saved process id
Stop-Process $mspdbsrv_proc.Id
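
For comparison, the same start-then-kill pattern might look roughly like this in Python. This is a sketch, not the blog's actual code: the names mspdbsrv_path and dedicated_mspdbsrv and the injectable popen parameter are hypothetical, and it assumes _MSPDBSRV_ENDPOINT_ is already set in env and that VS140COMNTOOLS points at the Visual Studio tools directory, as in the PowerShell above.

```python
import subprocess
from contextlib import contextmanager
from pathlib import Path


def mspdbsrv_path(env, tools_var="VS140COMNTOOLS"):
    """Locate mspdbsrv.exe relative to the Visual Studio tools directory."""
    return Path(env[tools_var]) / ".." / "IDE" / "mspdbsrv.exe"


@contextmanager
def dedicated_mspdbsrv(env, tools_var="VS140COMNTOOLS", popen=subprocess.Popen):
    """Start a private mspdbsrv for the duration of a build, then kill it.

    Uses the same arguments as the PowerShell version; popen is injectable
    only so the sketch can be exercised without Visual Studio installed.
    """
    cmd = [str(mspdbsrv_path(env, tools_var)), "-start", "-shutdowntime", "-1"]
    proc = popen(cmd, env=env)
    try:
        yield proc
    finally:
        # Kill the instance we started, even if the build raised an error.
        proc.kill()
        proc.wait()
```

The build itself would run inside the with block, for example through subprocess.run with the same env.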

 

timja commented 6 years ago

jakuborava:

I had the same problem with parallel builds (e.g. running job A from trunk and job A from a branch in parallel). I tried the solution with _MSPDBSRV_ENDPOINT_ set to BUILD_TAG, and it worked for almost all jobs. In one situation I still had that error. So I replaced BUILD_TAG with the JOB_NAME environment variable and suddenly it was fine; for now we are out of problems. If anyone still has the problem with the ENDPOINT solution, try replacing BUILD_TAG with something else. If you do not allow parallel builds within a single job, JOB_NAME should be enough; otherwise you can try a JOB_NAME + BUILD_NUMBER combination.

Maybe ENDPOINT has some restrictions, but I did not have time to investigate this further. What I know is that the problematic job has the longest name in my Jenkins, approx. 48 characters.
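
If the endpoint name really is sensitive to long values (this is only the suspicion voiced above, not a confirmed restriction), one way to sidestep it is to derive a short fixed-length name from the job name. A minimal sketch, where the function name short_endpoint, the 16-character length and the use of SHA-1 are all arbitrary illustrative choices:

```python
import hashlib


def short_endpoint(job_name, build_number=None, length=16):
    """Derive a short, deterministic endpoint name from the job name,
    optionally combined with the build number (hypothetical helper)."""
    key = job_name if build_number is None else f"{job_name}#{build_number}"
    # Hex digest keeps the value to plain ASCII letters and digits.
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:length]
```

Passing only the job name gives one endpoint per job; adding the build number gives one per build, matching the JOB_NAME + BUILD_NUMBER suggestion.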

timja commented 6 years ago

davida2009:

Please can anyone advise me how to set _MSPDBSRV_ENDPOINT_ with value BUILD_TAG in a pipeline declarative script?

I don’t really understand the difference between defining and injecting an environment variable. I could do:

stage('build_VisualStudio') {
environment { _MSPDBSRV_ENDPOINT_=$BUILD_TAG }
etc.

Would that be sufficient or must environment variable injection be done in a different way?

timja commented 6 years ago

scm_issue_link:

Code changed in jenkins
User: Daniel Beck
Path:
content/_data/changelogs/weekly.yml
http://jenkins-ci.org/commit/jenkins.io/0391fcb9b4c957e9e41fde03409de330a3de571d
Log:
Remove JENKINS-9104 fix from release to unblock it

timja commented 6 years ago

scm_issue_link:

Code changed in jenkins
User: Daniel Beck
Path:
content/_data/changelogs/weekly.yml
http://jenkins-ci.org/commit/jenkins.io/62409d42a5769cac66337cbd4b5df5754f0e2384
Log:
Merge pull request #1522 from daniel-beck/changelog-2.119-amended

Remove JENKINS-9104 fix from release to unblock it

Compare: https://github.com/jenkins-infra/jenkins.io/compare/58f029c79331...62409d42a576

timja commented 6 years ago

scm_issue_link:

Code changed in jenkins
User: Jesse Glick
Path:
core/src/main/java/hudson/util/ProcessTree.java
test/src/test/java/hudson/util/ProcessTreeKillerTest.java
http://jenkins-ci.org/commit/jenkins/3465da4764c322baf4fb5b90651ef6b9bcd409fb
Log:
Merge pull request #3419 from dwnusbaum/JENKINS-9104-test-fix

Fix test failure by cleaning up static state after tests

Compare: https://github.com/jenkinsci/jenkins/compare/ddbc4bbce7d3...3465da4764c3

timja commented 6 years ago

danielbeck:

Jenkins 2.120 contains a fix for the previous problem of the ProcessKillingVeto extension point not working on agents.

timja commented 5 years ago

vuiletgiraffe:

I'm occasionally getting this error with the latest versions of Jenkins and all the plugins. It started in recent months; it hadn't been a problem for a year before that. The problem seems NOT to have been resolved, or has possibly re-emerged.

What can I do, is there a workaround? Sporadic build failures for no reason are super annoying.

timja commented 5 years ago

billhoo:

Same error with latest Jenkins ver. 2.150.3

The error always occurs when running two jobs concurrently on the same agent with VS2015:
fatal error C1090: PDB API

timja commented 5 years ago

vuiletgiraffe:

billhoo, thanks for the tip! I was running VS 2017 (v141 toolset), but there were indeed two simultaneous jobs! So the workaround is to limit this agent to one job at a time. Pity, as it's a pretty powerful multicore server, but it's better than flaky builds.

timja commented 5 years ago

billhoo:

vuiletgiraffe, totally the same. We have many different jobs that use MSVC14 as the toolchain, but now we can only perform one build at a time; it's a huge waste of machine resources ;(

Hope it can be truly solved.

timja commented 5 years ago

ext3h:

The solution is still the same: before invoking `msbuild`, set the following environment variables to something unique:

_MSPDBSRV_ENDPOINT_=
TMP=
TEMP=$TMP
TMPDIR=$TMP

Once you have done that, you can launch as many parallel MSBuild instances as you like, even mixing different msbuild versions or whatever. They will not interfere in any way. I have been doing that on a regular basis with mixed MSVC12, MSVC14 and MSVC15 toolchains on the same machine, and haven't had any issues since.

The "official" fix for this problem (trying not to kill the job scheduler) is plain wrong, and causes massive issues. Mostly because MSBuild itself isn't exactly stable either when using the same job server for multiple parallel builds. And if the builds are using different toolchains, a crash is ensured.

timja commented 4 years ago

rompic:

I used ext3h's solution:

https://issues.jenkins-ci.org/browse/JENKINS-9104?focusedCommentId=360603&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-360603

We solved it like this in a Jenkins GitHub multi-branch setup with Jenkinsfiles:

bat """
    mkdir tmp
    set _MSPDBSRV_ENDPOINT_= ${BUILD_TAG}
    set TMP=${Workspace}\\tmp
    set TEMP=${Workspace}\\tmp
    set TMPDIR=${Workspace}\\tmp
    build.bat
"""

timja commented 2 years ago

[Originally duplicated by: JENKINS-24753]

timja commented 2 years ago

[Originally related to: JENKINS-3105]