stan-dev / stanc3

The Stan transpiler (from Stan to C++ and beyond).
BSD 3-Clause "New" or "Revised" License
138 stars, 44 forks

[CI] Fix intermittent failures during build #1342

Closed: serban-nicusor-toptal closed this pull request 11 months ago

serban-nicusor-toptal commented 11 months ago

Submission Checklist

Release notes

A failure has been reported for stanc3 master builds in which the build sometimes fails with a permission issue. Since each build runs in its own directory, dir("${env.WORKSPACE}/stancjs"), and the failure only occurs on jenkins2, I think it's a user/group permission issue. A simple fix would therefore be to use only jenkins as the build agent.

Copyright and Licensing

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the BSD 3-clause license (https://opensource.org/licenses/BSD-3-Clause)

WardBrian commented 11 months ago

Is it not feasible to diagnose the root cause here?

serban-nicusor-toptal commented 11 months ago

We can ask Dylan. What I think differs between the two machines is which GIDs have the proper permissions. On jenkins we use --group-add=987 --group-add=988; if he can point us to the GIDs to use on jenkins2, I can try to find a way to pass them dynamically based on the agent used.
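One way to pass the GIDs dynamically, rather than hardcoding 987 and 988, is to resolve them by group name on whichever agent runs the step. The Python sketch below assumes the relevant group is named `docker` on every agent; the helper `group_add_args` is hypothetical and not part of the stanc3 Jenkinsfile or scripts. It only builds the `--group-add=<gid>` strings that `docker run` accepts.

```python
import grp
from typing import Callable, Iterable, List


def group_add_args(
    group_names: Iterable[str],
    lookup: Callable[[str], grp.struct_group] = grp.getgrnam,
) -> List[str]:
    """Build `docker run --group-add=<gid>` flags by resolving each group
    name to its numeric GID on the machine where this code runs."""
    args = []
    for name in group_names:
        try:
            gid = lookup(name).gr_gid
        except KeyError:
            # The group does not exist on this agent: skip it rather than
            # pass a number that may mean something different here.
            continue
        args.append(f"--group-add={gid}")
    return args


if __name__ == "__main__":
    # Hypothetical usage: print the flags to splice into a `docker run` call.
    print(" ".join(group_add_args(["docker"])))
```

On an agent where `docker` resolves to GID 987, this prints `--group-add=987`. Skipping unknown groups keeps the same command usable on both agents, at the cost of silently omitting a flag if a group name is misspelled.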

WardBrian commented 11 months ago

@dylex - are you able to give us some advice on the above question re: permissions issues on jenkins2?

dylex commented 11 months ago

Those are the groups for docker on both machines -- I'm not sure why it would need numeric groups, or really why it would need this at all, since the user is already in those groups, and the containers shouldn't need them unless you're doing some kind of docker-in-docker. If those files end up being owned by anyone other than the jenkins user, it seems like something is going wrong earlier in the build.

WardBrian commented 11 months ago

> containers shouldn't need them unless you're doing some kind of docker-in-docker

I think we might do docker-in-docker in the build phase so we can produce ARM executables? @serban-nicusor-toptal

serban-nicusor-toptal commented 11 months ago

We use docker-in-docker for these steps; see https://github.com/stan-dev/stanc3/blob/master/Jenkinsfile#L675. We make sure that concurrent builds do not overlap by using a specific directory for each parallel step, e.g. dir("${env.WORKSPACE}/linux-mips64el"). This works fine on jenkins, but for some reason it fails on jenkins2, and I'm not entirely sure why. My guess was that it related to user/group permissions, but if those are the same, hmm. See the dind call at https://github.com/stan-dev/stanc3/blob/master/scripts/build_multiarch_stanc3.sh#L28. I'll try to debug it over the weekend and see if I can find any differences.

serban-nicusor-toptal commented 11 months ago

I ran a few builds and everything seems to be working as expected. I suspect this build failed because of files left over from before we fixed concurrent builds, which had different ownership. I've cleaned up that directory, so in theory master builds should now work, since new files will be written with the correct permissions. Since no change to the Jenkinsfile is needed, we can close this PR. Please let me know your thoughts.
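Before cleaning up a directory like this, a quick check is to list everything under it that the build user does not own, which is the symptom dylex described. The Python sketch below is a hypothetical helper, not part of the stanc3 CI; it assumes a POSIX agent (it relies on `os.getuid`) and that it is run as the Jenkins user against a per-step directory such as `${WORKSPACE}/linux-mips64el`.

```python
import os
import sys
from typing import List, Optional


def find_foreign_owned(root: str, uid: Optional[int] = None) -> List[str]:
    """Return the paths under `root` (including `root` itself) whose owner
    UID differs from `uid`, defaulting to the UID running this script."""
    if uid is None:
        uid = os.getuid()
    foreign = []
    # lstat checks symlinks themselves rather than the files they point to.
    if os.lstat(root).st_uid != uid:
        foreign.append(root)
    # os.walk does not descend into symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_uid != uid:
                foreign.append(path)
    return sorted(foreign)


if __name__ == "__main__":
    # Hypothetical usage: python3 find_foreign_owned.py "$WORKSPACE/linux-mips64el"
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for path in find_foreign_owned(target):
        print(path)
```

An empty result means every file is owned by the current user; any paths printed (for example, ones created as root from inside a container) are candidates for the leftover files that broke the build.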