Open glin opened 1 year ago
@jspiewak pointed out that we explicitly set `HOME` to `.` in the Jenkins job: https://github.com/rstudio/r-builds/blob/c97f0cc4dd4d275121796c9e4271ffd9bfe7372d/deploy.Jenkinsfile#L11
That explains the setting then, and how it was able to work before. The `HOME` dir is used for the cache, which exists on the host because it's the workspace dir that gets mounted in. Changing it to `env.WORKSPACE`, as @jspiewak suggested, gets the original job working again without all these other changes.
I'll still propose running the whole deployment in a single container though, to keep things simpler and easier to debug locally.
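For anyone debugging this later, here is a rough Python sketch of why the value of `HOME` matters. It assumes the cache dir is simply `$HOME/.cache/<app>`, which may not match the exact layout the plugin's `appdirectory` dependency produces. The workspace path and the function names are made up for illustration.

```python
import posixpath


def cache_dir(home: str, app: str = "serverless-python-requirements") -> str:
    # Assumed layout: <HOME>/.cache/<app>. The real plugin may differ.
    return posixpath.join(home, ".cache", app)


def is_absolute_mount_source(path: str) -> bool:
    # A bind-mount source passed to `docker run -v` needs to be an absolute path.
    return posixpath.isabs(path)


# HOME="." (the old Jenkinsfile setting) yields a relative cache path.
relative = cache_dir(".")
# A hypothetical workspace path, standing in for env.WORKSPACE.
absolute = cache_dir("/var/lib/jenkins/workspace/deploy-r-builds")

print(relative, is_absolute_mount_source(relative))
print(absolute, is_absolute_mount_source(absolute))
```

With `HOME` pointing at the workspace, the derived cache path is absolute and lives inside the directory that Jenkins mounts into the container.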
@jspiewak Good idea, let's go with that first to get the deploy unblocked asap. I'll follow up with a new PR soon and add some docs. The main problem was just how hard the deploy job was to understand and debug, and I would be fine with the original docker-inside-docker approach as long as it was documented well enough. I'd still prefer reducing it to just one container, but am curious what @stevenolen and @jforest think.
I updated the docs and tried to simplify the job a little more:

- `HOME` config: restored the jenkins user so `HOME` wouldn't be `/` and break npm
- `dockerizePip: true` would need to be removed in serverless-custom.yml

https://github.com/rstudio/r-builds/pull/175 seems to be working fine for now though, but I'll still leave this up for a while.
Fixes #165. Tested this in staging at https://build.posit.it/blue/organizations/jenkins/r-builds%2Fdeploy-r-builds/detail/deploy-r-builds/195/pipeline, and then confirmed that a staging rebuild works. The CI failure is for an unrelated reason.
If this works well, someone will need to update serverless-custom.yml since I don't think I have permissions for that.
We were using the serverless-python-requirements plugin with `dockerizePip: true`, which installs the Python requirements in a separate docker container. The plugin has to mount some directories (requirements.txt, cache files) into the separate container, and this failed in two ways.

The first issue is that the mounted directory was ending up as a relative path starting with a period, which docker doesn't support. With `serverless --verbose`, you get the actual error message:

This cache dir comes from the plugin's use of the `appdirectory` npm package, which generates the cache paths based off `$HOME`: https://github.com/MrJohz/appdirectory/blob/27f19a6eceb46110cd5d6882a18cae3a4da98331/lib/appdirectory.js#L91-L94

When running the plugin locally in Docker, it works fine because `$HOME` resolves to `/root` or `/home/<user>`. However, in Jenkins, the docker image is run with `HOME` set to `.`:

This may have been a Jenkins behavior change in some upgrade, which could explain why the job suddenly started failing.

We were setting `HOME=.`, see https://github.com/rstudio/r-builds/pull/171#issuecomment-1622645902

The second issue is that we were mounting the docker socket in the agent container. When you do this, any `docker run -v` mounts from within the container will use paths from the host rather than the container. So even with the cache directory fixed, the plugin was trying to mount directories in the python container that didn't exist, causing a new issue. I'm not sure how the Jenkins job was working before.

To fix both of these, I temporarily patched serverless-custom.yml to not dockerize pip and disable caching:

Switched the deploy image to a base OS image rather than a python/node or lambda image, because the language-specific images felt hard to maintain. The nodesource install script didn't work on the python/node images' Debian 11, and the lambda images use AL2, which is super old and no longer supports newer versions of Node.
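As a footnote on the second issue (nested `docker run -v` through a shared docker socket), here is a small Python model of when a path seen inside the agent container is safe to use as a mount source. It assumes the agent container's mounts are known as `(host_dir, container_dir)` pairs and that the workspace is mounted at the same path on both sides. The paths and function names are hypothetical, and this only models the path logic without calling Docker.

```python
import posixpath


def is_under(path: str, directory: str) -> bool:
    # True if `path` is `directory` itself or somewhere below it.
    rel = posixpath.relpath(path, directory)
    return rel != ".." and not rel.startswith("../")


def usable_as_nested_mount(path: str, agent_mounts) -> bool:
    # The docker daemon resolves the source path on the HOST, so a path from
    # inside the agent container only names the same files if it sits under a
    # directory mounted at an identical path on both sides.
    return any(
        host_dir == container_dir and is_under(path, container_dir)
        for host_dir, container_dir in agent_mounts
    )


workspace = "/var/lib/jenkins/workspace/deploy-r-builds"  # hypothetical path
agent_mounts = [(workspace, workspace)]

# Cache under the workspace: the same path exists on the host.
print(usable_as_nested_mount(workspace + "/.cache/pip", agent_mounts))
# Cache under the container's own home dir: the host has no such directory.
print(usable_as_nested_mount("/root/.cache/pip", agent_mounts))
```

Under these assumptions, a cache dir inside the workspace works while one under the container's own home directory doesn't. That is consistent with the original job working once `HOME` pointed at the workspace.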