timja / jenkins-gh-issues-poc-06-18

[JENKINS-17726] with upper ASCII characters, jenkins silently fails to wipe workspace, rendering the workspace unusable #10177

Open timja opened 11 years ago

timja commented 11 years ago

This may be related to https://issues.jenkins-ci.org/browse/JENKINS-12610, but the last comment there is a request for replication instructions.

Sometimes (I think when filenames with upper ASCII, that is non-ASCII, characters exist in the workspace), Jenkins fails to wipe the workspace: it leaves some directories without the execute permission bit set, yet reports that the wipe was successful. Because the workspace directory then can't be accessed, the code checkout (using the Perforce plugin) fails with a misleading error that blames communication with the source control server.

When I ssh into the server to clean up, I see that the 'workspace' (or 'workspace@2', 'workspace@3', ...) directory has lost its executable bit. I run 'chmod +x workspace' and 'cd workspace', and see that the top-level checkout directory has also lost its executable bit. I restore that bit, change into that directory, and see that 'etc' (but no other directory) has lost its executable bit. Following the trail of non-executable directories, I arrive at a directory containing just a few files or symlinks, all with upper ASCII characters in their names (the other files in the directory have been deleted). After I reset permissions along the way and delete those files, the next build to use that workspace has no problems. The files that have to be manually removed are:

lrwxrwxrwx 1 jenkins jenkins 65 Oct 11 2012 AC_Raíz_Certicámara_S.A..pem -> /usr/share/ca-certificates/mozilla/AC_Raíz_Certicámara_S.A..crt
lrwxrwxrwx 1 jenkins jenkins 68 Oct 11 2012 Certinomis_Autorité_Racine.pem -> /usr/share/ca-certificates/mozilla/Certinomis_Autorité_Racine.crt
lrwxrwxrwx 1 jenkins jenkins 86 Oct 11 2012 EBG_Elektronik_Sertifika_Hizmet_Sağlayıcısı.pem -> /usr/share/ca-certificates/mozilla/EBG_Elektronik_Sertifika_Hizmet_Sağlayıcısı.crt
lrwxrwxrwx 1 jenkins jenkins 83 Oct 11 2012 NetLock_Arany_=Class_Gold=_Főtanúsítvány.pem -> /usr/share/ca-certificates/mozilla/NetLock_Arany_=Class_Gold=_Főtanúsítvány.crt
lrwxrwxrwx 1 jenkins jenkins 104 Oct 11 2012 TÜBİTAK_UEKAE_Kök_Sertifika_Hizmet_Sağlayıcısı_Sürüm_3.pem -> /usr/share/ca-certificates/mozilla/TÜBİTAK_UEKAE_Kök_Sertifika_Hizmet_Sağlayıcısı_Sürüm_3.crt
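
The manual recovery described above (restoring the execute bit one directory at a time) can be sketched as a small shell helper. This is only an illustration under assumptions: the function name restore_dir_access is hypothetical, it has to run as the owner of the files (or as root), and it only restores the owner's read/write/execute bits on directories. Deleting the leftover files is still a separate step.

```shell
# Hypothetical helper (not part of Jenkins): walk a workspace top-down and
# give the owner back read/write/execute on every directory.
restore_dir_access() {
    local dir=$1 entry
    # Without u+x a directory cannot be entered; without u+r it cannot be listed.
    chmod u+rwx -- "$dir" || return 1
    # The second pattern picks up dot-entries (names starting with ".." are
    # not covered by this simple sketch).
    for entry in "$dir"/* "$dir"/.[!.]*; do
        # Unmatched patterns and symlinks fail this check and are skipped,
        # so only real subdirectories are descended into.
        if [ -d "$entry" ] && [ ! -L "$entry" ]; then
            restore_dir_access "$entry" || return 1
        fi
    done
    return 0
}
```

For the failing build shown in this report, it would be called as restore_dir_access /var/lib/jenkins/jobs/try/workspace@2. Fixing each directory before listing it matters, because the children of a directory without the execute bit cannot be inspected at all.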

I am running Jenkins installed from the Ubuntu 12.04 repositories on a 64-bit VM, with jenkins.war overwritten by the current LTS (1.480.3). Our build process sets up a chroot environment, and the files in question come from /etc/ssl/certs/. To replicate, you should be able to run a script like this inside a workspace (it takes a few minutes to copy everything over):

#!/bin/bash
dirlist="bin etc lib sbin"
for checkdir in lib32 lib64
do
    if [ -d "/$checkdir" ]
    then
        dirlist="$dirlist $checkdir"
    fi
done
for getdir in $dirlist
do
    tar -cf - /$getdir/* 2>/dev/null | tar -xf -
done

# I am pretty sure the exclude=ca-certificates was an older attempt
# to avoid this issue, which I started running into again after
# copying all of /etc instead of using --no-recursion to only
# copy part of it
tar --exclude-backups --exclude=backgrounds --exclude=ca-certificates --exclude=vmware-tools --exclude='linux-headers*' --exclude=/usr/share/doc -cf - /usr 2>/dev/null | tar -xf -
mkdir tmp proc dev run var
ln -s ../run var/run
sudo mount -t devtmpfs none dev
sudo mount --bind /proc proc
sudo mount --bind /run run

Our script then does a build inside the new chroot, like:
sudo chroot "$workspace_directory" /bin/bash -c "cd Tools && ./build.sh"

and after finishing, cleans up its mounts and root-owned files with:
sudo umount run
sudo umount proc
sudo umount dev
builduser=$(whoami)
sudo chroot "$workspace_directory" /bin/bash -c "chown -R ${builduser}: /"

Then the next build to use that workspace fails to clean up properly, and no builds can use that workspace until it is manually restored. The console log for a failing build looks like (note especially the misleading "Clean complete" text):

Started by user Bennett, Drew
Building in workspace /var/lib/jenkins/jobs/try/workspace@2
Using master perforce client: owi_unittest_try_2
[workspace@2] $ /opt/p4/bin/p4 workspace -o owi_unittest_try_2
Note: .repository directory in workspace (if exists) is skipped during clean.
Wiping workspace...
Wiped workspace.
Clean complete, took 10062 ms
Last build changeset: 546966
[workspace@2] $ /opt/p4/bin/p4 changes -s submitted -m 1 //owi_unittest_try_2/...
[workspace@2] $ /opt/p4/bin/p4 counter change
Caught exception communicating with perforce. Could not run perforce command.com.tek42.perforce.PerforceException: Could not run perforce command.
at hudson.plugins.perforce.HudsonP4DefaultExecutor.exec(HudsonP4DefaultExecutor.java:88)
at com.tek42.perforce.parse.AbstractPerforceTemplate.getPerforceResponse(AbstractPerforceTemplate.java:321)
at com.tek42.perforce.parse.AbstractPerforceTemplate.getPerforceResponse(AbstractPerforceTemplate.java:292)
at com.tek42.perforce.parse.Counters.getCounter(Counters.java:60)
at hudson.plugins.perforce.PerforceSCM.checkout(PerforceSCM.java:903)
at hudson.model.AbstractProject.checkout(AbstractProject.java:1256)
at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:590)
at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:88)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:495)
at hudson.model.Run.execute(Run.java:1502)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:237)
Caused by: java.io.IOException: Cannot run program "/opt/p4/bin/p4" (in directory "/var/lib/jenkins/jobs/try/workspace@2"): java.io.IOException: error=13, Permission denied
at java.lang.ProcessBuilder.start(ProcessBuilder.java:475)
at hudson.Proc$LocalProc.<init>(Proc.java:244)
at hudson.Proc$LocalProc.<init>(Proc.java:216)
at hudson.Launcher$LocalLauncher.launch(Launcher.java:709)
at hudson.Launcher$ProcStarter.start(Launcher.java:338)
at hudson.plugins.perforce.HudsonP4DefaultExecutor.exec(HudsonP4DefaultExecutor.java:79)
... 12 more
Caused by: java.io.IOException: java.io.IOException: error=13, Permission denied
at java.lang.UNIXProcess.<init>(UNIXProcess.java:164)
at java.lang.ProcessImpl.start(ProcessImpl.java:81)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:468)
... 17 more
ERROR: Unable to communicate with perforce. Could not run perforce command.
Archiving artifacts
ERROR: Publisher hudson.plugins.emailext.ExtendedEmailPublisher aborted due to exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at hudson.model.Run$RunExecution$CheckpointSet.waitForCheckPoint(Run.java:1363)
at hudson.model.Run.waitForCheckpoint(Run.java:1321)
at hudson.model.CheckPoint.block(CheckPoint.java:144)
at hudson.tasks.BuildStepMonitor$3.perform(BuildStepMonitor.java:35)
at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:718)
at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:693)
at hudson.model.Build$BuildExecution.cleanUp(Build.java:192)
at hudson.model.Run.execute(Run.java:1546)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:237)
Finished: FAILURE
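
Since the log above prints "Wiped workspace." even though the workspace is left unusable, a build step could check the result independently. The following is a minimal sketch, not something Jenkins or the Perforce plugin provides: the function name verify_wipe is hypothetical, and it assumes that a successful wipe leaves either no workspace directory at all or an accessible directory containing nothing except the .repository directory that the log says is skipped.

```shell
# Hypothetical check (not a Jenkins feature): succeed only if the workspace
# is gone, or is an accessible directory with nothing left in it.
verify_wipe() {
    local ws=$1 leftovers
    # Nothing there at all counts as wiped.
    [ -e "$ws" ] || return 0
    # A workspace that exists must still be listable and enterable.
    if [ ! -d "$ws" ] || [ ! -r "$ws" ] || [ ! -x "$ws" ]; then
        echo "workspace exists but is not accessible: $ws" >&2
        return 1
    fi
    # Look for any top-level entry other than .repository.
    leftovers=$(find "$ws" -mindepth 1 -maxdepth 1 ! -name .repository 2>/dev/null | head -n 1)
    if [ -n "$leftovers" ]; then
        echo "wipe incomplete, still present: $leftovers" >&2
        return 1
    fi
    return 0
}
```

Run as the first step of a job, for example verify_wipe "$WORKSPACE" || exit 1, this would turn the misleading Perforce communication error into a direct message about the workspace. Note that a process running as root passes the accessibility test regardless of permission bits, so the check is only meaningful when run as the build user.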


Originally reported by dbennett, imported from: with upper ASCII characters, jenkins silently fails to wipe workspace, rendering the workspace unusable
  • status: Open
  • priority: Major
  • resolution: Unresolved
  • imported: 2022/01/10
timja commented 10 years ago

danielbeck:

Are you sure this isn't related to charset issues given the umlauts, Turkish uppercase i, accents etc. in the file names that are left? Do successfully deleted files also have these chars?
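
Two quick checks could help gather evidence for or against the charset theory. This is a sketch under assumptions: both function names are hypothetical, the second one assumes a Linux host where /proc/<pid>/environ is readable by the caller, and a C or POSIX locale on the Jenkins process would only be a hint that the JVM cannot decode these file names, not proof.

```shell
# Hypothetical helpers; names are illustrative only.

# List entries under a directory whose names contain bytes outside
# printable ASCII.
list_non_ascii_names() {
    # LC_ALL=C makes pattern matching byte-oriented, so [! -~] matches any
    # byte that does not lie between space and tilde.
    LC_ALL=C find "$1" -name '*[! -~]*' -print
}

# Print the locale-related environment variables a process was started with.
# Assumes Linux, where /proc/<pid>/environ holds NUL-separated entries.
proc_locale_vars() {
    tr '\0' '\n' < "/proc/$1/environ" | grep -E '^(LANG|LANGUAGE|LC_[A-Z]+)='
}
```

Running list_non_ascii_names on a workspace before the wipe and comparing the result with what is left afterwards would show whether every such file survives or only some of them. Calling proc_locale_vars with the PID of the Jenkins JVM shows the locale it was started under.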

timja commented 2 years ago

[Originally related to: JENKINS-12610]