openshiftio / openshift.io

Red Hat OpenShift.io is an end-to-end development environment for planning, building and deploying modern applications.
https://openshift.io
97 stars 66 forks source link

Jenkins doesn't have label XY #4020

Closed ljelinkova closed 6 years ago

ljelinkova commented 6 years ago

The E2E tests encounter this issue from time to time - what does it mean? Is there something we could do about it?

http://artifacts.ci.centos.org/devtools/e2e/devtools-test-e2e-openshift.io-smoketest-us-east-2a-beta/332/05-02-jenkins-log-failed.png

ppitonak commented 6 years ago

related to #3797

This message means that Jenkins is waiting for slave to be provisioned. In this case though, it stays in this state for long time and Jenkins job times out, therefore increasing priority.

ppitonak commented 6 years ago

This issue is still appearing regularly.

ppitonak commented 6 years ago

It happened now with accout ppitonak on starter-us-east-2

Part of Jenkins log:

INFO: Waiting for Pod to be scheduled (60/100): jenkins-slave-9lls3-5zv0r
Aug 21, 2018 10:36:52 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Container is waiting jenkins-slave-9lls3-5zv0r [jnlp]: ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={})
Aug 21, 2018 10:36:52 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Container is waiting jenkins-slave-9lls3-5zv0r [maven]: ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={})
Aug 21, 2018 10:36:52 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Waiting for Pod to be scheduled (61/100): jenkins-slave-9lls3-5zv0r
Aug 21, 2018 10:36:56 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
INFO: Terminating Kubernetes instance for agent jenkins-slave-9lls3-5zv0r
Aug 21, 2018 10:36:57 AM okhttp3.internal.platform.Platform log
INFO: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
Aug 21, 2018 10:36:57 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
INFO: Terminated Kubernetes instance for agent ppitonak-jenkins/jenkins-slave-9lls3-5zv0r
Aug 21, 2018 10:36:57 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
INFO: Disconnected computer jenkins-slave-9lls3-5zv0r
Terminated Kubernetes instance for agent ppitonak-jenkins/jenkins-slave-9lls3-5zv0r
Aug 21, 2018 10:36:58 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
WARNING: Error in provisioning; agent=KubernetesSlave name: jenkins-slave-9lls3-5zv0r, template=PodTemplate{inheritFrom='base', name='jenkins-slave-9lls3', namespace='ppitonak-jenkins', label='mavene2e_0821_1227_7655_1', serviceAccount='jenkins', volumes=[org.csanchez.jenkins.plugins.kubernetes.volumes.SecretVolume@11d0334, org.csanchez.jenkins.plugins.kubernetes.volumes.SecretVolume@10cc66c, org.csanchez.jenkins.plugins.kubernetes.volumes.SecretVolume@292ea9, org.csanchez.jenkins.plugins.kubernetes.volumes.SecretVolume@1dd2c32, org.csanchez.jenkins.plugins.kubernetes.volumes.SecretVolume@c28dde], containers=[ContainerTemplate{name='jnlp', image='fabric8/jenkins-slave-base-centos7:vb0268ae', workingDir='/home/jenkins/', args='${computer.jnlpmac} ${computer.name}', resourceLimitMemory='256Mi'}, ContainerTemplate{name='maven', image='fabric8/maven-builder:v9ff62e0', workingDir='/home/jenkins/', command='/bin/sh -c', args='cat', ttyEnabled=true, resourceLimitMemory='640Mi', envVars=[KeyValueEnvVar [getValue()=-Duser.home=/root/ -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -Dsun.zip.disableMemoryMapping=true -XX:+UseParallelGC -XX:MinHeapFreeRatio=5 -XX:MaxHeapFreeRatio=10 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -Xms10m -Xmx192m, getKey()=_JAVA_OPTIONS], KeyValueEnvVar [getValue()=-Dorg.slf4j.simpleLogger.log.org.apache.maven.cli.transfer.Slf4jMavenTransferListener=warn, getKey()=MAVEN_OPTS]]}]}
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://172.30.0.1/api/v1/namespaces/ppitonak-jenkins/pods/jenkins-slave-9lls3-5zv0r/log?pretty=false&container=jnlp&tailLines=30. Message: container "jnlp" in pod "jenkins-slave-9lls3-5zv0r" is terminated. Received status: Status(apiVersion=v1, code=400, details=null, kind=Status, message=container "jnlp" in pod "jenkins-slave-9lls3-5zv0r" is terminated, metadata=ListMeta(resourceVersion=null, selfLink=null, additionalProperties={}), reason=BadRequest, status=Failure, additionalProperties={}).
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:470)
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:409)
        at io.fabric8.kubernetes.client.dsl.internal.PodOperationsImpl.getLog(PodOperationsImpl.java:141)
        at io.fabric8.kubernetes.client.dsl.internal.PodOperationsImpl.getLog(PodOperationsImpl.java:58)
        at org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher.logLastLines(KubernetesLauncher.java:225)
        at org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher.launch(KubernetesLauncher.java:155)
        at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:285)
        at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

FATAL: Computer for agent is null: jenkins-slave-9lls3-5zv0r
Aug 21, 2018 10:36:58 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
INFO: Terminating Kubernetes instance for agent jenkins-slave-9lls3-5zv0r
Aug 21, 2018 10:36:58 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
SEVERE: Computer for agent is null: jenkins-slave-9lls3-5zv0r
Aug 21, 2018 10:37:05 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision
INFO: Excess workload after pending Kubernetes agents: 1
Aug 21, 2018 10:37:05 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision
INFO: Template: Kubernetes Pod Template
Aug 21, 2018 10:37:05 AM okhttp3.internal.platform.Platform log
INFO: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
Aug 21, 2018 10:37:05 AM hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
INFO: Started provisioning Kubernetes Pod Template from openshift with 1 executors. Remaining excess workload: 0
Aug 21, 2018 10:37:15 AM hudson.slaves.NodeProvisioner$2 run
INFO: Kubernetes Pod Template provisioning successfully completed. We have now 2 computer(s)
Aug 21, 2018 10:37:15 AM okhttp3.internal.platform.Platform log
INFO: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
Aug 21, 2018 10:37:15 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Created Pod: jenkins-slave-9lls3-6gqvl in namespace ppitonak-jenkins
Aug 21, 2018 10:37:15 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Waiting for Pod to be scheduled (0/100): jenkins-slave-9lls3-6gqvl
Aug 21, 2018 10:37:21 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Container is waiting jenkins-slave-9lls3-6gqvl [jnlp]: ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={})
Aug 21, 2018 10:37:21 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Container is waiting jenkins-slave-9lls3-6gqvl [maven]: ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={})
Aug 21, 2018 10:37:21 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Waiting for Pod to be scheduled (1/100): jenkins-slave-9lls3-6gqvl

a minute or two later

INFO: Waiting for agent to connect (39/100): jenkins-slave-9lls3-6gqvl
Aug 21, 2018 10:40:59 AM io.jenkins.blueocean.autofavorite.FavoritingScmListener onCheckout
SEVERE: Unexpected error when retrieving changeset
hudson.plugins.git.GitException: Error: git whatchanged --no-abbrev -M "--format=commit %H%ntree %T%nparent %P%nauthor %aN <%aE> %ai%ncommitter %cN <%cE> %ci%n%n%w(76,4,4)%s%n%n%b" -n 1 92fc97ecc4ff0f4ffc0783eeb789bf5d0f1f726a in /home/jenkins/workspace/k_e2e-0821-1227-7655_master-33MUYYINNVJY3TXEGAEPZUAIQOGKYUEO3DV3OU736IKUT4RSASCA
        at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$6.execute(CliGitAPIImpl.java:971)
        at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:153)
        at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:146)
        at hudson.remoting.UserRequest.perform(UserRequest.java:207)
        at hudson.remoting.UserRequest.perform(UserRequest.java:53)
        at hudson.remoting.Request$2.run(Request.java:358)
        at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at hudson.remoting.Engine$1$1.run(Engine.java:98)
        at java.lang.Thread.run(Thread.java:748)
        Suppressed: hudson.remoting.Channel$CallSiteStackTrace: Remote call to JNLP4-connect connection from ip-10-129-96-1.us-east-2.compute.internal/10.129.96.1:48548
                at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1693)
                at hudson.remoting.UserResponse.retrieve(UserRequest.java:310)
                at hudson.remoting.Channel.call(Channel.java:908)
                at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.execute(RemoteGitImpl.java:146)
                at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                at java.lang.reflect.Method.invoke(Method.java:498)
                at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.invoke(RemoteGitImpl.java:132)
                at com.sun.proxy.$Proxy137.execute(Unknown Source)
                at io.jenkins.blueocean.autofavorite.FavoritingScmListener.getChangeSet(FavoritingScmListener.java:155)
                at io.jenkins.blueocean.autofavorite.FavoritingScmListener.onCheckout(FavoritingScmListener.java:80)
                at org.jenkinsci.plugins.workflow.steps.scm.SCMStep.checkout(SCMStep.java:127)
                at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:85)
                at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:75)
                at org.jenkinsci.plugins.workflow.steps.AbstractSynchronousNonBlockingStepExecution$1$1.call(AbstractSynchronousNonBlockingStepExecution.java:47)
                at hudson.security.ACL.impersonate(ACL.java:260)
                at org.jenkinsci.plugins.workflow.steps.AbstractSynchronousNonBlockingStepExecution$1.run(AbstractSynchronousNonBlockingStepExecution.java:44)
                at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
                at java.util.concurrent.FutureTask.run(FutureTask.java:266)
                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
                ... 1 more

Aug 21, 2018 10:40:59 AM com.squareup.okhttp.internal.Platform$JdkWithJettyBootPlatform getSelectedProtocol
INFO: ALPN callback dropped: SPDY and HTTP/2 are disabled. Is alpn-boot on the boot class path?
Aug 21, 2018 10:41:01 AM okhttp3.internal.platform.Platform log
INFO: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
Aug 21, 2018 10:41:01 AM okhttp3.internal.platform.Platform log
INFO: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
Aug 21, 2018 10:41:04 AM org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator$1 doLaunch
INFO: Created process inside pod: [jenkins-slave-9lls3-6gqvl], container: [maven] with pid:[-1]
Aug 21, 2018 10:41:05 AM okhttp3.internal.platform.Platform log
INFO: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
ppitonak commented 6 years ago

FYI @sthaha @kishansagathiya @hrishin

kishansagathiya commented 6 years ago

@ppitonak Going through the logs, It seems that this is the infamous openshift networking issue. connection between slave and master is broken.

@chmouel can you confirm?

chmouel commented 6 years ago

@kishansagathiya the first one is just some timeout but it goes after to a git checkout, the network issue we have seen was only between services inside openshift, this goes outside to git and can't retrive things there. I would check if there is any issues with github at that time and if there is not an issue with the git repo and if you can manually checkin the 92fc97ecc4ff0f4ffc0783eeb789bf5d0f1f726a change

ljelinkova commented 6 years ago

This is the latest log from failed E2E test job.

http://artifacts.ci.centos.org/devtools/e2e/devtools-test-e2e-openshift.io-smoketest-us-east-2a-beta/591/oc-logs-output.txt

There are multiple errors like this

WARNING: Unexpected exception encountered while provisioning agent Kubernetes Pod Template
java.io.IOException: Could not mkdirs /var/lib/jenkins/nodes
    at jenkins.model.Nodes.getNodesDir(Nodes.java:348)
    at jenkins.model.Nodes.save(Nodes.java:266)
    at hudson.util.PersistedList.onModified(PersistedList.java:173)
    at hudson.util.PersistedList.replaceBy(PersistedList.java:85)
    at hudson.model.Slave.<init>(Slave.java:198)
    at hudson.slaves.AbstractCloudSlave.<init>(AbstractCloudSlave.java:51)
    at org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave.<init>(KubernetesSlave.java:98)
    at org.csanchez.jenkins.plugins.kubernetes.ProvisioningCallback.call(ProvisioningCallback.java:68)
    at org.csanchez.jenkins.plugins.kubernetes.ProvisioningCallback.call(ProvisioningCallback.java:42)
    at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
ppitonak commented 6 years ago

I saw the same today http://artifacts.ci.centos.org/devtools/e2e/devtools-test-e2e-openshift.io-smoketest-us-east-2a-beta/596/oc-logs-output.txt http://artifacts.ci.centos.org/devtools/e2e/devtools-test-e2e-openshift.io-smoketest-us-east-2-beta/595/oc-logs-output.txt http://artifacts.ci.centos.org/devtools/e2e/devtools-test-e2e-openshift.io-smoketest-us-east-2a-beta/597/oc-logs-output.txt http://artifacts.ci.centos.org/devtools/e2e/devtools-test-e2e-openshift.io-smoketest-us-east-2a-released/598/oc-logs-output.txt

ljelinkova commented 6 years ago

Is there any update on this issue?

kishansagathiya commented 6 years ago

@ppitonak @ljelinkova Are we still seeing the behaviour mentioned in https://github.com/openshiftio/openshift.io/issues/4020#issuecomment-414631837 and https://github.com/openshiftio/openshift.io/issues/4020#issuecomment-417251004

If so, how frequent is it?

kishansagathiya commented 6 years ago

In the first log in the comment https://github.com/openshiftio/openshift.io/issues/4020#issuecomment-414631837, for some reason, slave pod failed to come up, which is why GET request is failing. From the logs, it is not possible to find why that pod failed(A lot of log links here are dead).

ppitonak commented 6 years ago

I haven't seen this issue for couple of weeks but it might be caused by the fact that due to https://github.com/openshiftio/openshift.io/issues/4215 Jenkins won't get to this stage. We need to fix #4215 first and then we can judge.

kishansagathiya commented 6 years ago

Can't make out the reason for https://github.com/openshiftio/openshift.io/issues/4020#issuecomment-417251004 @hrishin do you have any ideas on this?

chmouel commented 6 years ago

Transient error, gluster not working properly ? can you run a kibana search to see how many time we see this ?

kishansagathiya commented 6 years ago

The main issue title is very different compared to some of the comments in here. I feel like some of the comments in here could be separate issues.

@ppitonak Are all these logs seen together?

Otherwise, I feel like I should create a separate issue for https://github.com/openshiftio/openshift.io/issues/4020#issuecomment-417251004 and close the current issue because https://github.com/openshiftio/openshift.io/issues/4020#issuecomment-415381653 and https://github.com/openshiftio/openshift.io/issues/4020#issuecomment-428487659

cc @sthaha @hrishin

hrishin commented 6 years ago

Yes, I agree @chmouel comment. Appears more of underlying OpenShift issue rather Jenkins.

@kishansagathiya We are supposed to discuss this issue in IC, right?

kishansagathiya commented 6 years ago

Transient error, gluster not working properly ?

@hrishin I remember you saying that this is an infra issue. You or @ppitonak or @chmouel, if any of you are aware of that you can link it here.

ljelinkova commented 6 years ago

The issue still occurs in E2E tests, the last time yesterday. This is the output of script that retrieves Jenkins log:

http://artifacts.ci.centos.org/devtools/e2e/devtools-test-e2e-openshift.io-smoketest-us-east-2-released/833/oc-logs-output.txt

rg.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision
INFO: Excess workload after pending Kubernetes agents: 1
Oct 09, 2018 2:12:08 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision
INFO: Template: Kubernetes Pod Template
Oct 09, 2018 2:12:08 PM okhttp3.internal.platform.Platform log
INFO: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
Oct 09, 2018 2:12:08 PM hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
INFO: Started provisioning Kubernetes Pod Template from openshift with 1 executors. Remaining excess workload: 0
Oct 09, 2018 2:12:18 PM hudson.slaves.NodeProvisioner$2 run
WARNING: Unexpected exception encountered while provisioning agent Kubernetes Pod Template
java.io.IOException: Could not mkdirs /var/lib/jenkins/nodes
    at jenkins.model.Nodes.getNodesDir(Nodes.java:348)
    at jenkins.model.Nodes.save(Nodes.java:266)
    at hudson.util.PersistedList.onModified(PersistedList.java:173)
    at hudson.util.PersistedList.replaceBy(PersistedList.java:85)
    at hudson.model.Slave.<init>(Slave.java:198)
    at hudson.slaves.AbstractCloudSlave.<init>(AbstractCloudSlave.java:51)
    at org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave.<init>(KubernetesSlave.java:98)
    at org.csanchez.jenkins.plugins.kubernetes.ProvisioningCallback.call(ProvisioningCallback.java:68)
    at org.csanchez.jenkins.plugins.kubernetes.ProvisioningCallback.call(ProvisioningCallback.java:42)
    at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

As for me, you can create separate issue for that if you find title misleading.

ppitonak commented 6 years ago

@kishansagathiya no, those two don't occur at the same time.

Jenkins doesn't have label xxx and Could not mkdirs /var/lib/jenkins/nodes occur always together. Feel free to update this issue as you want and create new one for the Unexpected error when retrieving changeset

ppitonak commented 6 years ago

happened 20 min ago http://artifacts.ci.centos.org/devtools/e2e/devtools-saas-openshiftio-e2e-smoketest-released/236/05-03-jenkins-log-failed.png

sthaha commented 6 years ago

@ppitonak how long did the build take before it aborted?

ljelinkova commented 6 years ago

@sthaha At the moment, the e2e tests are aborted after 10 minutes. Previously it was 30 minutes. In either case, Jenkins log is stuck on _Jenkins doesn’t have label mavene2e_1010_1510_79941 .

kishansagathiya commented 6 years ago
java.io.IOException: Could not mkdirs /var/lib/jenkins/nodes

Going through the code, it seems, that this could be the only place that could lead us to such an error https://github.com/jdkmirrors/openjdk8/blob/656225c5ace82fa23d2e3bf382e89ace5866a1eb/jdk/src/share/classes/java/io/File.java#L1354

parent != nil to be specific

Initially I was thinking that this could be because of lack of permissions. But then earlier logs show that we are able to read and write into /var/lib/jenkins, and even if this had any thing to do with permissions, exceptions should have started from File.java.

which mean in the code that is failing https://github.com/jenkinsci/jenkins/blob/408e135aedfb37fc234b264e3714f384d816a01d/core/src/main/java/jenkins/model/Nodes.java#L362-L363 nodesDir.mkdirs() returns false without exception

I see a bunch of doubtful log lines. One being

WARNING: Some health checks are reporting as unhealthy: [disk-space : Only 0.530 Gb free on (master)]

But I don't this could stop from creating a directory

kishansagathiya commented 6 years ago

Some other potentially problematic logs (probably not fully relevant to this issue)

SEVERE: Skip registration of GHHook (The Jenkins URL is empty. Explicitly set the Jenkins URL in the global configuration or in the GitHub plugin configuration to manage webhooks.)
WARNING: This Jenkins instance uses deprecated Remoting protocols: CLI2-connect,JNLP2-connect,JNLP3-connectIt may impact stability of the instance. If newer protocol versions are supported by all system components (agents, CLI and other clients), it is highly recommended to disable the deprecated protocols.
WARNING: JENKINS-34564 path sanitization ineffective when using legacy Workspace Root Directory ‘${ITEM_ROOTDIR}/workspace’; switch to ${JENKINS_HOME}/workspace/${ITEM_FULLNAME} as in JENKINS-8446 / JENKINS-21942
kishansagathiya commented 6 years ago

I am going to try this and see if it helps Create that /var/lib/jenkins/nodes using pipeline library. If it can't bail out saying that there is an infrastructure issue.

kishansagathiya commented 6 years ago

Create that /var/lib/jenkins/nodes using pipeline library. If it can't bail out saying that there is an infrastructure issue.

This won't work. Can't create directories through pipeline library. It doesn't have permissions to do so

Jenkins Logs (Using java.io.File mkdir)
OpenShift Build ksagathi/app-test-1-7 from https://github.com/kishansagathiya/app-test-1.git Connecting to https://api.github.com using kishansagathiya/****** (cd-github) Obtained Jenkinsfile from 1786347a396e1cf250d33c87c77964724367c7b7 Running in Durability level: MAX_SURVIVABILITY Loading library github.com/kishansagathiya/fabric8-pipeline-library@issue_4020 Attempting to resolve issue_4020 from remote references... > git --version # timeout=10 > git ls-remote -h -t https://github.com/kishansagathiya/fabric8-pipeline-library.git # timeout=10 Found match: refs/heads/issue_4020 revision b658ccda1bb864b7f282f9b5bae2ec2248412940 > git rev-parse --is-inside-work-tree # timeout=10 Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/kishansagathiya/fabric8-pipeline-library.git # timeout=10 Fetching without tags Fetching upstream changes from https://github.com/kishansagathiya/fabric8-pipeline-library.git > git --version # timeout=10 > git fetch --no-tags --progress https://github.com/kishansagathiya/fabric8-pipeline-library.git +refs/heads/*:refs/remotes/origin/* Checking out Revision b658ccda1bb864b7f282f9b5bae2ec2248412940 (issue_4020) > git config core.sparsecheckout # timeout=10 > git checkout -f b658ccda1bb864b7f282f9b5bae2ec2248412940 Commit message: "testing" > git rev-list --no-walk 5547a2bd86061a15af64f1289a3683674e50748b # timeout=10 [Pipeline] echo Loaded PipelineConfiguration PipelineConfiguration{jobNameToKindMap={}, ciBranchPatterns=[PR-.*], cdBranchPatterns=[master], cdGitHostAndOrganisationToBranchPatterns={}, disableITestsCD=false, disableITestsCI=false, useDockerSocketFlag=false} [Pipeline] echo Loaded the useDockerSocket flag false [Pipeline] podTemplate [Pipeline] { [Pipeline] node Still waiting to schedule task Jenkins doesn’t have label mavenapp_test_1_7 Running on jenkins-slave-dc4l8-6j5q6 in /home/jenkins/workspace/sagathiya_app-test-1_master-MAMXEQ2EDWHSA5IYF3DQKOVRD7TONLZQPLPEU3SMSHQXD47XUXYQ [Pipeline] { [Pipeline] } [Pipeline] // node [Pipeline] } [Pipeline] // podTemplate [Pipeline] End of Pipeline GitHub has been notified of this commit’s build result org.jenkinsci.plugins.scriptsecurity.sandbox.RejectedAccessException: Scripts not permitted to use method java.io.File mkdir at org.jenkinsci.plugins.scriptsecurity.sandbox.whitelists.StaticWhitelist.rejectMethod(StaticWhitelist.java:175) at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxInterceptor.onMethodCall(SandboxInterceptor.java:137) at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:155) at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:159) at com.cloudbees.groovy.cps.sandbox.SandboxInvoker.methodCall(SandboxInvoker.java:17) at mavenNode.call(/var/lib/jenkins/jobs/kishansagathiya/jobs/app-test-1/branches/master/builds/7/libs/github.com/kishansagathiya/fabric8-pipeline-library/vars/mavenNode.groovy:14) at ___cps.transform___(Native Method) at com.cloudbees.groovy.cps.impl.ContinuationGroup.methodCall(ContinuationGroup.java:57) at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:109) at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixName(FunctionCallBlock.java:77) at sun.reflect.GeneratedMethodAccessor165.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72) at com.cloudbees.groovy.cps.impl.ConstantBlock.eval(ConstantBlock.java:21) at com.cloudbees.groovy.cps.Next.step(Next.java:83) at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:174) at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:163) at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:122) at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:261) at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:163) at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$001(SandboxContinuable.java:19) at org.jenkinsci.plugins.workflow.cps.SandboxContinuable$1.call(SandboxContinuable.java:35) at org.jenkinsci.plugins.workflow.cps.SandboxContinuable$1.call(SandboxContinuable.java:32) at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.GroovySandbox.runInSandbox(GroovySandbox.java:108) at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:32) at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:174) at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:331) at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$200(CpsThreadGroup.java:82) at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:243) at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:231) at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:64) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112) at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Finished: FAILURE
Jenkins Logs (Using sh mkdir)
OpenShift Build ksagathi/app-test-1-6 from https://github.com/kishansagathiya/app-test-1.git Connecting to https://api.github.com using kishansagathiya/****** (cd-github) Obtained Jenkinsfile from 17ca89072b1f211e150f8402d60287e35d09d0d6 Running in Durability level: MAX_SURVIVABILITY Loading library github.com/kishansagathiya/fabric8-pipeline-library@issue_4020 Attempting to resolve issue_4020 from remote references... > git --version # timeout=10 > git ls-remote -h -t https://github.com/kishansagathiya/fabric8-pipeline-library.git # timeout=10 Found match: refs/heads/issue_4020 revision 5547a2bd86061a15af64f1289a3683674e50748b > git rev-parse --is-inside-work-tree # timeout=10 Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/kishansagathiya/fabric8-pipeline-library.git # timeout=10 Fetching without tags Fetching upstream changes from https://github.com/kishansagathiya/fabric8-pipeline-library.git > git --version # timeout=10 > git fetch --no-tags --progress https://github.com/kishansagathiya/fabric8-pipeline-library.git +refs/heads/*:refs/remotes/origin/* Checking out Revision 5547a2bd86061a15af64f1289a3683674e50748b (issue_4020) > git config core.sparsecheckout # timeout=10 > git checkout -f 5547a2bd86061a15af64f1289a3683674e50748b Commit message: "testign" > git rev-list --no-walk b31ac591a9a17cf4ea89bc04e6f642f7746dabe6 # timeout=10 [Pipeline] echo Loaded PipelineConfiguration PipelineConfiguration{jobNameToKindMap={}, ciBranchPatterns=[PR-.*], cdBranchPatterns=[master], cdGitHostAndOrganisationToBranchPatterns={}, disableITestsCD=false, disableITestsCI=false, useDockerSocketFlag=false} [Pipeline] echo Loaded the useDockerSocket flag false [Pipeline] podTemplate [Pipeline] { [Pipeline] node Still waiting to schedule task Jenkins doesn’t have label mavenapp_test_1_6 Running on jenkins-slave-wmr2b-xz4zj in /home/jenkins/workspace/sagathiya_app-test-1_master-MAMXEQ2EDWHSA5IYF3DQKOVRD7TONLZQPLPEU3SMSHQXD47XUXYQ [Pipeline] { [Pipeline] sh [sagathiya_app-test-1_master-MAMXEQ2EDWHSA5IYF3DQKOVRD7TONLZQPLPEU3SMSHQXD47XUXYQ] Running shell script + mkdir -p /var/lib/jenkins/nodes mkdir: cannot create directory '/var/lib/jenkins': Permission denied [Pipeline] } [Pipeline] // node [Pipeline] } [Pipeline] // podTemplate [Pipeline] End of Pipeline GitHub has been notified of this commit’s build result ERROR: script returned exit code 1 Finished: FAILURE

I am going to explore more see what can change here or some other way

kishansagathiya commented 6 years ago

I think this is why k8-plugin is able to mkdir https://github.com/jenkinsci/kubernetes-plugin/blob/2e10cd29fa7bd1bca86e3716f04af61af818639e/src/main/java/org/csanchez/jenkins/plugins/kubernetes/ProvisioningCallback.java#L35-L40

kishansagathiya commented 6 years ago

Some fun stuff

[Pipeline] echo
file or directory /var/lib/jenkins/nodes exist
[Pipeline] End of Pipeline

GitHub has been notified of this commit’s build result

org.jenkinsci.plugins.scriptsecurity.sandbox.RejectedAccessException: Scripts not permitted to use method java.io.File isDirectory
    at org.jenkinsci.plugins.scriptsecurity.sandbox.whitelists.StaticWhitelist.rejectMethod(StaticWhitelist.java:175)
    at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxInterceptor.onMethodCall(SandboxInterceptor.java:137)
    at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:155)
    at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:159)
    at com.cloudbees.groovy.cps.sandbox.SandboxInvoker.methodCall(SandboxInvoker.java:17)
    at mavenNode.call(/var/lib/jenkins/jobs/kishansagathiya/jobs/app-test-

it will allow to check if a file exists or not, but if I try to check if it is a directory, no permissions. I think it is because of https://wiki.jenkins.io/display/JENKINS/Script+Security+Plugin

mvulavak commented 6 years ago

@kishansagathiya Is this still an issue or is it fixed?

ppitonak commented 6 years ago

@mvulavak this issue could not surface because of other issues in idler/Jenkins. We need more time to decide if it is still happening.

kishansagathiya commented 6 years ago

This issue is waiting for a tenant update Once https://github.com/fabric8-services/fabric8-tenant/pull/665 goes in prod we can merge pipeline library changes

ppitonak commented 6 years ago

@kishansagathiya when do you plan to deploy it to prod?

kishansagathiya commented 6 years ago

@ppitonak today @piyush-garg or @hrishin would be doing those tenant updates

piyush-garg commented 6 years ago

Tenant has been updated in prod using https://github.com/openshiftio/saas-openshiftio/pull/1122 Cluster wide update is still pending

ljelinkova commented 6 years ago

I've just seen this in the logs:

Still waiting to schedule task
‘Jenkins’ doesn’t have label ‘mavene2e_1109_0720_4737_1’
Agent jenkins-slave-c720n-tb5tn is provisioned from template Kubernetes Pod Template
Agent specification [Kubernetes Pod Template] (mavene2e_1109_0720_4737_1): 
* [jnlp] openshift/jenkins-slave-base-centos7:v4.0(resourceLimitMemory: 256Mi)
* [maven] openshift/jenkins-slave-maven-centos7:v4.0(resourceLimitMemory: 640Mi)

Running on jenkins-slave-c720n-tb5tn in /home/jenkins/workspace/g_e2e-1109-0720-4737_master-ACRUOQ4JVWIYS5SYYL724JZCT7PMCSZZPAD7M3TJRHNPS443AMKA
[Pipeline] {
[Pipeline] checkout
Cloning the remote Git repository
Cloning with configured refspecs honoured and without tags
Cloning repository https://github.com/ljelinkova-testing/e2e-1109-0720-4737.git
 > git init /home/jenkins/workspace/g_e2e-1109-0720-4737_master-

At first, there was ‘Jenkins’ doesn’t have label ‘mavene2e_1109_0720_4737_1’ message but after a while the build continued. Does it mean that this issue is fixed in production now?

kishansagathiya commented 6 years ago

With https://github.com/fabric8io/fabric8-pipeline-library/pull/438#event-1962846087 merged this should have been fixed now. I would wait for few days for it, if it shows up in e2e tests again, after that we can close this.