Open ddsat opened 3 years ago
This looks like it might be related to userids and permissions; it's entirely possible that OCP 4.6 is randomising the userid (or group memberships) in some way that breaks the Maven script.
To confirm this, it's probably worth running the "id" and "whoami" commands from the script that runs Maven, as I suspect it's not a member of the mqbrkrs group. As for fixing the problem, the easiest route is probably to run "chmod -R 777 /var/mqsi" as root in the container Dockerfile, as that would certainly eliminate the permissions issues.
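As a rough illustration, a rebuilt container along those lines might look like this (a minimal sketch layered on the build image mentioned in this thread; the USER handling is an assumption and untested):

```dockerfile
# Hypothetical Dockerfile built on top of the existing build container.
FROM tdolby/experimental:pipeline-travis-build-maven

# Open up /var/mqsi so whatever userid OpenShift assigns can write to it.
USER root
RUN chmod -R 777 /var/mqsi
```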
It's also possible to set the MQSI_REGISTRY env var to somewhere else that's just been created with mqsicreateworkdir; I had to do this when running buildah with RedHat's s2i, and the script is here: https://github.com/tdolby-at-uk-ibm-com/ace-s2i-demo/blob/main/ace-build.sh
Thank you @tdolby-at-uk-ibm-com. I tried setting the serviceaccount that executes the pipelinerun to "pipeline" and added it to a group that can RunAsAny, yet I still get the same error.
For the chmod, did you mean rebuilding tdolby/experimental:pipeline-travis-build-maven?
It's now failing at the step below: it fails when executing mqsicreateworkdir in the Maven script under demo-infrastructure. Can you suggest a simple change to the script (based on the current script in this project) to make it work?
  - name: maven-build
    image: tdolby/experimental:pipeline-travis-build-maven
    script: |
      export LICENSE=accept
      . /opt/ibm/ace-11/server/bin/mqsiprofile
      mkdir /work/maven-output
      cd /work/ace-demo-pipeline
      mvn -Dinstall.work.directory=/work/maven-output/ace-server install
    volumeMounts:
      - mountPath: /work
        name: work
Yes, you'd have to build another container on top of the original one, with a chmod in the Dockerfile.
However, it seems possible that if you add
export MQSI_REGISTRY=/tmp/perms-work-dir/config
mqsicreateworkdir /tmp/perms-work-dir
export MQSI_WORKPATH=$MQSI_REGISTRY
just before the "mkdir /work/maven-output" line, then it should work. This is what I had to do with buildah in the s2i repo, and so it seemed like a good bet.
Even though I can't recreate your setup, I've tested this solution locally by deliberately making /var/mqsi unreadable and running Maven; originally, it showed the same error as you see, and after those three lines it started working.
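Put together with the step quoted earlier in this thread, the modified script section would look roughly like this (a sketch combining the snippets above, not a tested task definition):

```yaml
- name: maven-build
  image: tdolby/experimental:pipeline-travis-build-maven
  script: |
    export LICENSE=accept
    . /opt/ibm/ace-11/server/bin/mqsiprofile
    # Point ACE at a freshly-created work directory instead of /var/mqsi,
    # sidestepping the OpenShift random-userid permissions problem.
    export MQSI_REGISTRY=/tmp/perms-work-dir/config
    mqsicreateworkdir /tmp/perms-work-dir
    export MQSI_WORKPATH=$MQSI_REGISTRY
    mkdir /work/maven-output
    cd /work/ace-demo-pipeline
    mvn -Dinstall.work.directory=/work/maven-output/ace-server install
```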
Thanks for your good bet. It works. There are some other issues with the TeaTests job, but I can skip that part.
Got another permissions issue at the next step. Could you help rectify it please?
  STEP-DOCKER-BUILD-AND-PUSH
  Error: error resolving dockerfile path: copying dockerfile: open /kaniko/Dockerfile: permission denied
One more thing: I notice you use deployment.yaml and service.yaml to deploy the image. If my environment already contains a Platform Navigator Dashboard and an App Connect Dashboard, should it work too? It seems I just need to deploy an integration server (from an image packaged with BAR files). Any advice on this please?
Glad to hear the initial fix works (though ignoring tests might be unwise in the long term!).
From a quick search, it looks like OpenShift with random userids doesn't work with kaniko. It seems that Google and RedHat have different ideas about how this should work, according to https://github.com/GoogleContainerTools/kaniko/issues/681 where the issue has been closed after some investigation.
They suggest using
  securityContext:
    runAsUser: 0
(as is also suggested here: https://stackoverflow.com/questions/60911478/permission-denied-for-the-kaniko-job-in-openshift-cluster ), at which point presumably the permissions problems go away, so that's definitely worth a try.
While this demo is really aimed at public cloud IKS (and others), OpenShift ought to work outside the Code-Ready Container install that I run locally to try things out. I'll have to try and get buildah working again (broken during my upgrade to ACE v12) along with s2i and see where that gets us.
In theory, you should be able to run the custom image using the IntegrationServer operator and you shouldn't need to run the deployment/service/route files I'm using.
However, the custom image produced by the pipeline in this repo doesn't build on top of the certified container (because that container is too big to fit in the IBM Cloud Container Registry's 512MB free tier), so I fear you'll have some issues trying to go down this path.
I don't think it would be hard to change that, though I haven't tried it myself. You'd need to change tekton/Dockerfile and adjust the BASE_IMAGE passed to kaniko so that it points to the certified container, and it should be possible to put the application into the usual /home/aceuser/ace-server work directory that way so it starts automatically.
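As a rough illustration of that change (untested; the ARG name matches the BASE_IMAGE parameter passed to kaniko, but the copy source and the certified container's directory layout are assumptions to verify):

```dockerfile
# Hypothetical variant of tekton/Dockerfile building on the certified container.
# BASE_IMAGE would be pointed at the certified container instead of ace-minimal.
ARG BASE_IMAGE
FROM ${BASE_IMAGE}

# Assumption: the certified container starts the server from
# /home/aceuser/ace-server, so placing the built work directory there
# should make the application start automatically.
COPY maven-output/ace-server /home/aceuser/ace-server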
Really appreciate your help.
In the UI with Platform Navigator and App Connect Dashboard, I'm using the image cp.icr.io/cp/appc/ace-server-prod.
Would it be possible to use this as the BASE_IMAGE in the tekton/Dockerfile while still running the same maven-ace-build task? And instead of running deployment.yaml, would I just need to prepare a YAML for creating an integration server with the new image? service.yaml isn't necessary, is it? Because the integration server YAML is the only one I need if I create it from the App Connect Dashboard?
Given my environment, do you think I can still go down this path?
Thanks. The service account I'm using is already RunAsAny. By runAsUser: 0, did you mean putting it in the running pod?
Yes, I think that has a good chance of succeeding, though it's possible more modifications to the Dockerfile will be needed if the certified container keeps anything in a different location.
You should certainly be able to use a single yaml file for the IntegrationServer, and not require the deployment or service yaml files, as the operator should create everything you need.
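For reference, that single YAML file might look something like the sketch below; the apiVersion, field names, license values, and image path here are assumptions that should be checked against the App Connect operator documentation for your version:

```yaml
# Hypothetical IntegrationServer CR for the operator; all names and
# values are placeholders to verify against the operator docs.
apiVersion: appconnect.ibm.com/v1beta1
kind: IntegrationServer
metadata:
  name: ace-demo-server
spec:
  license:
    accept: true
    license: L-XXXX-XXXXXX      # license ID for your ACE version (placeholder)
    use: AppConnectEnterpriseProduction
  pod:
    containers:
      runtime:
        image: image-registry.openshift-image-registry.svc:5000/default/ace-demo:latest
  replicas: 1
  version: 11.0.0
```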
I think that's what they meant, yes: I don't have an easy way to test this, but the Stack Overflow link shows someone suggesting
apiVersion: v1
kind: Pod
metadata:
  name: security-context-demo
spec:
  securityContext:
    runAsUser: 0
as the solution.
The running pod is gcr.io/kaniko-project/executor:v0.16.0. Can you suggest the change in the script? I'll try to run it.
I'll give it a try once I manage to execute both tasks in this pipeline successfully.
After a bit of effort, I managed to break my local OpenShift cluster the same way yours broke, and I believe what's needed is a different pipelinerun. Instead of the ace-pipeline-run-crc.yaml as-is, I think the following should work:
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: ace-pipeline-run-1
spec:
  serviceAccountName: ace-tekton-service-account
  pipelineRef:
    name: ace-pipeline
  podTemplate:
    securityContext:
      runAsNonRoot: false
      runAsUser: 0
  params:
    - name: dockerRegistry
      value: "image-registry.openshift-image-registry.svc:5000/default"
which should set the pod to running as root, which should in turn allow kaniko to operate successfully. The YAML changes came from https://github.com/tektoncd/pipeline/blob/main/docs/pipelineruns.md#specifying-a-pod-template and appear to be completely standard.
This seemed to work for me (after I made sure the docker credentials in "regcred" were present and correct) and hopefully will work for you and not fall foul of any restrictions on running pods as root . . .
Thanks @tdolby-at-uk-ibm-com for your detailed suggestions. I'll try it today.
I've updated the securityContext and the secret for my image registry in the "pipeline" serviceaccount that I'm using (highlighted in bold). It seems you've updated the ace-minimal version to ace-minimal-build-12.0.1.0-alpine too. Both tasks of ace-demo-pipeline work successfully thanks to your suggestion.
However, the deployment YAML file can't successfully create the pod. I'm not sure whether or not it's because of my environment, with a certified Platform Navigator instance and certified ACE Dashboard image (11.0.0.11-r2). I'll try to look more into this.
$ oc get serviceaccount pipeline
NAME       SECRETS   AGE
pipeline   2         12d
$ oc get serviceaccount pipeline -o yaml
apiVersion: v1
imagePullSecrets:
Hi, I'm trying your approach but facing the issue below at the maven-build step of the maven-ace-build task (Tekton pipeline, OCP 4.6). Any idea what may be missing from your script and how to resolve it, please? Thanks.
Compiling log:
[INFO] ---------------< ace-demo-pipeline:demo-infrastructure >----------------
[INFO] Building demo-infrastructure 0.0.1 [6/7]
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- maven-compiler-plugin:3.1:compile @ demo-infrastructure ---
[INFO] Changes detected - recompiling the module!
[WARNING] File encoding has not been set, using platform encoding ANSI_X3.4-1968, i.e. build is platform dependent!
[INFO] Compiling 1 source file to /work/ace-demo-pipeline/demo-infrastructure/target/classes
[INFO]
[INFO] --- exec-maven-plugin:3.0.0:exec @ demo-infrastructure ---
mqsicreateworkdir: Copying sample server.config.yaml to work directory
        1 file(s) copied.
Failed to open file /var/mqsi/registry/utility/HASharedWorkPath with error Permission denied
BIP2113E: IBM App Connect Enterprise internal error: diagnostic information ''Permission denied'', '13', ''/var/mqsi/registry/utility/HASharedWorkPath''.
An internal software error has occurred in IBM App Connect Enterprise. Further messages may indicate the effect of this error on the component. Shutdown and restart the component. If the problem continues to occur, then restart the system. If the problem still continues to occur contact your IBM support center.
BIP8081E: An error occurred while processing the command.
An error occurred while the command was running; the command has cleaned up and ended. Use messages prior to this one to determine the cause of the error. Check for some common problems: Does the user id have the correct authorities (for example a member of the mqbrkrs group)? Is any operating system limit set too low to allow the command to run? Is the environment correctly set up? Correct the problem and retry the command, otherwise, contact your IBM support center.
[ERROR] Command execution failed.
org.apache.commons.exec.ExecuteException: Process exited with an error: 81 (Exit value: 81)
    at org.apache.commons.exec.DefaultExecutor.executeInternal (DefaultExecutor.java:404)
    at org.apache.commons.exec.DefaultExecutor.execute (DefaultExecutor.java:166)
    at org.codehaus.mojo.exec.ExecMojo.executeCommandLine (ExecMojo.java:982)
    at org.codehaus.mojo.exec.ExecMojo.executeCommandLine (ExecMojo.java:929)
    at org.codehaus.mojo.exec.ExecMojo.execute (ExecMojo.java:457)
    at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:137)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:210)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:156)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:148)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81)
    at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:56)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:305)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192)
    at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105)
    at org.apache.maven.cli.MavenCli.execute (MavenCli.java:957)
    at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:289)
    at org.apache.maven.cli.MavenCli.main (MavenCli.java:193)
    at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:90)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:55)
    at java.lang.reflect.Method.invoke (Method.java:508)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:282)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:225)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:406)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:347)
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary: