ot4i / ace-demo-pipeline

Demo pipeline for App Connect Enterprise
MIT License

Permission issue - Failed to open file HASharedWorkPath #4

Open ddsat opened 3 years ago

ddsat commented 3 years ago

Hi, I'm trying your approach but am hitting the issue below at the maven-build step of the maven-ace-build task (Tekton pipeline, OCP 4.6). Any idea what may be missing from your script and how to resolve it, please? Thanks.

Compile log:

    [INFO] ---------------< ace-demo-pipeline:demo-infrastructure >----------------
    [INFO] Building demo-infrastructure 0.0.1 [6/7]
    [INFO] --------------------------------[ jar ]---------------------------------
    [INFO]
    [INFO] --- maven-compiler-plugin:3.1:compile m @ demo-infrastructure ---
    [INFO] Changes detected - recompiling the module!
    [WARNING] File encoding has not been set, using platform encoding ANSI_X3.4-1968, i.e. build is platform dependent!
    [INFO] Compiling 1 source file to /work/ace-demo-pipeline/demo-infrastructure/target/classes
    [INFO]
    [INFO] --- exec-maven-plugin:3.0.0:exec m @ demo-infrastructure ---
    mqsicreateworkdir: Copying sample server.config.yaml to work directory
    1 file(s) copied.
    Failed to open file /var/mqsi/registry/utility/HASharedWorkPath with error Permission denied
    BIP2113E: IBM App Connect Enterprise internal error: diagnostic information ''Permission denied'', '13', ''/var/mqsi/registry/utility/HASharedWorkPath''.
    An internal software error has occurred in IBM App Connect Enterprise. Further messages may indicate the effect of this error on the component. Shutdown and restart the component. If the problem continues to occur, then restart the system. If the problem still continues to occur contact your IBM support center.
    BIP8081E: An error occurred while processing the command.
    An error occurred while the command was running; the command has cleaned up and ended. Use messages prior to this one to determine the cause of the error. Check for some common problems: Does the user id have the correct authorities (for example a member of the mqbrkrs group)? Is any operating system limit set too low to allow the command to run? Is the environment correctly set up? Correct the problem and retry the command, otherwise, contact your IBM support center.
    [ERROR] Command execution failed.
    org.apache.commons.exec.ExecuteException: Process exited with an error: 81 (Exit value: 81)
        at org.apache.commons.exec.DefaultExecutor.executeInternal (DefaultExecutor.java:404)
        at org.apache.commons.exec.DefaultExecutor.execute (DefaultExecutor.java:166)
        at org.codehaus.mojo.exec.ExecMojo.executeCommandLine (ExecMojo.java:982)
        at org.codehaus.mojo.exec.ExecMojo.executeCommandLine (ExecMojo.java:929)
        at org.codehaus.mojo.exec.ExecMojo.execute (ExecMojo.java:457)
        at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:137)
        at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:210)
        at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:156)
        at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:148)
        at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
        at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81)
        at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:56)
        at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128)
        at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:305)
        at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192)
        at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105)
        at org.apache.maven.cli.MavenCli.execute (MavenCli.java:957)
        at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:289)
        at org.apache.maven.cli.MavenCli.main (MavenCli.java:193)
        at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:90)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:55)
        at java.lang.reflect.Method.invoke (Method.java:508)
        at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:282)
        at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:225)
        at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:406)
        at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:347)
    [INFO] ------------------------------------------------------------------------
    [INFO] Reactor Summary:

tdolby-at-uk-ibm-com commented 3 years ago

This looks like it might be related to userids and permissions; it's entirely possible that OCP 4.6 is randomising the userid (or group memberships) in some way that breaks the Maven script.

To confirm this, it's probably worth running the "id" and "whoami" commands from the script that runs Maven, as I suspect it may not be a member of the mqbrkrs group. As for fixing the problem, I think the easiest way to fix it would be to run "chmod -R 777 /var/mqsi" as root in the container Dockerfile, as that would certainly eliminate the permissions issues.
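A derived image along those lines might look like this (a sketch, not tested here; the base tag is the build image mentioned later in this thread, and 777 is deliberately broad so that permissions are ruled out entirely):

```dockerfile
# Hypothetical image built on top of the pipeline's build container
FROM tdolby/experimental:pipeline-travis-build-maven
USER root
# Open up the ACE registry tree so any randomised OpenShift userid can write to it
RUN chmod -R 777 /var/mqsi
```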

It's also possible to set the MQSI_REGISTRY env var to somewhere else that's just been created with mqsicreateworkdir; I had to do this when running buildah with RedHat's s2i, and the script is here: https://github.com/tdolby-at-uk-ibm-com/ace-s2i-demo/blob/main/ace-build.sh

ddsat commented 3 years ago

Thank you @tdolby-at-uk-ibm-com. I set the service account that executes the PipelineRun to pipeline and added it to a group that can RunAsAny, yet the result is still the same.

For the chmod, did you mean rebuilding tdolby/experimental:pipeline-travis-build-maven?

The issue is now in the step below: it fails when executing mqsicreateworkdir in the Maven script under demo-infrastructure. Can you suggest a simple change to the script (relative to the current script in this project) to make it work?

    name: maven-build
    image: tdolby/experimental:pipeline-travis-build-maven
    script: |
      #!/bin/bash
      export LICENSE=accept
      . /opt/ibm/ace-11/server/bin/mqsiprofile
      mkdir /work/maven-output
      cd /work/ace-demo-pipeline
      mvn -Dinstall.work.directory=/work/maven-output/ace-server install
    volumeMounts:
      - mountPath: /work
        name: work

tdolby-at-uk-ibm-com commented 3 years ago

Yes, you'd have to build another container on top of the original one, with a chmod in the Dockerfile.

However, it seems possible that if you add

    export MQSI_REGISTRY=/tmp/perms-work-dir/config
    mqsicreateworkdir /tmp/perms-work-dir
    export MQSI_WORKPATH=$MQSI_REGISTRY

just before the mkdir /work/maven-output line, then it should work. This is what I had to do with buildah in the s2i repo, and so it seemed like a good bet.

Even though I can't recreate your setup, I've tested this solution locally by deliberately making /var/mqsi unreadable and running Maven; originally, it showed the same error as you see, and after those three lines it started working.
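The effect of those three lines can be sketched outside ACE as follows (illustrative only: mqsicreateworkdir requires an ACE install, so a plain mkdir stands in for it here; the point is that the registry/workpath variables now name a freshly created directory owned by the current user, so nothing touches /var/mqsi):

```shell
# Stand-in for: mqsicreateworkdir /tmp/perms-work-dir
WORKDIR=/tmp/perms-work-dir
mkdir -p "$WORKDIR/config"

# Redirect the ACE registry and workpath to the user-writable directory
export MQSI_REGISTRY="$WORKDIR/config"
export MQSI_WORKPATH="$MQSI_REGISTRY"

# Unlike /var/mqsi under a randomised userid, this path is writable
test -w "$MQSI_REGISTRY" && echo "registry path is writable"
# prints "registry path is writable"
```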

ddsat commented 3 years ago

Thanks, your good bet works. There are some other issues with the TeaTests job, but I can skip that part.

Got another permission issue at the next step. Could you help rectify it, please?

    STEP-DOCKER-BUILD-AND-PUSH
    Error: error resolving dockerfile path: copying dockerfile: open /kaniko/Dockerfile: permission denied

One more thing: I notice you use deployment.yaml and service.yaml to deploy the image. If my environment already contains the Platform Navigator Dashboard and the App Connect Dashboard, should it work too? It seems I just need to deploy an integration server (from an image packaged with BAR files). Any advice on this, please?

tdolby-at-uk-ibm-com commented 3 years ago

Glad to hear the initial fix works (though ignoring tests might be unwise in the long term!).

From a quick search, it looks like OpenShift with random userids doesn't work with kaniko. It seems that Google and RedHat have different ideas about how this should work, according to https://github.com/GoogleContainerTools/kaniko/issues/681 where the issue has been closed after some investigation.

They suggest using securityContext: runAsUser: 0 (as is also suggested here https://stackoverflow.com/questions/60911478/permission-denied-for-the-kaniko-job-in-openshift-cluster ) at which point presumably the permissions problems go away, so that's definitely worth a try.

While this demo is really aimed at public cloud IKS (and others), OpenShift ought to work outside the CodeReady Containers install that I run locally to try things out. I'll have to try to get buildah working again (it broke during my upgrade to ACE v12) along with s2i and see where that gets us.

tdolby-at-uk-ibm-com commented 3 years ago

> One more thing: I notice you use deployment.yaml and service.yaml to deploy the image. If my environment already contains the Platform Navigator Dashboard and the App Connect Dashboard, should it work too? It seems I just need to deploy an integration server (from an image packaged with BAR files). Any advice on this, please?

In theory, you should be able to run the custom image using the IntegrationServer operator and you shouldn't need to run the deployment/service/route files I'm using.

However, the custom image produced by the pipeline in this repo doesn't build on top of the certified container (because that container is too big to fit in the IBM Cloud Container Registry's free tier of 512MB), so I fear you'll have some issues trying to go down this path.

I don't think it would be hard to change that, though I haven't tried it myself. You'd need to change tekton/Dockerfile and adjust the BASE_IMAGE passed to kaniko so that it points to the certified container, and it should be possible to put the application into the usual /home/aceuser/ace-server work directory that way so it starts automatically.
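A sketch of the kind of Dockerfile change that might be involved (hypothetical and untested: /home/aceuser/ace-server is the usual ACE container work directory mentioned above, but the exact COPY source and user/group names depend on what the pipeline's Maven step produces and on the certified container's layout):

```dockerfile
# Hypothetical rebase of the pipeline image onto the certified container;
# BASE_IMAGE would be the value passed to kaniko as described above
ARG BASE_IMAGE=cp.icr.io/cp/appc/ace-server-prod
FROM ${BASE_IMAGE}
# Place the built application in the server's default work directory
# so it starts automatically
COPY --chown=aceuser:aceuser maven-output/ace-server /home/aceuser/ace-server
```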

ddsat commented 3 years ago

> One more thing: I notice you use deployment.yaml and service.yaml to deploy the image. If my environment already contains the Platform Navigator Dashboard and the App Connect Dashboard, should it work too? It seems I just need to deploy an integration server (from an image packaged with BAR files). Any advice on this, please?
>
> In theory, you should be able to run the custom image using the IntegrationServer operator and you shouldn't need to run the deployment/service/route files I'm using.
>
> However, the custom image produced by the pipeline in this repo doesn't build on top of the certified container (because that container is too big to fit in the IBM Cloud Container Registry's free tier of 512MB), so I fear you'll have some issues trying to go down this path.
>
> I don't think it would be hard to change that, though I haven't tried it myself. You'd need to change tekton/Dockerfile and adjust the BASE_IMAGE passed to kaniko so that it points to the certified container, and it should be possible to put the application into the usual /home/aceuser/ace-server work directory that way so it starts automatically.

Really appreciate your help.

In the UI with Platform Navigator and App Connect Dashboard, I'm using the image cp.icr.io/cp/appc/ace-server-prod.

Would it be possible to use this as the BASE_IMAGE in tekton/Dockerfile while still running the same maven-ace-build task? Instead of running deployment.yaml, I would just need to prepare a YAML that creates an integration server with the new image. service.yaml isn't necessary, is it? Because the integration server YAML is the only one I need if I create it from the App Connect Dashboard?

With my environment context, do you think I can still go down this path?

ddsat commented 3 years ago

> Glad to hear the initial fix works (though ignoring tests might be unwise in the long term!).
>
> From a quick search, it looks like OpenShift with random userids doesn't work with kaniko. It seems that Google and RedHat have different ideas about how this should work, according to GoogleContainerTools/kaniko#681 where the issue has been closed after some investigation.
>
> They suggest using securityContext: runAsUser: 0 (as is also suggested here https://stackoverflow.com/questions/60911478/permission-denied-for-the-kaniko-job-in-openshift-cluster ) at which point presumably the permissions problems go away, so that's definitely worth a try.
>
> While this demo is really aimed at public cloud IKS (and others), OpenShift ought to work outside the CodeReady Containers install that I run locally to try things out. I'll have to try to get buildah working again (it broke during my upgrade to ACE v12) along with s2i and see where that gets us.

Thanks. The service account I'm using is already RunAsAny. For runAsUser: 0, did you mean putting it in the running pod?

tdolby-at-uk-ibm-com commented 3 years ago

> In the UI with Platform Navigator and App Connect Dashboard, I'm using the image cp.icr.io/cp/appc/ace-server-prod.
>
> Would it be possible to use this as the BASE_IMAGE in tekton/Dockerfile while still running the same maven-ace-build task? Instead of running deployment.yaml, I would just need to prepare a YAML that creates an integration server with the new image. service.yaml isn't necessary, is it? Because the integration server YAML is the only one I need if I create it from the App Connect Dashboard?
>
> With my environment context, do you think I can still go down this path?

Yes, I think that has a good chance of succeeding, though it's possible more modifications will be needed to the Dockerfile if the certified container has anything in a different location or whatever.

You should certainly be able to use a single yaml file for the IntegrationServer, and not require the deployment or service yaml files, as the operator should create everything you need.
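For reference, a minimal IntegrationServer CR might look roughly like this (a sketch from memory of the App Connect operator's CRD, with placeholder name, image, and license values: verify the apiVersion, license IDs, and field names against the YAML your App Connect Dashboard generates):

```yaml
apiVersion: appconnect.ibm.com/v1beta1
kind: IntegrationServer
metadata:
  name: ace-demo-server          # placeholder name
spec:
  license:
    accept: true
    license: L-XXXX-XXXXXX       # placeholder; use the license ID for your ACE version
    use: AppConnectEnterpriseProduction
  pod:
    containers:
      runtime:
        # placeholder: the custom image built by the pipeline
        image: image-registry.openshift-image-registry.svc:5000/default/ace-demo:latest
  replicas: 1
  version: '12.0'
```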

tdolby-at-uk-ibm-com commented 3 years ago

> Thanks. The service account I'm using is already RunAsAny. For runAsUser: 0, did you mean putting it in the running pod?

I think that's what they meant, yes: I don't have an easy way to test this, but the Stack Overflow link shows someone suggesting

apiVersion: v1
kind: Pod
metadata:
  name: security-context-demo
spec:
  securityContext:
    runAsUser: 0

as the solution.

ddsat commented 3 years ago

> Thanks. The service account I'm using is already RunAsAny. For runAsUser: 0, did you mean putting it in the running pod?
>
> I think that's what they meant, yes: I don't have an easy way to test this, but the Stack Overflow link shows someone suggesting
>
>     apiVersion: v1
>     kind: Pod
>     metadata:
>       name: security-context-demo
>     spec:
>       securityContext:
>         runAsUser: 0
>
> as the solution.

The running pod is gcr.io/kaniko-project/executor:v0.16.0. Can you suggest the change in the script? I'll try to run it.

ddsat commented 3 years ago

> In the UI with Platform Navigator and App Connect Dashboard, I'm using the image cp.icr.io/cp/appc/ace-server-prod. Would it be possible to use this as the BASE_IMAGE in tekton/Dockerfile while still running the same maven-ace-build task? Instead of running deployment.yaml, I would just need to prepare a YAML that creates an integration server with the new image. service.yaml isn't necessary, is it? Because the integration server YAML is the only one I need if I create it from the App Connect Dashboard? With my environment context, do you think I can still go down this path?
>
> Yes, I think that has a good chance of succeeding, though it's possible more modifications will be needed to the Dockerfile if the certified container has anything in a different location or whatever.
>
> You should certainly be able to use a single yaml file for the IntegrationServer, and not require the deployment or service yaml files, as the operator should create everything you need.

I'll give it a try after I manage to execute both tasks in this pipeline successfully.

tdolby-at-uk-ibm-com commented 3 years ago

After a bit of effort, I managed to break my local OpenShift cluster the same way yours broke, and I believe what's needed is a different pipelinerun. Instead of the ace-pipeline-run-crc.yaml as-is, I think the following should work:

apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: ace-pipeline-run-1
spec:
  serviceAccountName: ace-tekton-service-account
  pipelineRef:
    name: ace-pipeline
  podTemplate:
    securityContext:
      runAsNonRoot: false
      runAsUser: 0
  params:
    - name: dockerRegistry
      value: "image-registry.openshift-image-registry.svc:5000/default"

which should set the pod to running as root, which should in turn allow kaniko to operate successfully. The YAML changes came from https://github.com/tektoncd/pipeline/blob/main/docs/pipelineruns.md#specifying-a-pod-template and appear to be completely standard.

This seemed to work for me (after I made sure the docker credentials in "regcred" were present and correct) and hopefully will work for you and not fall foul of any restrictions on running pods as root . . .

ddsat commented 3 years ago

Thanks @tdolby-at-uk-ibm-com for your detailed suggestions. I'll try it today.

ddsat commented 3 years ago

I've updated the securityContext and the secret for my image registry in the pipeline service account I'm using (highlighted in bold). It seems you've also updated the ace-minimal version to ace-minimal-build-12.0.1.0-alpine. Both tasks of ace-demo-pipeline now work successfully thanks to your suggestion.

However, the deployment YAML file can't successfully create the pod. I'm not sure whether that's because my environment has a certified Platform Navigator instance and a certified ACE Dashboard image (11.0.0.11-r2). I'll try to look into this more.

    $ oc get serviceaccount pipeline
    NAME       SECRETS   AGE
    pipeline   2         12d
    $ oc get serviceaccount pipeline -o yaml
    apiVersion: v1
    imagePullSecrets: