Hi @Prasanna-543
It seems that you are using AWS-based authentication. Have you followed the instructions in the README for AWS auth? https://github.com/sighupio/gatekeeper-policy-manager#aws-iam-authentication
I haven't tried aws-iam-authenticator yet. When I check the logs, the error is: ERROR:root:[Errno 2] No such file or directory: 'aws'
Do we just need this part:
FROM curlimages/curl:7.81.0 as downloader
RUN curl https://github.com/kubernetes-sigs/aws-iam-authenticator/releases/download/v0.5.5/aws-iam-authenticator_0.5.5_linux_amd64 --output /tmp/aws-iam-authenticator
RUN chmod +x /tmp/aws-iam-authenticator
FROM quay.io/sighup/gatekeeper-policy-manager:v1.0.3
COPY --from=downloader --chown=root:root /tmp/aws-iam-authenticator /usr/local/bin/
or does this need to be added to the original image? Do we need to add the AWS CLI too?
Correct.
From the kubeconfig you pasted, it seems that it is configured to use the aws command to authenticate, so you will need the AWS CLI instead of the aws-iam-authenticator.
In other words, you will need to build your own image starting from GPM's image and including the AWS CLI binary. Another option would be modifying the kubeconfig to use another auth mechanism, but I don't know if that is possible in your environment.
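As a quick check (just a sketch, assuming you can run kubectl against the same kubeconfig), you can print which command the current context is configured to exec for authentication:
# Prints the exec-based auth command of the first user entry in the current context's kubeconfig.
# If it prints "aws", the GPM image needs the AWS CLI; if it prints "aws-iam-authenticator",
# the README's Dockerfile snippet above is enough.
kubectl config view --minify -o jsonpath='{.users[0].user.exec.command}'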
How do we add the AWS CLI? I found the below on Google:
FROM alpine:latest
RUN apk --no-cache add python3 py3-pip
RUN pip3 install --upgrade pip \
    && pip3 install --no-cache-dir awscli
I added it to the Dockerfile in this repo and attached the image to the policy-manager deployment, but the pod is showing CrashLoopBackOff:
gatekeeper-policy-manager-ui-5cc7545fb6-4n5gd 0/1 CrashLoopBackOff 1 (21s ago) 24s
When I describe the pod:
Warning BackOff 1s (x7 over 28s) kubelet Back-off restarting failed container
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
I can see the image and imageID, volumes and volume mounts; they are fine!
Can you help me with how to add the AWS CLI?
The following Dockerfile should work:
FROM quay.io/sighup/gatekeeper-policy-manager:v1.0.3
# Add awscli to GPM image
USER root
WORKDIR /tmp
ADD "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" /tmp
RUN apt-get update && apt-get install -y unzip && rm -rf /var/lib/apt/lists/*
RUN unzip awscli-exe-linux-x86_64.zip && aws/install && rm -rf aws && rm awscli-exe-linux-x86_64.zip
# Go back to the original image settings
WORKDIR /app
USER 999
Edit: changed USER gpm to USER 999
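If useful, a minimal sketch of building and pushing the custom image; the registry and tag below are only placeholders:
# Build from the Dockerfile above and push it to your own registry (placeholder name/tag).
docker build -t your-registry.example.com/gatekeeper-policy-manager-aws:v1.0.3 .
docker push your-registry.example.com/gatekeeper-policy-manager-aws:v1.0.3
Then point the GPM Deployment's image field at the pushed tag.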
Hey @Prasanna-543, were you able to build the custom image? Is GPM working?
Regards,
The image was built successfully, but is there a possibility that the resulting image may change when built on a different architecture?
Sorry, I don't think I understand the question. The image will be different from the "official" one because you are adding stuff to it.
Could you please elaborate a little more? Is GPM not working with the built image?
Hi @ralgozino, I can build the image but I am facing the below error:
Error: container has runAsNonRoot and image has non-numeric user (gpm), cannot verify user is non-root (pod: "gatekeeper-policy-manager-ui-d59d54b7f-bs6dg_gatekeeper-system", container: gatekeeper)
Hi @Prasanna-543, change the line:
USER gpm
to
USER 999
in the Dockerfile and rebuild the image
yeah trying
When I select a cluster in the context dropdown this is the error, and when I don't select any context the below is the error.
I have tried with the new image; I checked the pod and the logs too, there are no errors, but the UI is showing the same.
It seems that you have no context selected. See the dropdown at the top right? Try choosing one context.
I have edited the above reply! The context just disappears when I select it.
You should see some logs on the GPM pods about what is going on. If you still see no logs, please try the following 2 things:
1. Set the environment variable GPM_LOG_LEVEL to DEBUG (see the sketch below); you should see more detailed logs in the pod with this change.
2. Open your browser's developer tools (F12) and check the Network tab for failing requests.
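For reference, a rough sketch of doing that with kubectl; the Deployment and namespace names are assumptions based on the pod names pasted earlier:
# Set the log level on the Deployment (resource names assumed from the pod output above)
kubectl set env deployment/gatekeeper-policy-manager-ui GPM_LOG_LEVEL=DEBUG -n gatekeeper-system
# Follow the logs of the restarted pods
kubectl logs deployment/gatekeeper-policy-manager-ui -n gatekeeper-system -f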
I had GPM_LOG_LEVEL set to DEBUG before and it is still set now:
[2023-04-21 08:13:41 +0000] [1] [INFO] Starting gunicorn 20.1.0
[2023-04-21 08:13:41 +0000] [1] [INFO] Listening at: http://0.0.0.0:8080 (1)
[2023-04-21 08:13:41 +0000] [1] [INFO] Using worker: gthread
[2023-04-21 08:13:41 +0000] [7] [INFO] Booting worker with pid: 7
[2023-04-21 08:13:41 +0000] [8] [INFO] Booting worker with pid: 8
[2023-04-21 08:13:44 +0000] [7] [INFO] gunicorn log level is set to: DEBUG
[2023-04-21 08:13:44 +0000] [7] [INFO] application log level is set to: DEBUG
[2023-04-21 08:13:44 +0000] [7] [INFO] RUNNING WITH AUTHENTICATION DISABLED
[2023-04-21 08:13:44 +0000] [7] [INFO] Attempting init with KUBECONFIG from path '~/.kube/config'
[2023-04-21 08:13:44 +0000] [7] [INFO] KUBECONFIG '~/.kube/config' successfuly loaded.
[2023-04-21 08:13:44 +0000] [7] [DEBUG] GET /health
[2023-04-21 08:13:44 +0000] [7] [DEBUG] GET /health
[2023-04-21 08:13:44 +0000] [7] [DEBUG] GET /health
[2023-04-21 08:13:44 +0000] [7] [DEBUG] Ignoring connection epipe
[2023-04-21 08:13:44 +0000] [7] [DEBUG] Ignoring connection epipe
[2023-04-21 08:13:44 +0000] [7] [DEBUG] Closing connection.
[2023-04-21 08:13:44 +0000] [8] [INFO] gunicorn log level is set to: DEBUG
[2023-04-21 08:13:44 +0000] [8] [INFO] application log level is set to: DEBUG
[2023-04-21 08:13:44 +0000] [8] [INFO] RUNNING WITH AUTHENTICATION DISABLED
[2023-04-21 08:13:44 +0000] [8] [INFO] Attempting init with KUBECONFIG from path '~/.kube/config'
[2023-04-21 08:13:44 +0000] [8] [INFO] KUBECONFIG '~/.kube/config' successfuly loaded.
[2023-04-21 08:13:45 +0000] [7] [DEBUG] GET /
[2023-04-21 08:13:45 +0000] [8] [DEBUG] GET /
[2023-04-21 08:13:45 +0000] [7] [DEBUG] Closing connection.
[2023-04-21 08:13:45 +0000] [8] [DEBUG] Closing connection.
...
[2023-04-21 08:14:01 +0000] [7] [DEBUG] GET /constraints/arn:aws:eks:*cluster
[2023-04-21 08:14:01 +0000] [7] [DEBUG] GET /static/js/main.079229dd.js
[2023-04-21 08:14:01 +0000] [7] [DEBUG] GET /static/css/main.e9dfd109.css
[2023-04-21 08:14:02 +0000] [8] [DEBUG] GET /api/v1/contexts/
[2023-04-21 08:14:02 +0000] [8] [DEBUG] GET /api/v1/auth/
[2023-04-21 08:14:02 +0000] [7] [DEBUG] GET /static/media/github-logo.2384f056f07cd6da5d2a11e846a50566.svg
[2023-04-21 08:14:02 +0000] [7] [DEBUG] GET /static/js/icon.heart.6a5439c3.chunk.js
[2023-04-21 08:14:02 +0000] [7] [DEBUG] GET /static/js/icon.arrow_right.b4dff9f3.chunk.js
[2023-04-21 08:14:02 +0000] [8] [DEBUG] GET /static/js/icon.popout.415e5814.chunk.js
[2023-04-21 08:14:02 +0000] [8] [DEBUG] GET /static/media/Poppins-Medium.9e1bb626874ed49aa343.ttf
[2023-04-21 08:14:02 +0000] [7] [DEBUG] GET /static/js/icon.arrow_down.64fbca8c.chunk.js
[2023-04-21 08:14:02 +0000] [7] [DEBUG] GET /static/media/Poppins-Bold.404e299be26d78e66794.ttf
[2023-04-21 08:14:03 +0000] [7] [DEBUG] GET /static/media/Poppins-Regular.8081832fc5cfbf634aa6.ttf
[2023-04-21 08:14:03 +0000] [8] [DEBUG] GET /favicon.ico
And I'm using a Mac, so I don't know how F12 works here.
Right-click anywhere on the page -> Inspect. When the inspector opens up, go to the constraints view for example, and check the "Network" tab of the inspector for requests in RED. See what the response to that request is.
Another way to test is with curl:
Test the contexts endpoint first (replace http://localhost:8080 with the address of GPM):
curl http://localhost:8080/api/v1/contexts/
You should see something like this:
[[{"name":"kind-kind","context":{"cluster":"kind-kind","user":"kind-kind"}}],{"name":"kind-kind","context":{"cluster":"kind-kind","user":"kind-kind"}}]
Then try the constraints endpoint:
curl http://localhost:8080/api/v1/constraints/
And let me know what you get as a response.
curl https://GPM_host/api/v1/contexts/
curl https://GPM_host/api/v1/constraints/
The error codes are 302 and 500.
Mmmmm, there's something in your network that is messing up CORS and breaking the frontend's communication with the backend. Are you behind a corporate proxy or something similar?
Can you try the same curl commands but add the -L flag? i.e.:
curl -L http://GPM_host/api/v1/contexts/
and
curl -L http://GPM_host/api/v1/constraints/
Let's see if you still get the CORS error with curl.
A workaround to see if you can get it to work is to disable the CORS check in the backend. To do that, add an environment variable APP_ENV with the value development.
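For example, a sketch of setting that variable with kubectl; the Deployment and namespace names are assumptions based on the pod names above:
# Disable the CORS check in the backend, for testing only (resource names assumed)
kubectl set env deployment/gatekeeper-policy-manager-ui APP_ENV=development -n gatekeeper-system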
Yes, I'm using a company laptop that always needs a VPN connection, and there are many restrictions!
Some questions:
Are you using OIDC?
Can you tell me what you see if you click on this request?
Is the request that has CORS problems being made to GPM or to another host? I don't remember GPM making requests like that:
OIDC was not enabled in the values file, but the service account included in the deployment has an IAM role annotation that uses the EKS OIDC provider. Without multi-cluster, and with the same service account, it worked fine.
All other hosts are working fine, only GPM has the problem.
Interesting
Please go to the Network tab, then click on the request with error 500 and then on the Preview tab, see:
OK, but the request that starts with authorize?... to what host / URL is it being made?
When I open one of the requests with the 500 code I find this: {"action":"Please verify your kubeconfig file and location","description":"Invalid kube-config file. Expected object with name arn:aws:eks:region:iacc_id:cluster in /home/gpm/.kube/config/contexts list","error":"Can't connect to cluster due to an invalid kubeconfig file"}
Even the cluster's full name is not loading.
The request that starts with authorize?... has been pointing to manifest.json, the name shown in the image above.
This is much better, the actual error is this:
Invalid kube-config file. Expected object with name arn:aws:eks:us-east-1:***:cluster in /home/gpm/.kube/config/contexts list
I believe it is caused by this bug in the Python Kubernetes client library: https://github.com/kubernetes-client/python/issues/1193
Could you please run:
kubectl config current-context
or just inspect the file?

@ralgozino
kubectl config current-context
arn:aws:eks:us-east-1:ID:cluster/Cluster_name
And the current-context has been set already.
And do you think this will have any effect? https://github.com/sighupio/gatekeeper-policy-manager/blob/ad6259d757d6e57920cbeaac9579a221f4ab5132/chart/values.yaml#L110
@ralgozino, can you remove the 84**** number in the above comment? I don't want it to be public!
done
And do you think this will have any effect?
I don't think that is relevant to the problem.
Have you tried setting the env var APP_ENV to development? I'm running out of ideas.
Not tried yet, I will try and let you know.
It's not working!
Sorry to hear that. I'd need to try to replicate the issue myself, but I believe that is something particular to your environment. Probably the corporate VPN or an HTTP proxy in the middle changing something and breaking the front-end <> backend communication.
The last thing we can try is building the image from the main branch, which has updated dependencies. Maybe we are lucky and the problem goes away.
Change the line FROM quay.io/sighup/gatekeeper-policy-manager:v1.0.3 in the Dockerfile to:
FROM quay.io/sighup/gatekeeper-policy-manager:bf1d36477f9291a06e7109b9193dbbe6546cbd37
and rebuild the image.
Hope that helps!
This time the error details had an extra part!
Yes! We improved the error messages in the frontend a few days ago.
It keeps saying, though, that there's something wrong with the kubeconfig file. Are you 100% sure that the kubeconfig works? Can you test it somehow?
The very last test we can do is using the development version of GPM, which replaces the Python backend with a Go backend; this way we can rule out issues with the Python library. To do so, change the FROM line like before to use this image instead: FROM quay.io/sighup/gatekeeper-policy-manager:20751b146c9093e7ca191b770671f7db869bf62d
If the Go backend has the same issue, I would not know what else to try.
Yes, the kubeconfig works, because we are using the same one locally (~/.kube/config). I just copy-pasted the cluster details and contexts.
The image is not building from the above source you have given, but it did build with the previous one:
FROM quay.io/sighup/gatekeeper-policy-manager:20751b146c9093e7ca191b770671f7db869bf62d
USER root
WORKDIR /tmp
ADD "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" /tmp
RUN apt-get update && apt-get install -y unzip && rm -rf /var/lib/apt/lists/*
RUN unzip awscli-exe-linux-x86_64.zip && aws/install && rm -rf aws && rm awscli-exe-linux-x86_64.zip
WORKDIR /app
USER 999
[4/6] RUN apt-get update && apt-get install -y unzip && rm -rf /var/lib/apt/lists/*:
8 0.309 runc run failed: unable to start container process: exec: "/bin/sh": stat /bin/sh: no such file or directory
executor failed running [/bin/sh -c apt-get update && apt-get install -y unzip && rm -rf /var/lib/apt/lists/*]: exit code: 1
Is there a way that we can use the local config file?
Yes, the kubeconfig works, because we are using the same one locally (~/.kube/config). I just copy-pasted the cluster details and contexts.
What do you mean by this? Did you edit the kubeconfig file manually? Please try with the same copy that you are using locally.
I forgot that the Docker image for the Go version starts from scratch (there's no OS), so that is why it is failing to build. For testing the Go version of GPM with the aws binary, you can use the following Dockerfile:
FROM public.ecr.aws/amazonlinux/amazonlinux:2
ARG EXE_FILENAME=awscli-exe-linux-x86_64.zip
ADD "https://awscli.amazonaws.com/$EXE_FILENAME" /tmp
# COPY $EXE_FILENAME .
RUN yum update -y \
&& yum install -y unzip \
&& unzip /tmp/$EXE_FILENAME \
&& ./aws/install
COPY --from=quay.io/sighup/gatekeeper-policy-manager:20751b146c9093e7ca191b770671f7db869bf62d /app /app
WORKDIR /app
ENTRYPOINT ["./gpm"]
I have the same issue, but I could make it work when I renamed the context with kubectl:
bash-4.2$ kubectl config rename-context arn:aws:eks:abcd:123456:cluster/cluster1 cluster1
Context "arn:aws:eks:abcd:123456:cluster/cluster1" renamed to "cluster1".
So the URL looked like http://localhost:8080/constraints/arn:aws:eks:abcd:123456:cluster/cluster1 (Not found), and after renaming the context with kubectl it becomes http://localhost:8080/constraints/cluster1, which is accessible. It seems the URL with the ARN is not working correctly.
Thank you @kecebon9! This is great feedback. I think that URL-encoding the context name (or at least escaping the forward slashes /) should fix the issue. I will do some tests.
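For example, a sketch of what the encoded URL would look like, reusing the hypothetical context name from the report above with the forward slash escaped as %2F:
# Same frontend URL as above, but with the "/" inside the context name percent-encoded
curl "http://localhost:8080/constraints/arn:aws:eks:abcd:123456:cluster%2Fcluster1"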
Hello, we are using GPM and it's working for a single cluster, but when I set the config to true I faced the below error in the dashboard. I can even see the context field in the web page and can select the clusters.
Error
Can't connect to cluster due to an invalid kubeconfig file
Please verify your kubeconfig file and location
config looks like this: