Closed mjallday closed 4 years ago
It all seems to be working as designed? What's the ask here?
Sorry, it may not be clear. The running pod is from another function that I manually created via the ui.
There should be a deployment with a pod for the function I created via the operator but it’s not being created.
The running pod is called nodeinfo, which is one of the built-in functions.
I’m trying to debug why my pod is not being launched.
I expect I would see a pod/deployment named my-fn with an image quay.io/verygoodsecurity/my-fn:latest running.
On Fri, Apr 17, 2020 at 00:05 Alex Ellis notifications@github.com wrote:
It all seems to be working as designed? What's the ask here?
-- Marshall Jones
looking at logs from pods
kubectl -n openfaas logs -f gateway-58548dc44d-7md2h -c faas-netes
W0417 15:18:37.183131 1 reflector.go:326] k8s.io/client-go/informers/factory.go:135: watch of *v1.Endpoints ended with: too old resource version: 592310 (592584)
W0417 15:43:35.194190 1 reflector.go:326] k8s.io/client-go/informers/factory.go:135: watch of *v1.Endpoints ended with: too old resource version: 594372 (595259)
W0417 16:00:54.204263 1 reflector.go:326] k8s.io/client-go/informers/factory.go:135: watch of *v1.Endpoints ended with: too old resource version: 597052 (597118)
this looks unrelated. when i deploy a new function i don't see any log output from any of the pods.
is there a particular pod i can watch to see what the operator is doing?
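One way to see what each container in the gateway pod is doing is to tail them one at a time. The following is a minimal sketch, not an official tool: `print_log_commands` is a hypothetical helper, and the container names (`gateway`, `faas-netes`) are taken from the pod description that appears later in this thread. It only prints the commands, so they can be reviewed before being run against a cluster.

```shell
#!/bin/sh
# Hypothetical helper: print one `kubectl logs` command per container.
# Usage: print_log_commands NAMESPACE POD CONTAINER...
print_log_commands() {
  ns=$1
  pod=$2
  shift 2
  for container in "$@"; do
    printf 'kubectl -n %s logs -f %s -c %s\n' "$ns" "$pod" "$container"
  done
}

# The container names of a pod can be listed with (requires cluster access):
#   kubectl -n openfaas get pod gateway-58548dc44d-7md2h \
#     -o jsonpath='{.spec.containers[*].name}'
print_log_commands openfaas gateway-58548dc44d-7md2h gateway faas-netes
```

Each printed line can be run in its own terminal while applying the Function, to check whether any container reacts to the new resource.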
@stefanprodan should be able to help with this. Can you please copy/paste "Your Environment" into this issue at the top too?
Thanks, here's the definition of the function btw.
cat fn.yaml
---
apiVersion: openfaas.com/v1alpha2
kind: Function
metadata:
  name: my-fn
spec:
  name: my-fn
  image: quay.io/verygoodsecurity/my-fn:latest
I've tried a few others. All have similar results - the function resource gets created but no pods or deployments are created.
There are no events on the function resource so can't tell why it's not being applied. Creating them via the API or OpenFaaS UI works just fine.
kubectl -n openfaas describe pod gateway-58548dc44d-7md2h
Name:           gateway-58548dc44d-7md2h
Namespace:      openfaas
Priority:       0
Node:           ip-10-14-99-10.us-west-2.compute.internal/10.14.99.10
Start Time:     Thu, 16 Apr 2020 10:25:42 -0700
Labels:         app=gateway
                pod-template-hash=58548dc44d
Annotations:    kubernetes.io/psp: eks.privileged
                prometheus.io.port: 8082
                prometheus.io.scrape: true
Status:         Running
IP:             10.14.101.201
Controlled By:  ReplicaSet/gateway-58548dc44d
Containers:
  gateway:
    Container ID:   docker://ba9da345b6e1a9890e7f88adf68fcd6aafb5873d039c217c2f1b82b7f58ae4d5
    Image:          openfaas/gateway:0.18.13-arm64
    Image ID:       docker-pullable://openfaas/gateway@sha256:a5ebb0005d623c81b8e8b47a89dd5f90139ecc2bcff83dc8ea93281b94024d45
    Port:           8080/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Thu, 16 Apr 2020 10:25:45 -0700
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:      50m
      memory:   120Mi
    Liveness:   http-get http://:8080/healthz delay=0s timeout=5s period=10s #success=1 #failure=3
    Readiness:  http-get http://:8080/healthz delay=0s timeout=5s period=10s #success=1 #failure=3
    Environment:
      read_timeout:             65s
      write_timeout:            65s
      upstream_timeout:         60s
      functions_provider_url:   http://127.0.0.1:8081/
      direct_functions:         true
      direct_functions_suffix:  openfaas-fn.svc.cluster.local
      function_namespace:       openfaas-fn
      faas_nats_address:        nats.openfaas.svc.cluster.local
      faas_nats_port:           4222
      faas_nats_channel:        faas-request
      basic_auth:               true
      secret_mount_path:        /var/secrets
      auth_proxy_url:           http://basic-auth-plugin.openfaas:8080/validate
      auth_pass_body:           false
      scale_from_zero:          true
      max_idle_conns:           1024
      max_idle_conns_per_host:  1024
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from openfaas-openfaas-controller-token-fcjbf (ro)
      /var/secrets from auth (ro)
  faas-netes:
    Container ID:   docker://5120a9e59cb1d253ccc02043b9f6c8e324f490f921ca4ff1fdd842507683d923
    Image:          openfaas/faas-netes:0.10.2-arm64
    Image ID:       docker-pullable://openfaas/faas-netes@sha256:5d3323092ec536df47de651c65aea3db09d1798e3b30a6ee2a965e37a28bf72c
    Port:           8081/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Thu, 16 Apr 2020 10:25:49 -0700
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:     50m
      memory:  120Mi
    Environment:
      port:                                   8081
      function_namespace:                     openfaas-fn
      read_timeout:                           60s
      write_timeout:                          60s
      image_pull_policy:                      Always
      http_probe:                             true
      set_nonroot_user:                       false
      readiness_probe_initial_delay_seconds:  2
      readiness_probe_timeout_seconds:        1
      readiness_probe_period_seconds:         2
      liveness_probe_initial_delay_seconds:   2
      liveness_probe_timeout_seconds:         1
      liveness_probe_period_seconds:          2
    Mounts:
      /tmp from faas-netes-temp-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from openfaas-openfaas-controller-token-fcjbf (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  faas-netes-temp-volume:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  auth:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  basic-auth
    Optional:    false
  openfaas-openfaas-controller-token-fcjbf:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  openfaas-openfaas-controller-token-fcjbf
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  beta.kubernetes.io/arch=arm64
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>
Does it use auth?
What are the logs of the controller and what events are in the namespace?
I wonder if your EKS version of v1.14 isn't supported by the client-go 1.17 version we updated to recently? @stefanprodan @LucasRoesler
> Does it use auth?
Auth only exists on the UI. This is mostly deployed using the helm chart specified in https://github.com/openfaas/faas-netes/commit/77851960b31b980f0328d55fd0f8c2b168bac8b7
The only customization I've done is add the ARM64 values. Here's the exact HelmRelease i'm applying
---
apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: openfaas
  namespace: openfaas
  annotations:
    flux.weave.works/automated: "false"
spec:
  chart:
    git: https://github.com/openfaas/faas-netes
    path: chart/openfaas
    ref: 77851960b31b980f0328d55fd0f8c2b168bac8b7
  values:
    ingress:
      enabled: false
    functionNamespace: openfaas-fn
    generateBasicAuth: true
    istio:
      mtls: false
    basic_auth: true
    # arm64 specific values
    gateway:
      image: openfaas/gateway:0.18.13-arm64
      directFunctions: true
      replicas: 1
    oauth2Plugin:
      enabled: false
    faasnetes:
      image: openfaas/faas-netes:0.10.2-arm64
      httpProbe: true
    operator:
      image: openfaas/faas-netes:0.10.2-arm64
      create: false
    queueWorker:
      image: openfaas/queue-worker:0.9.0-arm64
    prometheus:
      image: prom/prometheus:v2.11.0
      create: true
      resources:
        requests:
          memory: "125Mi"
    alertmanager:
      image: prom/alertmanager:v0.18.0
      create: true
    faasIdler:
      image: openfaas/faas-idler:0.3.0-arm64
    basicAuthPlugin:
      image: openfaas/basic-auth-plugin:0.18.13-arm64
      replicas: 1
    ingressOperator:
      create: false
    nodeSelector:
      beta.kubernetes.io/arch: arm64
> What are the logs of the controller and what events are in the namespace?
I've looked at the output from the gateway pod. Nothing interesting there (only health check and UI access logs show up, no other output). Happy to supply logs from other pods, but I don't see anything relevant in them either; just tell me what you'd like to see.
No events on the function itself. It's like the operator isn't even processing it.
The question about auth is about whether your quay.io image is using auth, you redacted the image name, why?
Can you provide a values.yaml instead of a FluxCD config? Flux isn't supported on ARM64 unless people build their own images or use community images, and I'm not going to do that.
Ah, I see. I’d say it’s working because it works if I pull the image using the ui.
Let me confirm tho by using a public image. Will update shortly.
Did you setup a proper configuration for the image pull secrets? https://docs.openfaas.com/deployment/kubernetes/#use-a-private-registry-with-kubernetes
Also, did you run kubectl get events -n openfaas-fn?
> Did you setup a proper configuration for the image pull secrets? https://docs.openfaas.com/deployment/kubernetes/#use-a-private-registry-with-kubernetes
Yes, but for clarity here's a new function with a public image to remove that from concern:
cat fn2.yaml
---
apiVersion: openfaas.com/v1alpha2
kind: Function
metadata:
  name: fn-2
spec:
  name: fn-2
  image: functions/nodeinfo:arm64
kubectl -n openfaas-fn apply -f fn2.yaml
function.openfaas.com/fn-2 created
kubectl -n openfaas-fn get functions
NAME AGE
fn-2 13s
my-fn 15h
kubectl get events -n openfaas-fn
LAST SEEN   TYPE     REASON              OBJECT                            MESSAGE
21m         Normal   Scheduled           pod/fn-manual-7657c974b6-dhptv    Successfully assigned openfaas-fn/fn-manual-7657c974b6-dhptv to ip-10-14-100-161.us-west-2.compute.internal
21m         Normal   Pulling             pod/fn-manual-7657c974b6-dhptv    Pulling image "quay.io/verygoodsecurity/test-fn:latest"
21m         Normal   Pulled              pod/fn-manual-7657c974b6-dhptv    Successfully pulled image "quay.io/verygoodsecurity/test-fn:latest"
21m         Normal   Created             pod/fn-manual-7657c974b6-dhptv    Created container fn-manual
21m         Normal   Started             pod/fn-manual-7657c974b6-dhptv    Started container fn-manual
21m         Normal   SuccessfulCreate    replicaset/fn-manual-7657c974b6   Created pod: fn-manual-7657c974b6-dhptv
21m         Normal   ScalingReplicaSet   deployment/fn-manual              Scaled up replica set fn-manual-7657c974b6 to 1
34s         Normal   ChartSynced         helmrelease/functions             Chart managed by HelmRelease processed
(fn-manual is something i created via the UI to ensure i'm not going crazy here, everything works just fine when i'm deploying via the ui)
helmrelease/functions is a HelmRelease object that's rendering a series of functions btw. this is something we use successfully on our x86 cluster.
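The symptom described here (Function resources exist, but no matching deployments) can be checked mechanically. The following is a minimal sketch, assuming each Function is expected to get a deployment of the same name, as with fn-manual above; `missing_deployments` is a hypothetical helper, not part of kubectl or OpenFaaS.

```shell
#!/bin/sh
# Hypothetical helper: print each Function name that has no Deployment
# of the same name.
# $1: space-separated Function names
# $2: space-separated Deployment names
missing_deployments() {
  for fn in $1; do
    found=no
    for dep in $2; do
      if [ "$fn" = "$dep" ]; then
        found=yes
      fi
    done
    if [ "$found" = no ]; then
      echo "$fn"
    fi
  done
  return 0
}

# On a live cluster the two lists could be gathered with:
#   functions=$(kubectl -n openfaas-fn get functions -o jsonpath='{.items[*].metadata.name}')
#   deployments=$(kubectl -n openfaas-fn get deployments -o jsonpath='{.items[*].metadata.name}')
# Example using names that appear in the output above:
missing_deployments "fn-2 my-fn" "fn-manual"
```

With the names from this thread, the helper lists both fn-2 and my-fn, matching the report that neither Function produced a deployment.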
What architecture is your image that you're creating? Are you building it on an ARM64 device?
it's arm64. that's why i switched to the nodeinfo so that we don't need to worry about the image that i'm building. i've already deployed and successfully run the nodeinfo (arm tagged version) on this cluster.
see here you can see if i manually deploy it it's in the running state.
This works fine for me:
kubectl get deploy,pod,function,service -n openfaas-fn
NAME                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nodeinfo   1/1     1            1           6m7s
deployment.apps/fn-2       1/1     1            1           41s

NAME                            READY   STATUS    RESTARTS   AGE
pod/nodeinfo-6dbd6bfc98-2qxcj   1/1     Running   0          6m7s
pod/fn-2-5bb7b8d977-pxrws       1/1     Running   0          41s

NAME                             AGE
function.openfaas.com/nodeinfo   6m7s
function.openfaas.com/fn-2       41s

NAME               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/nodeinfo   ClusterIP   10.43.52.60     <none>        8080/TCP   6m7s
service/fn-2       ClusterIP   10.43.135.167   <none>        8080/TCP   41s
And kubectl version shows a more modern version:
kubectl version
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.0", GitCommit:"70132b0f130acc0bed193d9ba59dd186f0e634cf", GitTreeState:"clean", BuildDate:"2019-12-07T21:20:10Z", GoVersion:"go1.13.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.2+k3s1", GitCommit:"cdab19b09a84389ffbf57bebd33871c60b1d6b28", GitTreeState:"clean", BuildDate:"2020-01-27T18:08:16Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/arm64"}
My suspicion is that you have something misconfigured, or the latest update to Go client 1.17 broke compatibility with EKS on 1.14.
Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.9-eks-f459c0", GitCommit:"f459c0672169dd35e77af56c24556530a05e9ab1", GitTreeState:"clean", BuildDate:"2020-03-18T04:24:17Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
Perhaps you should try upgrading EKS to 1.15?
https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html
The client-go compatibility matrix implies that client-go 1.17 doesn't work with Kubernetes 1.14, but does work with 1.15: https://github.com/kubernetes/client-go#compatibility-matrix
GKE also has 1.15 available so it seems like the minimum version you should target?
https://cloud.google.com/kubernetes-engine/docs/release-notes#no-channel
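The version check suggested here can be scripted. The following is a minimal sketch, assuming `kubectl version` prints the `version.Info{...}` format quoted in this thread (other kubectl releases may format it differently); `server_minor` and `server_at_least` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl version` output on stdin and print the
# server's minor version as a plain number (e.g. 14 for Minor:"14+").
server_minor() {
  grep '^Server Version:' | sed -n 's/.*Minor:"\([0-9][0-9]*\).*/\1/p'
}

# Succeeds if the server minor version read from stdin is at least $1.
server_at_least() {
  minor=$(server_minor)
  [ -n "$minor" ] && [ "$minor" -ge "$1" ]
}

# Example with a line shaped like the EKS output quoted above (abbreviated):
line='Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.9-eks-f459c0"}'
if printf '%s\n' "$line" | server_at_least 15; then
  echo "server is 1.15 or newer"
else
  echo "server is older than 1.15"
fi
```

Against a live cluster, the same check would be `kubectl version | server_at_least 15`. The threshold of 15 comes from the suggestion in this comment, not from an official support statement.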
looks like there might be an arm eks release for 1.15 so we'll look at launching that cluster and see if that's where the problem lies and report back.
thanks for verifying everything looks ok
What's the usecase here? Is it commercial?
it's a commercial use-case. we're running image classification software that's optimized for ARM. in this particular use-case it's running tensorflow.
just to update: we deployed the same setup on our x86 cluster and aren't seeing any issues (not on eks tho so not exact parity). we tried upgrading eks arm to 1.15 but doesn't look like that version is compatible yet.
we'll update once we have any progress to share on this.
We managed to tweak the config and successfully upgraded the cluster. Everything is now working as expected. I will close this issue.
Thanks for the help in debugging, glad we got it solved!
kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.5", GitCommit:"20c265fef0741dd71a66480e35bd69f18351daea", GitTreeState:"clean", BuildDate:"2019-10-15T19:16:51Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.11-eks-af3caf", GitCommit:"af3caf6136cd355f467083651cc1010a499f59b1", GitTreeState:"clean", BuildDate:"2020-03-27T21:51:36Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"linux/amd64"}
Expected Behaviour
I followed the instructions on how to apply a function on an ARM OpenFaaS cluster.
Current Behaviour
The Function resource is created, but I do not see the related resources (deployment, pod) being created.
I expect to see the function created here. There are no events attached to "my-fn"
Deploying via the CLI (faas-cli) or the web UI works fine.
Steps to Reproduce (for bugs)
Context
Your Environment
faas-cli version:
CLI:
 commit:  2d183c713b32385831dc7f69c073e57c06e3b76c
 version: 0.12.2
kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.5", GitCommit:"20c265fef0741dd71a66480e35bd69f18351daea", GitTreeState:"clean", BuildDate:"2019-10-15T19:16:51Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.9-eks-f459c0", GitCommit:"f459c0672169dd35e77af56c24556530a05e9ab1", GitTreeState:"clean", BuildDate:"2020-03-18T04:24:17Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}