openfaas / faas-netes

Serverless Functions For Kubernetes
https://www.openfaas.com
MIT License

Unable to deploy a function with the Operator in K8s 1.18 #630

Closed by flamedmg 4 years ago

flamedmg commented 4 years ago

My actions before raising this issue

A function can't be updated on Kubernetes 1.17.5 (the latest on DigitalOcean). The update fails with the message:

Unexpected status: 500, message: functions.openfaas.com "function-name" is invalid: metadata.resourceVersion: Invalid value: 0x0: must be specified for an update

I freshly installed OpenFaaS, an ingress controller, and cert-manager, then installed the function with faas up -f function.yaml. Right after that deployment, deploying a second time to update the function produces the error message above.

Expected Behaviour

Function should be properly deployed

Current Behaviour

Deployment fails

Possible Solution

Steps to Reproduce (for bugs)

  1. Install faas via helm3 chart
  2. Install nginx ingress controller
  3. Install cert manager
  4. Deploy any function; I used python3-http
  5. Redeploy; it will fail

Basically, I was following this guide: https://docs.openfaas.com/reference/ssl/kubernetes-with-cert-manager/

Context

I'm happy to provide access to this test cluster if required

Your Environment

Next steps

You may join Slack for community support.

alexellis commented 4 years ago

Hi there, thanks for your interest.

We're going to need some more information before we can help you out with this.

alexellis commented 4 years ago

You should also provide the output asked for on the troubleshooting guide and everything under "Your Environment"

Just edit your issue, no need to paste a comment.

So that means filling out Your Environment accurately and pasting in the output from all of the K8s troubleshooting steps in: https://docs.openfaas.com/deployment/troubleshooting/#troubleshoot-kubernetes

Thanks :)

alexellis commented 4 years ago

Unable to reproduce any error using default instructions with K8s 1.18. I wonder if we're missing some important info or context from your setup?

GO111MODULE="on" go get sigs.k8s.io/kind@v0.8.0
kind create cluster

arkade install openfaas

# Follow login and port-forwarding step

faas new --lang go --prefix alexellis2 go1

faas up -f go1.yml
curl -d test http://127.0.0.1:8080/function/go1
faas deploy go1.yml
curl -d test http://127.0.0.1:8080/function/go1

flamedmg commented 4 years ago

I'm sorry, attaching requested below:

kubectl apply -f https://raw.githubusercontent.com/openfaas/faas-netes/master/namespaces.yml
namespace/openfaas created
namespace/openfaas-fn created
helm repo update \
 && helm upgrade openfaas --install openfaas/openfaas \
    --namespace openfaas  \
    --set functionNamespace=openfaas-fn \
    --set operator.create=true \
    --set generateBasicAuth=true
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-0.32.0/deploy/static/provider/do/deploy.yaml
kubectl apply --validate=false -f https://github.com/jetstack/cert-manager/releases/download/v0.15.0/cert-manager.yaml
helm upgrade openfaas \
    --namespace openfaas \
    --reuse-values \
    --values tls.yaml \
    openfaas/openfaas

After the certificate is deployed, I add a new function:

version: 1.0
provider:
  name: openfaas
  gateway: https://gw.myhost.com
functions:
  function-name:
    lang: python3-http
    handler: ./function-name
    image: docker.pkg.github.com/corp/repo/function-name:latest
    secrets:
      - docker-github

Handler code:

def handle(event, context):
    return {
        "statusCode": 404,
        "body": {
            "status": "not found",
        }
    }

alexellis commented 4 years ago

I see you're using the operator and didn't mention that in your initial report.

How about just not using the operator? That should fix it for you. In the meantime we should now have what we need to try to reproduce the issue and I'll ping the community.

alexellis commented 4 years ago

With 1.18 I couldn't deploy anything with the operator (let alone update), but faas-netes functions well without any changes.

alex@alexx:/tmp$ faas-cli list
Function                        Invocations     Replicas
alex@alexx:/tmp$ kubectl get function -A
No resources found
alex@alexx:/tmp$ kubectl get pod -n openfaas-fn
No resources found in openfaas-fn namespace.
alex@alexx:/tmp$ 

latest: digest: sha256:608e17865401b8eec4cc79b5c3e3b717b9d58779651a016ddb722e72f3ded81c size: 1577
[0] < Pushing go1 [alexellis2/go1:latest] done.
[0] Worker done.

Deploying: go1.
WARNING! Communication is not secure, please consider using HTTPS. Letsencrypt.org offers free SSL/TLS certificates.

Unexpected status: 500, message: failed to update object (Update for openfaas.com/v1, Kind=Function) managed fields: failed to convert live object to proper version: /, Kind= is unstructured and is not suitable for converting to "openfaas.com/v1"

Function 'go1' failed to deploy with status code: 500
alex@alexx:/tmp$ faas-cli list
Function                        Invocations     Replicas
alex@alexx:/tmp$ 

alexellis commented 4 years ago

/set title: Unable to deploy a function with the Operator in K8s 1.18

alexellis commented 4 years ago

The repro for 1.18 is to deploy anything.

Get arkade from https://github.com/alexellis/arkade/releases

Get KinD from https://github.com/kubernetes-sigs/kind

GO111MODULE="on" go get sigs.k8s.io/kind@v0.8.0
kind create cluster

arkade install openfaas --operator

# Forward the gateway to your machine
kubectl rollout status -n openfaas deploy/gateway
kubectl port-forward -n openfaas svc/gateway 8080:8080 &

# If basic auth is enabled, you can now log into your gateway:
PASSWORD=$(kubectl get secret -n openfaas basic-auth -o jsonpath="{.data.basic-auth-password}" | base64 --decode; echo)
echo -n "$PASSWORD" | faas-cli login --username admin --password-stdin

faas-cli store deploy figlet
WARNING! Communication is not secure, please consider using HTTPS. Letsencrypt.org offers free SSL/TLS certificates.

Unexpected status: 500, message: failed to update object (Update for openfaas.com/v1, Kind=Function) managed fields: failed to convert live object to proper version: /, Kind= is unstructured and is not suitable for converting to "openfaas.com/v1"

Function 'figlet' failed to deploy with status code: 500

From the logs:

kubectl logs -n openfaas deploy/gateway -c operator
I0507 18:59:57.063776       1 main.go:153] Starting operator. Version: 0.10.3   commit: 964780acb68ab0fa16a376ba9cd55bd752e4bea4
W0507 18:59:57.064031       1 client_config.go:543] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2020/05/07 18:59:57 Waiting for cache sync in main
2020/05/07 18:59:57 Cache sync done
I0507 18:59:57.066240       1 controller.go:111] Setting up event handlers
I0507 18:59:57.066461       1 server.go:104] Using namespace 'openfaas-fn'
I0507 18:59:57.066495       1 controller.go:154] Waiting for informer caches to sync
I0507 18:59:57.066626       1 server.go:119] Starting HTTP server on port 8081
I0507 18:59:57.166730       1 controller.go:159] Starting workers
I0507 18:59:57.166764       1 controller.go:165] Started workers
E0507 19:03:10.954075       1 apply.go:68] Function go1 update error: failed to update object (Update for openfaas.com/v1, Kind=Function) managed fields: failed to convert live object to proper version: /, Kind= is unstructured and is not suitable for converting to "openfaas.com/v1"
E0507 19:11:22.556258       1 apply.go:68] Function figlet update error: failed to update object (Update for openfaas.com/v1, Kind=Function) managed fields: failed to convert live object to proper version: /, Kind= is unstructured and is not suitable for converting to "openfaas.com/v1"

alexellis commented 4 years ago

I believe the code invoked lives here -> https://github.com/openfaas/faas-netes/blob/master/pkg/server/apply.go

faas-cli -> HTTP -> gateway -> faas-netes (in operator mode) -> apply.go

alexellis commented 4 years ago

Perhaps the CRD validation got tighter and fails in 1.18 and 1.17.5?

Status:
  Accepted Names:
    Kind:       Function
    List Kind:  FunctionList
    Plural:     functions
    Short Names:
      fn
    Singular:  function
  Conditions:
    Last Transition Time:  2020-05-07T18:59:19Z
    Message:               [spec.versions[0].schema.openAPIV3Schema.properties[spec].properties[annotations].anyOf[0].type: Forbidden: must be empty to be structural, spec.versions[0].schema.openAPIV3Schema.properties[spec].properties[annotations].anyOf[1].type: Forbidden: must be empty to be structural, spec.versions[0].schema.openAPIV3Schema.properties[spec].properties[annotations].type: Required value: must not be empty for specified object fields, spec.versions[0].schema.openAPIV3Schema.properties[spec].properties[constraints].items: Required value: must be specified, spec.versions[0].schema.openAPIV3Schema.properties[spec].properties[labels].anyOf[0].type: Forbidden: must be empty to be structural, spec.versions[0].schema.openAPIV3Schema.properties[spec].properties[labels].anyOf[1].type: Forbidden: must be empty to be structural, spec.versions[0].schema.openAPIV3Schema.properties[spec].properties[labels].type: Required value: must not be empty for specified object fields, spec.versions[0].schema.openAPIV3Schema.properties[spec].properties[limits].type: Required value: must not be empty for specified object fields, spec.versions[0].schema.openAPIV3Schema.properties[spec].properties[requests].type: Required value: must not be empty for specified object fields, spec.versions[0].schema.openAPIV3Schema.properties[spec].properties[secrets].items: Required value: must be specified, spec.versions[0].schema.openAPIV3Schema.properties[spec].type: Required value: must not be empty for specified object fields, spec.versions[0].schema.openAPIV3Schema.type: Required value: must not be empty at the root, spec.versions[1].schema.openAPIV3Schema.properties[spec].properties[annotations].anyOf[0].type: Forbidden: must be empty to be structural, spec.versions[1].schema.openAPIV3Schema.properties[spec].properties[annotations].anyOf[1].type: Forbidden: must be empty to be structural, spec.versions[1].schema.openAPIV3Schema.properties[spec].properties[annotations].type: Required value: must not be empty for specified object fields, spec.versions[1].schema.openAPIV3Schema.properties[spec].properties[constraints].items: Required value: must be specified, spec.versions[1].schema.openAPIV3Schema.properties[spec].properties[labels].anyOf[0].type: Forbidden: must be empty to be structural, spec.versions[1].schema.openAPIV3Schema.properties[spec].properties[labels].anyOf[1].type: Forbidden: must be empty to be structural, spec.versions[1].schema.openAPIV3Schema.properties[spec].properties[labels].type: Required value: must not be empty for specified object fields, spec.versions[1].schema.openAPIV3Schema.properties[spec].properties[limits].type: Required value: must not be empty for specified object fields, spec.versions[1].schema.openAPIV3Schema.properties[spec].properties[requests].type: Required value: must not be empty for specified object fields, spec.versions[1].schema.openAPIV3Schema.properties[spec].properties[secrets].items: Required value: must be specified, spec.versions[1].schema.openAPIV3Schema.properties[spec].type: Required value: must not be empty for specified object fields, spec.versions[1].schema.openAPIV3Schema.type: Required value: must not be empty at the root]
    Reason:                Violations
    Status:                True
    Type:                  NonStructuralSchema
    Last Transition Time:  2020-05-07T18:59:19Z
    Message:               no conflicts found
    Reason:                NoConflicts
    Status:                True
    Type:                  NamesAccepted
    Last Transition Time:  2020-05-07T18:59:19Z
    Message:               the initial names have been accepted
    Reason:                InitialNamesAccepted
    Status:                True
    Type:                  Established
  Stored Versions:
    v1
Events:  <none>

I.e. Forbidden: must be empty to be structural, spec.versions[0].schema.openAPIV3Schema.properties[spec].properties[annotations].anyOf[1].type: Forbidden: must be empty to be structural

Our CRD definition: https://github.com/openfaas/faas-netes/blob/master/chart/openfaas/templates/crd.yaml

flamedmg commented 4 years ago

Confirming, open-faas works just fine without operator mode!

alexellis commented 4 years ago

My sense is that we have an issue with the CRD schema validation which, once resolved, may fix 1.17.5. 1.18 may need additional changes, but I'm not sure. We are using the 1.17 version of client-go, so I'd expect it to work out of the box with 1.18.

1.18 notes - https://kubernetes.io/docs/setup/release/notes/

1.17 notes - https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.17.md#changelog-since-v1174

alexellis commented 4 years ago

Reading this post

As you can see also logical constraints using oneOf, allOf, anyOf, not are allowed.

I wonder if it is related?

liggitt commented 4 years ago

The messages added to status surface issues with your schema that prevent it from being treated as a structural schema. That prevents openapi from being published for your type, and would prevent you from creating this CRD via the v1 API, but would not by itself prevent update of the v1beta1 CRD or submission of custom resources.

The anyOf part sounds like it might be related to https://github.com/kubernetes/kubernetes/issues/85127#issuecomment-605890538 in which server-side-apply aspects of the request handler make use of schema info, and appear to have issues with some complex schemas that are not structural. https://github.com/kubernetes/kubernetes/pull/90656 is in progress to handle schemas like that more gracefully.
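The three kinds of violation listed in the CRD status can be illustrated with a small checker. This is a simplified sketch, not the Kubernetes validator: it only covers a missing type, an array without items, and a type set inside anyOf/allOf/oneOf, and it ignores exceptions such as x-kubernetes-int-or-string. The function name and the schema fragment are hypothetical; the fragment is shaped like the reported messages and is not the real OpenFaaS CRD.

```python
# Simplified illustration only: Kubernetes applies more rules than these three.
COMBINATORS = ("allOf", "anyOf", "oneOf")


def structural_violations(schema, path="openAPIV3Schema"):
    """Return a list of 'path: problem' strings for a schema given as a dict."""
    problems = []
    # Every specified node needs a non-empty type.
    if not schema.get("type"):
        problems.append(f"{path}.type: Required value")
    # Arrays must describe their items.
    if schema.get("type") == "array" and "items" not in schema:
        problems.append(f"{path}.items: Required value: must be specified")
    # Inside logical combinators, 'type' must be left empty.
    for key in COMBINATORS:
        for i, branch in enumerate(schema.get(key, [])):
            if "type" in branch:
                problems.append(
                    f"{path}.{key}[{i}].type: Forbidden: must be empty to be structural"
                )
    # Recurse into nested properties and array items.
    for name, child in schema.get("properties", {}).items():
        problems.extend(structural_violations(child, f"{path}.properties[{name}]"))
    if isinstance(schema.get("items"), dict):
        problems.extend(structural_violations(schema["items"], f"{path}.items"))
    return problems


# Hypothetical fragment shaped like the reported violations, not the real CRD.
non_structural = {
    "properties": {
        "spec": {
            "properties": {
                "labels": {"anyOf": [{"type": "string"}, {"type": "object"}]},
                "secrets": {"type": "array"},
            }
        }
    }
}

for line in structural_violations(non_structural):
    print(line)
```

Under these assumptions, a schema passes the sketch once every node declares a type, arrays declare items, and the anyOf branches carry no type of their own.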

liggitt commented 4 years ago

The 500 error with "managed fields: failed to convert live object to proper version" is related to server-side-apply, which enabled managed fields by default in 1.18. Can you open an issue in kubernetes/kubernetes specifically around this error? If there's a reproducer that can be seen with just CRD and CR manifests and kubectl, that would be ideal if possible.

alexellis commented 4 years ago

When creating a function via kubectl instead of the REST API, it created the resource the first time, when there were no labels:

apiVersion: openfaas.com/v1alpha2
kind: Function
metadata:
  name: figlet
  namespace: openfaas-fn
spec:
  name: figlet
  image: functions/figlet:0.13.0

After:

apiVersion: openfaas.com/v1alpha2
kind: Function
metadata:
  name: figlet
  namespace: openfaas-fn
spec:
  name: figlet
  image: functions/figlet:0.13.0
  labels:
    alex: yes

Then when updating to have labels it gave this error in a loop:

E0507 21:41:22.217223   17355 reflector.go:153] pkg/mod/k8s.io/client-go@v0.17.4/tools/cache/reflector.go:105: Failed to list *v1.Function: v1.FunctionList.Items: []v1.Function: v1.Function.Spec: v1.FunctionSpec.Labels: ReadString: expects " or n, but found t, error found in #10 byte of ...|:{"alex":true},"name|..., bigger context ...|mage":"functions/figlet:0.13.0","labels":{"alex":true},"name":"figlet"}}],"kind":"FunctionList","met|...
E0507 21:41:23.221735   17355 reflector.go:153] pkg/mod/k8s.io/client-go@v0.17.4/tools/cache/reflector.go:105: Failed to list *v1.Function: v1.FunctionList.Items: []v1.Function: v1.Function.Spec: v1.FunctionSpec.Labels: ReadString: expects " or n, but found t, error found in #10 byte of ...|:{"alex":true},"name|..., bigger context ...|mage":"functions/figlet:0.13.0","

liggitt commented 4 years ago

Then when updating to have labels it gave this error in a loop:

I think that's a red herring... "alex":true isn't a valid label... it needs to be quoted, e.g. "alex":"true"
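The reflector error shows that the unquoted yes in the manifest reached the API as the JSON boolean true, while the decoder expected a string for each label value. A minimal sketch of that check follows; check_labels is a hypothetical helper for illustration, not part of faas-netes or client-go.

```python
import json


def check_labels(labels):
    """Raise TypeError unless every label key and value is a string."""
    for key, value in labels.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError(
                f"label {key!r} has non-string value {value!r} "
                f"({type(value).__name__})"
            )
    return labels


# The fragment from the reflector error above: the value is a JSON boolean.
bad = json.loads('{"alex": true}')
# The quoted form suggested in the reply.
good = json.loads('{"alex": "true"}')

check_labels(good)
try:
    check_labels(bad)
except TypeError as err:
    print(err)
```

Quoting the value in the manifest (for example alex: "yes") keeps it a string end to end.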

alexellis commented 4 years ago

You are right :facepalm: Let me verify it.

alexellis commented 4 years ago

That is a red herring as you suspected @liggitt

# kubectl apply quoted label value
I0507 20:49:55.866090       1 controller.go:254] Creating deployment for 'figlet'
I0507 20:49:55.958536       1 controller.go:266] Creating ClusterIP service for 'figlet'

# kubectl apply quoted label value with a second label value
I0507 20:50:18.204932       1 controller.go:295] Updating deployment for 'figlet'

# deploy via the REST API (linked above in apply.go)
E0507 20:50:26.393631       1 apply.go:68] Function nodeinfo update error: failed to update object (Update for openfaas.com/v1, Kind=Function) managed fields: failed to convert live object to proper version: /, Kind= is unstructured and is not suitable for converting to "openfaas.com/v1"

# update the figlet function via the REST API (linked above in apply.go)
E0507 20:50:31.286874       1 apply.go:68] Function figlet update error: functions.openfaas.com "figlet" is invalid: metadata.resourceVersion: Invalid value: 0x0: must be specified for an update

What's a good place to start debugging this?

alexellis commented 4 years ago

I raised #631 to get around the error on metadata.resourceVersion. @stefanprodan if you have time, can you comment on why this code was doing an update, then checking for a 404, before doing a create? It seems like this isn't supported in the latest version of K8s unless there's an extra flag that's now needed.

openfaas/faas-netes:0.10.5-3 contains the patch, @flamedmg, if you want to try it out:

arkade install openfaas --set operator.image=openfaas/faas-netes:0.10.5-3 --operator

liggitt commented 4 years ago

What's a good place to start debugging this?

Seeing the incoming request to the API server would be helpful. Enabling request body logging in audit events for this resource would enable that. See https://kubernetes.io/docs/tasks/debug-application-cluster/audit/#audit-policy for details on enabling audit logging at Request level for a specific resource.
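As a rough sketch of such a policy, the snippet below builds an audit policy that records request bodies for Function objects only and prints it as JSON, which is also valid YAML. The group openfaas.com and the resource functions are taken from the error messages in this thread; the catch-all rule is an assumption added to keep the log small. The file would then be passed to kube-apiserver via --audit-policy-file, alongside a log destination such as --audit-log-path.

```python
import json

# Group "openfaas.com" and resource "functions" come from the error messages
# in this thread; everything else is an illustrative assumption.
audit_policy = {
    "apiVersion": "audit.k8s.io/v1",
    "kind": "Policy",
    "rules": [
        # Log metadata plus the request body for Function objects only.
        {
            "level": "Request",
            "resources": [{"group": "openfaas.com", "resources": ["functions"]}],
        },
        # Catch-all: do not log anything else.
        {"level": "None"},
    ],
}

# JSON is valid YAML, so this output can be saved as the policy file.
print(json.dumps(audit_policy, indent=2))
```

Rules are evaluated in order and the first match wins, so the specific rule has to come before the catch-all.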

What version of client-go was the client doing the API updates built with, and what API server version(s) is this observed with?

LucasRoesler commented 4 years ago

I suspect that this is relevant https://github.com/kubernetes/kubernetes/issues/70674#issuecomment-436737939

Other projects had the same error for example https://github.com/elastic/cloud-on-k8s/issues/2200 and https://github.com/banzaicloud/terraform-provider-k8s/issues/25

I suspect that we need to try to Get the Function first then use Update if it is found, otherwise use Create.

I agree with @alexellis and we can fix this in https://github.com/openfaas/faas-netes/blob/1a7bcd976caa50ee6f4640f7f747d2b455dbecfc/pkg/server/apply.go#L52-L73
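The get-then-update-or-create flow suggested above can be sketched against an in-memory stand-in for the API server. Everything here is hypothetical and for illustration only: FakeFunctionStore, the exception classes, and apply are not the faas-netes or client-go API, and the store only mimics the resourceVersion rule seen in this thread.

```python
class NotFound(Exception):
    pass


class Invalid(Exception):
    pass


class Conflict(Exception):
    pass


class FakeFunctionStore:
    """In-memory stand-in for the API server; hypothetical, for illustration."""

    def __init__(self):
        self._items = {}
        self._counter = 0

    def _next_version(self):
        self._counter += 1
        return str(self._counter)

    def get(self, name):
        if name not in self._items:
            raise NotFound(name)
        return dict(self._items[name])

    def create(self, obj):
        stored = dict(obj, resourceVersion=self._next_version())
        self._items[obj["name"]] = stored
        return dict(stored)

    def update(self, obj):
        if obj["name"] not in self._items:
            raise NotFound(obj["name"])
        # Mirrors the error in this thread: updates need a resourceVersion.
        if not obj.get("resourceVersion"):
            raise Invalid("metadata.resourceVersion: must be specified for an update")
        if obj["resourceVersion"] != self._items[obj["name"]]["resourceVersion"]:
            raise Conflict("object was modified; re-read and retry")
        stored = dict(obj, resourceVersion=self._next_version())
        self._items[obj["name"]] = stored
        return dict(stored)


def apply(store, desired):
    """Get first; update with the live resourceVersion if found, else create."""
    try:
        live = store.get(desired["name"])
    except NotFound:
        return store.create(desired)
    return store.update(dict(desired, resourceVersion=live["resourceVersion"]))


store = FakeFunctionStore()
first = apply(store, {"name": "figlet", "image": "functions/figlet:0.13.0"})
second = apply(store, {"name": "figlet", "image": "functions/figlet:0.13.0"})
print(first["resourceVersion"], second["resourceVersion"])
```

Copying the live resourceVersion onto the desired object is what lets the second call succeed; sending the same object without it is the case that produced the "must be specified for an update" error.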

alexellis commented 4 years ago

I put this together last night and tested it. Not sure why the linking didn't put a nice pop-out box on the issue.

https://github.com/openfaas/faas-netes/pull/631

liggitt commented 4 years ago

The Invalid value: 0x0: must be specified for an update message is because resourceVersion is required when PUTing updates to custom resources. That has always been the case for custom resources. Reproduced on 1.15:

curl -k -X PUT https://localhost:6443/apis/example.com/v1/namespaces/default/foos/test \
  -H "Content-Type: application/json" \
  --data '{"kind":"Foo","apiVersion":"example.com/v1","metadata":{"name":"test"}}'
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {

  },
  "status": "Failure",
  "message": "foos.example.com \"test\" is invalid: metadata.resourceVersion: Invalid value: 0x0: must be specified for an update",
  "reason": "Invalid",
  "details": {
    "name": "test",
    "group": "example.com",
    "kind": "foos",
    "causes": [
      {
        "reason": "FieldValueInvalid",
        "message": "Invalid value: 0x0: must be specified for an update",
        "field": "metadata.resourceVersion"
      }
    ]
  },
  "code": 422
}
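A client could recognise this specific failure by inspecting the Status body rather than matching on the message text. The sketch below parses a trimmed copy of the response shown above; missing_resource_version is a hypothetical helper name, not an existing API.

```python
import json

# Trimmed copy of the Status body shown above.
status_body = """
{
  "kind": "Status",
  "status": "Failure",
  "reason": "Invalid",
  "details": {
    "causes": [
      {
        "reason": "FieldValueInvalid",
        "message": "Invalid value: 0x0: must be specified for an update",
        "field": "metadata.resourceVersion"
      }
    ]
  },
  "code": 422
}
"""


def missing_resource_version(body):
    """True if a Status body reports an update sent without resourceVersion."""
    status = json.loads(body)
    if status.get("kind") != "Status" or status.get("code") != 422:
        return False
    causes = status.get("details", {}).get("causes", [])
    return any(c.get("field") == "metadata.resourceVersion" for c in causes)


print(missing_resource_version(status_body))
```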

The second error (update error: failed to update object (Update for openfaas.com/v1, Kind=Function) managed fields: failed to convert live object to proper version: /, Kind= is unstructured and is not suitable for converting to "openfaas.com/v1") is definitely not expected. Please open an issue in kubernetes/kubernetes with whatever reproducer steps you have available.

liggitt commented 4 years ago

I created the CRD manually in a 1.18 API server, then created/updated/patched Function API objects via the v1 and v1alpha2 APIs a lot of different ways, and was not able to reproduce the failed to update object 500 error... do you have any more details about the exact update request made that triggers that?

alexellis commented 4 years ago

The original poster appears to be on k8s 1.17.5 which may or may not have a bearing on it

alexellis commented 4 years ago

@flamedmg please can you try again with the latest release?