openfaas / faas-netes

Serverless Functions For Kubernetes
https://www.openfaas.com
MIT License

Function never autoscales #390

Closed jpds closed 4 years ago

jpds commented 5 years ago

Expected Behaviour

I'm configuring a function with:

apiVersion: openfaas.com/v1alpha2
kind: Function
metadata:
  annotations:
    ...
  creationTimestamp: "2019-03-06T12:45:32Z"
  generation: 1
  labels:
    ...
  name: my-function
  namespace: openfaas-fn
  resourceVersion: "2775351"
  selfLink: /apis/openfaas.com/v1alpha2/namespaces/openfaas-fn/functions/my-function
  uid: ba9c3237-400d-11e9-9e16-064c8cd93004
spec:
  environment:
    ...
    write_debug: "true"
  image: 756893155009.dkr.ecr.eu-west-1.amazonaws.com/my-function:master-1cd411e
  labels:
    com.openfaas.scale.factor: "100"
    com.openfaas.scale.max: "6"
    com.openfaas.scale.min: "3"
  limit:
    cpu: 1
    memory: 100Mi
  name: my-function
  requests:
    cpu: 10m
    memory: 64Mi

However, I never see this run with more than one replica, even with the factor set to 100 as above. My logs read:

openfaas/gateway-7847d555b9-x99lv[operator]: I0318 16:05:02.281364       1 deployment.go:229] Change detected for my-function diff
openfaas/gateway-7847d555b9-x99lv[operator]: (*{v1alpha2.FunctionSpec}.Labels)["com.openfaas.scale.max"]:
openfaas/gateway-7847d555b9-x99lv[operator]:    -: <non-existent>
openfaas/gateway-7847d555b9-x99lv[operator]:    +: "6"
openfaas/gateway-7847d555b9-x99lv[operator]: I0318 16:05:02.281384       1 controller.go:367] Updating deployment for 'my-function'
openfaas/gateway-7847d555b9-9gwwx[operator]: I0318 16:05:02.280411       1 deployment.go:229] Change detected for my-function diff
openfaas/gateway-7847d555b9-9gwwx[operator]: (*{v1alpha2.FunctionSpec}.Labels)["com.openfaas.scale.max"]:
openfaas/gateway-7847d555b9-9gwwx[operator]:    -: <non-existent>
openfaas/gateway-7847d555b9-9gwwx[operator]:    +: "6"
openfaas/gateway-7847d555b9-9gwwx[operator]: I0318 16:05:02.280433       1 controller.go:367] Updating deployment for 'my-function'
openfaas/gateway-7847d555b9-x99lv[operator]: E0318 16:05:02.304867       1 controller.go:390] Updating service for 'my-function' failed: Operation cannot be fulfilled on services "my-function": the object has been modified; please apply your changes to the latest version and try again
openfaas/gateway-7847d555b9-x99lv[gateway]: 2019/03/18 16:05:02 Forwarded [GET] to /system/function/my-function - [200] - 0.003291 seconds
openfaas/gateway-7847d555b9-x99lv[gateway]: 2019/03/18 16:05:02 Forwarded [GET] to /healthz - [200] - 0.000501 seconds
openfaas/gateway-7847d555b9-x99lv[gateway]: 2019/03/18 16:05:02 Forwarded [GET] to /healthz - [200] - 0.004913 seconds
openfaas/gateway-7847d555b9-9gwwx[gateway]: 2019/03/18 16:05:03 Forwarded [GET] to /healthz - [200] - 0.000719 seconds
openfaas/gateway-7847d555b9-x99lv[gateway]: 2019/03/18 16:05:05 Forwarded [GET] to /system/function/my-function - [200] - 0.004401 seconds
openfaas/gateway-7847d555b9-9gwwx[gateway]: 2019/03/18 16:05:05 Forwarded [GET] to /system/functions - [200] - 0.003903 seconds
openfaas/gateway-7847d555b9-9gwwx[gateway]: 2019/03/18 16:05:08 Forwarded [GET] to /system/function/my-function - [200] - 0.007761 seconds
openfaas/gateway-7847d555b9-9gwwx[gateway]: 2019/03/18 16:05:08 Forwarded [GET] to /healthz - [200] - 0.000743 seconds
openfaas/gateway-7847d555b9-x99lv[gateway]: 2019/03/18 16:05:09 Forwarded [GET] to /system/functions - [200] - 0.003594 seconds
openfaas/gateway-7847d555b9-9gwwx[gateway]: 2019/03/18 16:05:11 Forwarded [GET] to /system/function/my-function - [200] - 0.006474 seconds
openfaas/gateway-7847d555b9-x99lv[gateway]: 2019/03/18 16:05:12 Forwarded [GET] to /healthz - [200] - 0.003696 seconds
openfaas/gateway-7847d555b9-x99lv[gateway]: 2019/03/18 16:05:12 Forwarded [GET] to /healthz - [200] - 0.003298 seconds
openfaas/gateway-7847d555b9-x99lv[gateway]: 2019/03/18 16:05:13 Forwarded [GET] to /system/functions - [200] - 0.003901 seconds
openfaas/gateway-7847d555b9-9gwwx[gateway]: 2019/03/18 16:05:13 Forwarded [GET] to /healthz - [200] - 0.000450 seconds
openfaas/gateway-7847d555b9-9gwwx[gateway]: 2019/03/18 16:05:14 Forwarded [GET] to /system/function/my-function - [200] - 0.003677 seconds
openfaas/gateway-7847d555b9-9gwwx[gateway]: 2019/03/18 16:05:16 Forwarded [GET] to /system/functions - [200] - 0.003351 seconds
openfaas/faas-idler-647969d86f-jgm4n[faas-idler]: 
openfaas/faas-idler-647969d86f-jgm4n[faas-idler]: {{200 my-function} [1.552925116586e+09 0]}
openfaas/faas-idler-647969d86f-jgm4n[faas-idler]: {{502 my-function} [1.552925116586e+09 0]}
openfaas/faas-idler-647969d86f-jgm4n[faas-idler]: 2019/03/18 16:05:16 Skip: certinfo due to missing label
openfaas/faas-idler-647969d86f-jgm4n[faas-idler]: 2019/03/18 16:05:16 Skip: my-function due to missing label

I do not know what this missing label is.
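
For context, the "missing label" in the faas-idler lines most likely refers to the scale-to-zero opt-in label, which is separate from the min/max replica labels above. This is an assumption based on how faas-idler is documented to behave and is not confirmed anywhere in this thread. A minimal sketch of where such a label would sit in the spec:

```yaml
# Sketch only. Assumption: com.openfaas.scale.zero is the opt-in label
# that faas-idler checks before it considers a function for idling.
spec:
  labels:
    com.openfaas.scale.min: "3"
    com.openfaas.scale.max: "6"
    # Opts the function in to scale-to-zero; it does not affect min replicas.
    com.openfaas.scale.zero: "true"
```

If that assumption holds, the "Skip" lines only mean the idler is leaving the function alone, so they would not explain why it stays at one replica.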

Your Environment

Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.2", GitCommit:"cff46ab41ff0bb44d8584413b598ad8360ec1def", GitTreeState:"clean", BuildDate:"2019-01-10T23:35:51Z", GoVersion:"go1.11.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.7", GitCommit:"65ecaf0671341311ce6aea0edab46ee69f65d59e", GitTreeState:"clean", BuildDate:"2019-01-24T19:22:45Z", GoVersion:"go1.10.7", Compiler:"gc", Platform:"linux/amd64"}

stefanprodan commented 5 years ago

Can you please remove com.openfaas.scale.factor: "100" and try without it? If you want to test autoscaling, you should run a load test.

jpds commented 5 years ago

@stefanprodan Even without the factor, I still only get one pod.

jpds commented 5 years ago

Here's the Flux config I use to deploy openfaas:

---
apiVersion: flux.weave.works/v1beta1
kind: HelmRelease
metadata:
  name: openfaas
  namespace: openfaas
  annotations:
    flux.weave.works/automated: "true"
spec:
  releaseName: openfaas
  chart:
    repository: https://openfaas.github.io/faas-netes/
    name: openfaas
    version: 1.8.1
  values:
    basic_auth: true
    exposeServices: false
    functionNamespace: "openfaas-fn"
    ingress:
      enabled: true
      annotations:
        kubernetes.io/ingress.class: nginx
        nginx.ingress.kubernetes.io/proxy-body-size: 10m
        nginx.ingress.kubernetes.io/proxy-connect-timeout: 60
        nginx.ingress.kubernetes.io/proxy-send-timeout: 60
        nginx.ingress.kubernetes.io/proxy-read-timeout: 60
        nginx.ingress.kubernetes.io/ssl-redirect: "true"
        certmanager.k8s.io/cluster-issuer: "letsencrypt-prod"
        certmanager.k8s.io/acme-challenge-type: http01
      hosts:
      - host: fn.mydomain.io
        path: /
        serviceName: gateway
        servicePort: 8080
      tls:
      - hosts:
        - fn.mydomain.io
        secretName: openfaas-tls
    gateway:
      replicas: 2
    operator:
      create: true
    queueWorker:
      replicas: 2
    serviceType: ClusterIP

stefanprodan commented 5 years ago

You are using an old chart version; the latest is 2.1.2. Try this:

  chart:
    repository: https://openfaas.github.io/faas-netes/
    name: openfaas
    version: 2.1.2

stefanprodan commented 5 years ago

After the upgrade to 2.1.2, please delete the functions and redeploy them, so that the test starts from a clean state.

jpds commented 5 years ago

@stefanprodan Thanks. I upgraded and redeployed the function with just the label min: "3". I still only have one replica, and the logs show:

openfaas/gateway-58899f65bd-478zd[operator]: I0318 16:47:02.024766       1 controller.go:331] Creating deployment for 'my-function'
openfaas/gateway-58899f65bd-c5vt4[operator]: I0318 16:47:02.024621       1 controller.go:331] Creating deployment for 'my-function'
openfaas/gateway-58899f65bd-478zd[operator]: I0318 16:47:02.039694       1 controller.go:340] Creating ClusterIP service for 'my-function'
openfaas/gateway-58899f65bd-c5vt4[operator]: I0318 16:47:02.049003       1 controller.go:340] Creating ClusterIP service for 'my-function'
openfaas/gateway-58899f65bd-478zd[operator]: E0318 16:47:02.053761       1 controller.go:283] error syncing 'openfaas-fn/my-function': deployment.apps "my-function" not found
openfaas/gateway-58899f65bd-c5vt4[operator]: I0318 16:47:02.091877       1 controller.go:344] ClusterIP service 'my-function' already exists. Skipping creation.
openfaas/gateway-58899f65bd-c5vt4[operator]: E0318 16:47:02.091904       1 controller.go:283] error syncing 'openfaas-fn/my-function': deployment.apps "my-function" not found
....
openfaas/faas-idler-647969d86f-jgm4n[faas-idler]: 
openfaas/faas-idler-647969d86f-jgm4n[faas-idler]: 2019/03/18 16:47:17 Skip: certinfo due to missing label
openfaas/faas-idler-647969d86f-jgm4n[faas-idler]: 2019/03/18 16:47:17 Skip: my-function due to missing label

jpds commented 5 years ago

Flux config for the function:

apiVersion: openfaas.com/v1alpha2
kind: Function
metadata:
  name: my-function
  namespace: openfaas-fn
spec:
  name: my-function
  image: my-ecr-url/my-function:master-githash
  labels:
    com.openfaas.scale.min: "4"
  environment:
    ...
    write_debug: "true"
  requests:
    cpu: "10m"
    memory: "64Mi"

stefanprodan commented 5 years ago

OK, I've checked the operator code and it looks like the min replica setting is missing from the CRD loop. Please open an issue here: https://github.com/openfaas-incubator/openfaas-operator

stefanprodan commented 5 years ago

As a workaround, you can set the min replicas like this:

apiVersion: openfaas.com/v1alpha2
kind: Function
metadata:
  name: my-function
  namespace: openfaas-fn
spec:
  replicas: 4
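
The snippet above shows only the field that changes. A fuller sketch of the same workaround, assuming the remaining fields are carried over unchanged from the function manifest posted earlier in this thread (the image value is the reporter's placeholder, not a real reference):

```yaml
# Sketch only: everything except `replicas` is copied from the manifest
# posted earlier in the thread.
apiVersion: openfaas.com/v1alpha2
kind: Function
metadata:
  name: my-function
  namespace: openfaas-fn
spec:
  name: my-function
  image: my-ecr-url/my-function:master-githash
  # Workaround: set the replica count directly instead of relying on the
  # com.openfaas.scale.min label, which the operator was not applying.
  replicas: 4
  requests:
    cpu: "10m"
    memory: "64Mi"
```

This replaces the com.openfaas.scale.min label from the original manifest with an explicit replica count.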

jpds commented 5 years ago

Filed https://github.com/openfaas-incubator/openfaas-operator/issues/74 - and thank you very much for the replicas: workaround @stefanprodan!

alexellis commented 4 years ago

/close

alexellis commented 4 years ago

/lock: closing as inactive. Feel free to raise a new issue if this is still required.