hashicorp / faas-nomad

OpenFaaS plugin for Nomad
https://www.openfaas.com
MIT License

Scale up from zero does not work on Nomad #59

Closed hxalid closed 5 years ago

hxalid commented 5 years ago

What are the steps to reproduce this issue?

  1. Provision OpenFaaS on an existing Nomad cluster and set scale_from_zero=true in the OpenFaaS gateway:
task "openfaas-gateway" {
      driver = "docker"
      template {
        env = true
        destination   = "secrets/gateway.env"

        data = <<EOH
             functions_provider_url="http://{{ env "NOMAD_IP_http" }}:8081/"
             {{ range service "openfaas-nats" }}
             faas_nats_address="{{ .Address }}"
             faas_nats_port={{ .Port }}{{ end }}
             EOH
      }

      config {
        image = "<my-registry>/openfaas/gateway:latest-dev"

        port_map {
          http = 8080
        }
      }

      resources {
        cpu    = 500 # 500 MHz
        memory = 128 # 128MB

        network {
          mbits = 10

          port "http" {
            static = 8080
          }
        }
      }
      service {
        port = "http"
        name = "openfaas-gateway"
        tags = ["v0.3.2",
                "openfaas"
                ]
      }

      env {
        faas_prometheus_host="my-prometheus-gateway"
        faas_prometheus_port=80
        scale_from_zero=true
      }
  }
  2. Deploy a simple function:
provider:
  name: faas
  gateway: https://<my-gateway-url>

functions:
  echoer:
    lang: go
    handler: ./echoer
    image: <my-docker-registery>/faas-echoer:1.0.0
    labels:
      datacenters: "dev"
      com.openfaas.scale.min: 2
      com.openfaas.scale.max: 10
      com.openfaas.scale.factor: 20
      com.openfaas.scale.zero: true
    environment:
      read_debug: true
      write_debug: true
      print_response: true
What happens?

echoer starts with 1 replica and, after a short time, scales down to zero. Any subsequent request returns Function Not Found:

curl --data "message" https://faas-gateway/function/echoer

What were you expecting to happen?

The number of replicas would go up from zero to 1.

Any logs, error output, etc?

2018/11/17 12:16:28 Version: 0.9.6  SHA: e33061702a8fc4d55bf8a8b7ba2f5dce87089c14
2018/11/17 12:16:28 Read/write timeout: 5s, 5s. Port: 8080
2018/11/17 12:16:28 Writing lock-file to: /tmp/.lock
2018/11/17 12:17:41 Forking fprocess.
2018/11/17 12:17:41 Query  
2018/11/17 12:17:41 Path  /
2018/11/17 12:17:41 Duration: 0.005614 seconds
2018/11/17 12:18:08 SIGTERM received.. shutting down server in 5s
2018/11/17 12:18:08 Removing lock-file : /tmp/.lock
2018/11/17 12:18:13 No new connections allowed. Exiting in: 5s

Any other comments?

What versions of software are you using?

Operating System: Ubuntu 16.04
OpenFaaS Version: Latest OpenFaaS built from source
Nomad Version: Nomad v0.8.6

hxalid commented 5 years ago

I've found the reason why scaling from zero was not working on Nomad. Apparently, when true is written as a bare value rather than a string for an environment variable in the Nomad manifest, Nomad interprets it as "1". However, the OpenFaaS gateway expects the scale_from_zero environment variable to be the string "true". So changing scale_from_zero=true to scale_from_zero="true" fixed my problem.
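
For reference, a minimal sketch of the corrected env block, assuming the rest of the openfaas-gateway task stays exactly as in the job file from the report. The only change is the quoting of scale_from_zero:

```hcl
env {
  faas_prometheus_host = "my-prometheus-gateway"
  faas_prometheus_port = 80

  # Quoted, so the gateway receives the literal string "true".
  # Unquoted true reached the gateway as "1" in the setup described above.
  scale_from_zero = "true"
}
```

After resubmitting the job with this change, the curl request from the report should bring echoer back up from zero replicas, subject to the pending-status behaviour described in the next paragraph.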

However, I am observing some other intermittent behaviour: while a Nomad job is still in pending status after scaling from zero to one, the gateway still returns Function Not Found. I'm going to close this issue and will open another one to describe that intermittent behaviour in more detail if I cannot figure out what is happening.

nicholasjackson commented 5 years ago

Thanks @hxalid. The Function Not Found message is by design, as pending status means that the function cannot yet accept requests. Maybe this needs to be changed from a generic 404 to another status code, but I would need to think about how this would work: the function execution is not aware of Nomad status, only of the Service Catalog, and a function is only registered there once it is healthy.