hashicorp / faas-nomad

OpenFaaS plugin for Nomad
https://www.openfaas.com
MIT License

Gateway return function not found when run through traefik #66

Open numkem opened 5 years ago

numkem commented 5 years ago

What are the steps to reproduce this issue?

  1. Deploy using faas.hcl
  2. Add a function with UI or cli
  3. Try to get to the function through the gateway

What happens?

Yields a 404 Function not found

What were you expecting to happen?

To get results from the function

Any logs, error output, etc?

2018-12-30T23:34:15.224Z [INFO ] nomadd.consul_resolver: Getting Address from consul: function=certinfo
2018-12-30T23:34:15.225Z [ERROR] nomadd.proxy_client: Function Not Found: certinfo=<unknown>

Any other comments?

My nomad cluster is pre-built on nodes that run CoreOS. HTTP proxying is done with traefik through its consul_catalog provider. Consul services are exposed by name under a specific domain.

I'm not totally sure how all the pieces work together, but the functions are launched through nomad without issues. I can also reach the function itself through traefik because its service was registered in consul. Trying to use the function through /function/<name> on the gateway doesn't work.

What versions of software are you using?

Operating System: CoreOS

OpenFaaS Version: latest with current master of this repo

Nomad Version: 0.8.6

alexellis commented 5 years ago

@acornies

acornies commented 5 years ago

@numkem Can you provide a little more info on how you're invoking the functions? Also, can you provide the gateway env vars and faas-nomad cli params?

The way faas-nomad currently works is that it proxies requests to functions itself, using its own internal consul resolver, which relies on consul watches.

alexellis commented 5 years ago

+1 I think the Traefik config would help too.

numkem commented 5 years ago

Configuration

This is the traefik configuration. It uses consul's catalog to map services, by name, to hosts under the domain.

        defaultEntryPoints = ["http", "https"]
        [api]
        entryPoint = "traefik"
        dashboard = true

        [consulCatalog]
        endpoint = "10.1.1.2:8500"
        exposedByDefault = true
        stale = true
        domain = "svc.domain.int"
        prefix = "traefik"

This means that a service with the name echo would be bound to echo.svc.domain.int.

As for the gateway configuration, I currently have this:

functions_provider_url="http://{{ env "NOMAD_IP_http" }}:8081/"
{{ range service "metric" }}
faas_prometheus_host="{{ .Address }}"
faas_prometheus_port="{{ .Port }}"{{ end }}
{{ range service "nats" }}
faas_nats_address="{{ .Address }}"
faas_nats_port={{ .Port }}{{ end }}

I had to change 3 things from faas.hcl:

  1. Rename the service prometheus to metric since I already have an install of prometheus running (it works, I see the stats from the functions in Grafana)
  2. Remove the static bind of the gateway since traefik takes care of it.
  3. Rename the gateway service to faas

Running the function

Since I was just starting to play with openfaas, I only used the UI through traefik at http://faas.svc.domain.int and then added some functions (certinfo and nodeinfo, for example).

Now if I try to reach them through http://faas.svc.domain.int/function/certinfo I get a 404 error with Function not found. But I can see the function being started through nomad status and it's also in consul as a service named certinfo.

Since the service is registered in consul I can reach it through traefik at http://certinfo.svc.domain.int and it works just as it's supposed to.

Thank you!

acornies commented 5 years ago

Hmm, I'm not sure this is what you want since all functions should be invoked through the gateway -> faas-nomad. This looks like you're bypassing the gateway/provider altogether (after the function is deployed) and invoking the functions directly using consul FQDNs with traefik. This will break some behaviour with OpenFaaS - namely auto-scaling.

Can you also please post the faas-nomad provider config?

numkem commented 5 years ago

I know that using traefik to invoke the function isn't the right way to do it. I only mentioned it to show that the function itself works without problems, apart from the gateway issue.

This is the configuration for faas-nomad:

      "-nomad_region", "${NOMAD_REGION}",
      "-nomad_addr", "${NOMAD_IP_http}:4646",
      "-consul_addr", "${NOMAD_IP_http}:8500",
      "-statsd_addr", "${NOMAD_ADDR_statsd_statsd}",
      "-node_addr", "${NOMAD_IP_http}",
      "-basic_auth_secret_path", "/secrets",
      "-enable_basic_auth=false"

acornies commented 5 years ago

Right - just want to be clear. Can you also share some info on your consul configuration? Having trouble reproducing this locally.

@nicholasjackson any ideas?

numkem commented 5 years ago

My consul setup is nothing fancy except I activated the DNS resolver.

How is the lookup done in consul? Is it looking for a specific name or tag?

acornies commented 5 years ago

The lookup is by service name (function name) 1:1 mapping.

What you could also try is using the direct_functions feature on the gateway component (you renamed it to faas), like so:

direct_functions="true"
direct_functions_suffix="svc.domain.int"

This should resolve requests using the consul domain (FQDN) you've set up while still collecting metrics via the watchdog. You also need to set the dns_servers option of the nomad docker driver for the gateway container, pointing it at the consul or traefik DNS address.
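To make the effect of those two settings concrete, here is a minimal sketch of how a function name and suffix could be combined into the address that gets called. It is an illustration under assumptions, not the gateway's actual code: `directFunctionURL` is a hypothetical helper, and the port is a parameter because the thread does not say which port the gateway would use (8080 in the example is an assumption).

```go
package main

import (
	"fmt"
	"strings"
)

// directFunctionURL joins a function name and a DNS suffix into a URL.
// Hypothetical helper: the real gateway may build this differently.
func directFunctionURL(name, suffix string, port int) string {
	host := name
	// Tolerate suffixes written with or without surrounding dots.
	if s := strings.Trim(suffix, "."); s != "" {
		host = name + "." + s
	}
	return fmt.Sprintf("http://%s:%d", host, port)
}

func main() {
	// With direct_functions_suffix="svc.domain.int" and an assumed port of 8080.
	fmt.Println(directFunctionURL("certinfo", "svc.domain.int", 8080))
}
```

Whatever host name results, the gateway container has to be able to resolve it, which is why the dns_servers setting matters.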