openfaas / of-watchdog

Reverse proxy for STDIO and HTTP microservices
MIT License
262 stars 115 forks source link

Support request on scaling from zero #92

Closed t83714 closed 4 years ago

t83714 commented 4 years ago

If you invoke a function (use of-watchdog HTTP mode) after it's been scaled to zero, a new function pod has been successfully created but get Server returned unexpected status code: 500 - result.

If you look into the function pod log below:

Forking - node [bootstrap.js]
2020/03/10 11:23:19 Started logging stderr from function.
2020/03/10 11:23:19 Started logging stdout from function.
2020/03/10 11:23:19 OperationalMode: http
2020/03/10 11:23:19 Timeouts: read: 1m5s, write: 1m5s hard: 1m0s.
2020/03/10 11:23:19 Listening on port: 8080
2020/03/10 11:23:19 Writing lock-file to: /tmp/.lock
2020/03/10 11:23:19 Metrics listening on port: 8081
2020/03/10 11:23:21 Upstream HTTP request error: Post http://127.0.0.1:3000/: dial tcp 127.0.0.1:3000: connect: connection refused
2020/03/10 11:23:27 stdout: OpenFaaS Node.js listening on port: 3000

you can tell of-watch throw an Upstream HTTP request error 6 seconds before the express Node.js server started to listen at port 3000.

Expected Behaviour

When invoking a function that has been scaled to zero, the function needs to be successfully executed and return the correct response.

Extra response time due to the pod initialisation is acceptable.

Current Behaviour

When invoking a function that has been scaled to zero, a Server returned unexpected status code: 500 - response is returned.

Possible Solution

Possible Solution 1:

Possible Solution 2:

Steps to Reproduce (for bugs)

  1. Turn on the scale to zero feature of faas-idler and deploy the function with com.openfaas.scale.zero=true label
  2. Wait 30 mins and make sure the function pod is terminated by the faas-idler
  3. invoke the function and watch the result

Context

This issue makes scale to zero feature unusable as the first request after scaled to zero will always fail.

Your Environment

CLI:
 commit:  ea687659ecf14931a29be46c4d2866899d36c282
 version: 0.11.8

Gateway
 uri:     http://127.0.0.1:8080
 version: 0.18.10
 sha:     80b6976c106370a7081b2f8e9099a6ea9638e1f3
 commit:  Update Golang versions to 1.12

Provider
 name:          openfaas-operator
 orchestration: kubernetes
 version:       0.14.1 
 sha:           e747b6ace86bc54184d899fa10cf46dada331af1
alexellis commented 4 years ago

Hi in order to look at this we will need to know your complete set-up and how we can reproduce the issue. I can't actually reproduce this so I would think you may have missed some key information?

alexellis commented 4 years ago

/set title: Support request on scaling from zero

alexellis commented 4 years ago

/add label: support

t83714 commented 4 years ago

@alexellis Thanks for your reply. Here is the repo of our project (Branch openfaas): https://github.com/magda-io/magda-ckan-connector/tree/openfaas

I tried to create a customised docker image to suit our project. You don't have to build the project as the docker image is available from docker hub. We deploy it like (helm tpl is here):

apiVersion: openfaas.com/v1
kind: Function
metadata:
  name: ckan-data-url-processor
  namespace: "openfaas-fn"
  labels: 
    magdaCategory: connectors
    magdaSubCategory: ckan-connector
    magdaType: data-url-processor
spec:
  name: ckan-data-url-processor
  handler: node bootstrap.js
  image: "docker.io/data61/magda-ckan-connector:0.0.57-0-openfaas"
  labels: 
    magdaCategory: connectors
    magdaSubCategory: ckan-connector
    magdaType: data-url-processor
    com.openfaas.scale.zero: "true"
  environment:
    handler: remoteDataUrl
    read_timeout: 65s
    write_timeout: 65s
    exec_timeout: 60s
    userId: 00000000-0000-4000-8000-000000000000

If you do need to build the docker image locally for testing, you can run the following in the cloned repo:

yarn install
yarn build
# assume you use minikube --- setup docker env vars
eval $(minikube docker-env)
yarn docker-build-local

To invoke the function

POST the following:

{
  "handler": "remoteDataUrl",
  "remoteDataUrl": "https://demo.ckan.org/dataset/databaza-ziadosti-o-dotacie-mk-sr-za-rok-2019_test2020_02_18_14_50"
}

to http://localhost:8080/function/ckan-data-url-processor

Assume you have port-forwarded openfaas gateway to localhost:8080

The function simply pulls data from a public available API, convert it to JSON data and return it.

Please let me know if you need more information to reproduce the issue. Thanks again!

t83714 commented 4 years ago

@alexellis Not familiar with GO but quick browsing the code and looks like _/health endpoint will return 200 immediately once of-watchdog's HTTP server starts to listen?

See the line below. The acceptingConnections has been set to 1 here.

https://github.com/openfaas-incubator/of-watchdog/blob/dd56bb916862d17f72d52d82469c631713bdd414/main.go#L146

After that, _/health endpoint will always return 200 code.

At this stage, the executor HTTP server may not be ready yet (although buildRequestHandler is called earlier).

Could we probably add some logic to forward HTTP request to executor HTTP server's _health endpoint at here after acceptingConnections==1 && lockFilePresent() both true?

This will give us a chance to make sure executor HTTP server is online before serve any function invoke request.

alexellis commented 4 years ago

How long does it take until your function server is ready?

alexellis commented 4 years ago

Would you be open to trying out a custom health check path? https://docs.openfaas.com/reference/workloads/

It should be proxied to your function code.

t83714 commented 4 years ago

4 to 6 seconds. From the logs, I can tell the of-watchdog's HTTP server is online and start to serve _health request within 1 second.

t83714 commented 4 years ago

ou be open to trying out a custom health check path

Thanks a lot! Will have a try the custom health check path and report back 👍

t83714 commented 4 years ago

@alexellis

I've added a /_/health endpoint to my express executor:

app.get("/_/health", (req, res) => {
    res.status(200).send("OK");
});

And delete the function and re-deploy the function with com.openfaas.health.http.path: "/_/health" annotations:

apiVersion: openfaas.com/v1
kind: Function
metadata:
  name: ckan-data-url-processor
  namespace: openfaas-fn
  labels: 
    magdaCategory: connectors
    magdaSubCategory: ckan-connector
    magdaType: data-url-processor
spec:
  name: ckan-data-url-processor
  handler: node bootstrap.js
  image: "localhost:5000/data61/magda-ckan-connector:latest"
  labels: 
    magdaCategory: connectors
    magdaSubCategory: ckan-connector
    magdaType: data-url-processor
    com.openfaas.scale.zero: "true"
  annotations:
    com.openfaas.health.http.path: "/_/health"
  environment:
    handler: remoteDataUrl
    read_timeout: 65s
    write_timeout: 65s
    exec_timeout: 60s
    userId: 00000000-0000-4000-8000-000000000000
  limits:
    cpu: 100m
  requests:
    cpu: 50m
    memory: 30Mi

However, seems still got the same error. The invocation request was received after 5s but the express server was up after 10s:

Forking - node [bootstrap.js]
2020/03/12 12:49:51 Started logging stderr from function.
2020/03/12 12:49:51 Started logging stdout from function.
2020/03/12 12:49:51 OperationalMode: http
2020/03/12 12:49:51 Timeouts: read: 1m5s, write: 1m5s hard: 1m0s.
2020/03/12 12:49:51 Listening on port: 8080
2020/03/12 12:49:51 Writing lock-file to: /tmp/.lock
2020/03/12 12:49:51 Metrics listening on port: 8081
2020/03/12 12:49:55 Upstream HTTP request error: Post http://127.0.0.1:3000/: dial tcp 127.0.0.1:3000: connect: connection refused
2020/03/12 12:50:00 stdout: OpenFaaS Node.js listening on port: 3000

Seems custom health check path didn't work. Is it something to do with how I setup the custom health check path?

alexellis commented 4 years ago

You need to use a different path otherwise you'll conflict with the watchdog

t83714 commented 4 years ago

Set com.openfaas.health.http.path to a different path works! I do need to set com.openfaas.health.http.initialDelay: "10s" as well to avoid container being restarted. Thanks a lot! @alexellis 👍

alexellis commented 4 years ago

What was your typical start up time for your container?

Given your usage, you may benefit from joining the Insiders Program, from 5USD/mo. https://openfaas.com/support

t83714 commented 4 years ago

I think the typical container start-up time is around 10s. A closer look at it I found the long startup time may cause by the require() in NodeJs. We use require() to load a lot of JSON files from another NPM package ---- it might be the culprit that causes the delay.

Regarding the support service, we just start to look at FaaS solutions and evaluate the suitability of Openfaas for our project. Our project is an open-source project (i.e. we won't make any money from it and probably won't have the budget for the support service ). But I think if we do adopt Openfaas as part of our solution, many of our users will be interested in the support service. I'm also quite happy to contribute to the project via feedback/bugs report or even some coding (can do NodeJs & Scala and keen to pick up some GO)

Thanks again @alexellis for your help & well done on the Openfaas project 👍

alexellis commented 4 years ago

Our project is an open-source project (i.e. we won't make any money from it and probably won't have the budget for the support service ).

Just so that you're aware, openfaas is also an open source project which has no external funding. I didn't link you to a support program, but a landing page which has the newsletter subscription - since you are using OpenFaaS (it seems heavily), the least you can do is check it out and read the sample. If you benefited from me giving you free technical support here (which you did), then you will benefit from the weekly email updates on the project and ecosystem.