openfaas / of-watchdog

Reverse proxy for STDIO and HTTP microservices
MIT License

JVM never receives SIGTERM (shutdownhook never called) #33

Open hejfelix opened 6 years ago

hejfelix commented 6 years ago

Expected Behaviour

I would expect the JVM to receive SIGTERM and then terminate. I am running an http4s Scala web server in this project: https://github.com/hejfelix/fp-exercises-and-grading/blob/master/http4s_faas/openfaas/Dockerfile

Current Behaviour

The JVM never shuts down before the Docker container is killed.

Possible Solution

Not sure. It seems the watchdog is not forwarding SIGTERM to the function process, so the shutdown hook never runs?

Steps to Reproduce (for bugs)

Add a shutdown hook to any function running in http mode

Context

I want to be able to clean up resources, e.g. database connections, unfinished operations, etc.

Your Environment

Running on Docker Swarm on my MacBook Pro

Client: Docker Engine - Community
 Version:           18.09.0-ce-beta1
 API version:       1.39
 Go version:        go1.10.4
 Git commit:        78a6bdb
 Built:             Thu Sep  6 22:41:53 2018
 OS/Arch:           darwin/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.0-ce-beta1
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       78a6bdb
  Built:            Thu Sep  6 22:49:35 2018
  OS/Arch:          linux/amd64
  Experimental:     true

alexellis commented 6 years ago

Hi, thanks for the question. A graceful shutdown period is built in, which gives any in-flight HTTP requests time to complete (you should be able to read about this in the README).

The period is tied to write_timeout, which I see you may not be specifying correctly at present. A Golang duration is needed, e.g. 20s.

alexellis commented 6 years ago

Derek add label: question

hejfelix commented 6 years ago

What I'm thinking about is a hook whenever a container is taken out of commission (e.g. scale to 0, function removed, etc.). Having a warm JVM means that collecting/releasing resources on each invocation seems a bit silly, so it would be nice to know when the resources MUST be released.
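
A minimal sketch of the pattern described above: acquire a resource once per warm JVM and release it exactly once from a JVM shutdown hook. `WarmResource` and `WarmFunction` are hypothetical names for illustration and are not part of of-watchdog or http4s. The hook only helps if the JVM actually receives SIGTERM (or exits normally); it never runs on SIGKILL.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical stand-in for a database pool or other long-lived resource.
final class WarmResource {
  private val closed = new AtomicBoolean(false)

  def handle(request: String): String = {
    require(!closed.get(), "resource already released")
    s"handled: $request"
  }

  // Idempotent, so it is safe to call from a shutdown hook and from normal code.
  def close(): Unit =
    if (closed.compareAndSet(false, true)) System.err.println("resource released")

  def isClosed: Boolean = closed.get()
}

object WarmFunction {
  // Created on first use and then reused by every invocation in this JVM.
  lazy val resource: WarmResource = {
    val r = new WarmResource
    // sys.addShutdownHook registers a JVM shutdown hook. It runs on SIGTERM,
    // SIGINT or normal exit, but not when the process is killed with SIGKILL.
    sys.addShutdownHook(r.close())
    r
  }

  def invoke(request: String): String = resource.handle(request)
}
```

Each call to `WarmFunction.invoke` reuses the same `WarmResource`, so nothing is acquired or released per invocation.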

hejfelix commented 6 years ago

Are you saying it does indeed forward SIGTERM to the fprocess?

alexellis commented 5 years ago

Hi @hejfelix, I would be curious as to what Lambda, Google Functions or Azure Functions do in this scenario. If they do pass on SIGTERM then we should investigate it.

Alex

alexellis commented 5 years ago

Kubernetes has a rather complicated shutdown procedure regarding health-checks which does not make things easy. cc @LucasRoesler @stefanprodan

hejfelix commented 5 years ago

My experience with AWS Lambda was that there was no shutdown hook. This essentially meant that I wasted a lot of time acquiring and releasing resources on every invocation. This defeats the purpose of having a warm function.

LucasRoesler commented 5 years ago

There are a couple of things that happen during shutdown in a k8s pod:

  1. the Pod is marked as Terminating, which also removes it from the various Endpoints and Services in k8s
  2. the preStop hook is executed, which allows the Pod to respond to and customize the stop behavior. But ... this is not something we support configuration of
  3. SIGTERM is sent to the containers in the Pod
  4. when the grace period expires, SIGKILL is sent. The grace period defaults to 30 seconds and can be customized on the Pod spec via terminationGracePeriodSeconds

By far the simplest thing we could do is to listen for SIGTERM and forward it to the child process, which should already be happening (this is the relevant chunk of code that sends the TERM to the fprocess in http mode)

In short, from a quick review, the code looks like it is trying to send the TERM signal to the fprocess, in this case the JVM.
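
To illustrate steps 3 and 4 above from the point of view of a parent process, here is a rough sketch of "TERM first, KILL after a grace period" applied to a child process. This is not of-watchdog's actual code (which is written in Go); `GracefulStop` and the `sleep` child are assumptions made for the example, and it relies on `Process.destroy()` delivering SIGTERM, which is the usual behavior on Unix-like JVMs but is implementation dependent.

```scala
import java.util.concurrent.TimeUnit

object GracefulStop {
  // Asks the child to terminate, waits up to graceSeconds, then forces it.
  // Returns true if the child exited within the grace period.
  def stop(child: Process, graceSeconds: Long): Boolean = {
    child.destroy() // typically SIGTERM on Unix-like systems
    val exitedInTime = child.waitFor(graceSeconds, TimeUnit.SECONDS)
    if (!exitedInTime) {
      child.destroyForcibly() // forcible termination, analogous to SIGKILL
      child.waitFor()
    }
    exitedInTime
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical child process standing in for the fprocess.
    val child = new ProcessBuilder("sleep", "300").inheritIO().start()
    // When this JVM receives SIGTERM, pass the termination on to the child.
    sys.addShutdownHook { stop(child, graceSeconds = 30); () }
    child.waitFor()
  }
}
```

If the parent never relays the signal, or the child ignores it, the child only stops when the forcible kill arrives, which matches the behavior reported in this issue.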

hejfelix commented 5 years ago

Interesting -- I am running on Docker Swarm in all my tests so far. Is that supposed to behave the same way? I guess I'm struggling to find a way to verify that my shutdown hook is run, since nothing appears in docker service logs after the node is taken down. Any ideas?

LucasRoesler commented 5 years ago

Probably the simplest way to verify it is to send a message somewhere, e.g. to RequestBin. If you are certain that your shutdown hook runs on SIGTERM and you don't get a message, then that means you didn't get a SIGTERM.
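
Before testing inside a container, it can help to confirm locally that the hook fires on SIGTERM at all. The following is a small sketch under assumptions: `HookProbe` and the log path `/tmp/hook-probe.log` are made up for this example, and `ProcessHandle` requires Java 9 or later.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths, StandardOpenOption}

object HookProbe {
  // Appends one line to a file so the hook leaves evidence even if logs are lost.
  def record(path: String, message: String): Unit = {
    Files.write(
      Paths.get(path),
      (message + "\n").getBytes(StandardCharsets.UTF_8),
      StandardOpenOption.CREATE,
      StandardOpenOption.APPEND
    )
    ()
  }

  def main(args: Array[String]): Unit = {
    val pid = ProcessHandle.current().pid()
    println(s"pid=$pid, send SIGTERM with: kill -TERM $pid")
    // Hypothetical marker file; any writable location would do.
    sys.addShutdownHook(record("/tmp/hook-probe.log", s"shutdown hook ran at ${java.time.Instant.now()}"))
    Thread.sleep(Long.MaxValue) // keep the JVM alive until it is signalled
  }
}
```

Run it, send `kill -TERM <pid>` from another terminal, and check the marker file. If the line appears locally but not when the same hook runs under the watchdog, the signal is probably not reaching the JVM.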

hejfelix commented 5 years ago

Right, so my shutdown hook is not being called. This is my Scala code:

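  import scala.io.Source
  import scala.util.Random
  // Shutdown hook: on JVM shutdown, issue a GET to RequestBin so the call shows up there.
  Runtime.getRuntime().addShutdownHook(new Thread() {
    override def run(): Unit = Source.fromURL(s"http://requestbin.fullcontact.com/wrh63owr/SHUTDOWN_HOOK_SUCK_IT_LAMBDA_${Random.nextInt}").toList
  })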

If I run this line e.g. in the REPL, it works:

  Source.fromURL(s"http://requestbin.fullcontact.com/wrh63owr/SHUTDOWN_HOOK_SUCK_IT_LAMBDA_${Random.nextInt}").toList

hejfelix commented 5 years ago

I'm starting to believe the problem occurs because my web framework is not reacting to SIGTERM. Will investigate now and report back.