openfaas / nats-queue-worker

Queue-worker for OpenFaaS with NATS Streaming
https://docs.openfaas.com/reference/async/
MIT License
129 stars 59 forks source link

Implemented reconnection logic in queue-worker #52

Closed bartsmykla closed 5 years ago

bartsmykla commented 5 years ago

It's the second part of: https://github.com/openfaas/nats-queue-worker/pull/49 which should resolve https://github.com/openfaas/faas/issues/1031 and #33

Signed-off-by: Bart Smykla bsmykla@vmware.com

Description

Implemented reconnection logic in queue-worker

Motivation and Context

How Has This Been Tested?

I have testd it by deploying it to my local cluster with and without persistence store in nats (mysql) and I tried to simulate disconnections from nats server by killing the containers with it, and trying to async invoke functions.

Logs after I scaled nats server to zero, waited some time (few seconds), and scaled it back again:

func_queue-worker.1.ve68xbsffrfq@linuxkit-025000000001    | Disconnected from nats://nats:4222
func_queue-worker.1.ve68xbsffrfq@linuxkit-025000000001    | Connecting to: nats://nats:4222
func_queue-worker.1.ve68xbsffrfq@linuxkit-025000000001    | Reconnection (1/5) to nats://nats:4222 failed
func_queue-worker.1.ve68xbsffrfq@linuxkit-025000000001    | Waiting 2s before next try
func_queue-worker.1.ve68xbsffrfq@linuxkit-025000000001    | Connecting to: nats://nats:4222
func_queue-worker.1.ve68xbsffrfq@linuxkit-025000000001    | Reconnection (2/5) to nats://nats:4222 failed
func_queue-worker.1.ve68xbsffrfq@linuxkit-025000000001    | Waiting 4s before next try
func_queue-worker.1.ve68xbsffrfq@linuxkit-025000000001    | Connecting to: nats://nats:4222
func_queue-worker.1.ve68xbsffrfq@linuxkit-025000000001    | Subscribing to: faas-request at nats://nats:4222
func_queue-worker.1.ve68xbsffrfq@linuxkit-025000000001    | Wait for  5m0s
func_queue-worker.1.ve68xbsffrfq@linuxkit-025000000001    | Listening on [faas-request], clientID=[faas-worker-453efcc20e04], qgroup=[faas] durable=[]
func_queue-worker.1.ve68xbsffrfq@linuxkit-025000000001    | Reconnection (3/5) to nats://nats:4222 succeeded

Types of changes

Checklist:

bartsmykla commented 5 years ago

@kozlovic I would be very pleased if you could look at that one too as you did yesterday with https://github.com/openfaas/nats-queue-worker/pull/49

bartsmykla commented 5 years ago

Thanks @kozlovic. I'll work on that tomorrow!

bartsmykla commented 5 years ago

@kozlovic I think I have addressed all things from your comment. Would you be so kind, and do me PR once again? :-) Thanks for your time!

alexellis commented 5 years ago

I guess Reconnection should be Reconnecting.. ?

alexellis commented 5 years ago

Hi @bartsmykla thanks for editing the commit. Do you know why the build failed?

alexellis commented 5 years ago
# github.com/openfaas/nats-queue-worker
./main.go:312:5: undefined: "github.com/openfaas/nats-queue-worker/nats".Init
FAIL    github.com/openfaas/nats-queue-worker [build failed]
=== RUN   Test_GetClientID_ContainsHostname
--- PASS: Test_GetClientID_ContainsHostname (0.00s)
=== RUN   TestCreategetClientID
--- PASS: TestCreategetClientID (0.00s)
=== RUN   TestCreategetClientIDWhenHostHasUnsupportedCharacters
--- PASS: TestCreategetClientIDWhenHostHasUnsupportedCharacters (0.00s)
PASS
ok      github.com/openfaas/nats-queue-worker/handler   0.004s
=== RUN   TestGetClientID
--- PASS: TestGetClientID (0.00s)
=== RUN   TestGetClientIDWhenHostHasUnsupportedCharacters
--- PASS: TestGetClientIDWhenHostHasUnsupportedCharacters (0.00s)
PASS
ok      github.com/openfaas/nats-queue-worker/nats  0.002s
The command '/bin/sh -c go test -v ./...' returned a non-zero code: 2
The command "./build.sh" exited with 2.

Please see above.

alexellis commented 5 years ago

I'm not sure why this line is in the PR, but it doesn't appear to work?

go nats.Init("http://" + config.NatsAddress + ":8222")

bartsmykla commented 5 years ago

@alexellis It shouldn't be here. When doing rebasing I have commited by mistake a line from nats-streaming prometheus exporter testing. :-/

alexellis commented 5 years ago

OK. I'll remove it and fix another bug I found in the code via #54