nats-io / nats.py

Python3 client for NATS
https://nats-io.github.io/nats.py/
Apache License 2.0

Library reconnects don't work #66

Closed pvanderlinden closed 6 years ago

pvanderlinden commented 6 years ago

Originally posted here: https://github.com/nats-io/asyncio-nats-streaming/issues/7

I cannot get reconnects working. I tested it with a publisher and a consumer: if I shut down the server for a few seconds, both stop working (the publisher times out on publishing, and the consumer simply stops receiving messages).

Connection code:

from functools import partial

from nats.aio.client import Client as NATS

async def test_cb(cb_type, *args, **kwargs):
    # Print which callback fired and the arguments it received.
    print(cb_type, args, kwargs)

# Runs inside a coroutine; `loop` is the asyncio event loop.
nc = NATS()
await nc.connect(
    io_loop=loop, error_cb=partial(test_cb, 'error'),
    disconnected_cb=partial(test_cb, 'disconnect'), closed_cb=partial(test_cb, 'closed'),
    reconnected_cb=partial(test_cb, 'reconnect'), ping_interval=25)

Output:

error (NatsError('nats: empty response from server when expecting INFO message',),) {}
reconnect () {}
error (<class 'nats.aio.errors.ErrStaleConnection'>,) {}
disconnect () {}
error (NatsError('nats: empty response from server when expecting INFO message',),) {}
error (ConnectionRefusedError(111, "Connect call failed ('127.0.0.1', 4222)"),) {}
disconnect () {}
closed () {}

From what I understand of the defaults, it should try to reconnect 10 times with a delay of 2 seconds, so if the server is down for less than 20 seconds it should at least reconnect and resume operation. Unfortunately it does not resume: the consumer stops receiving anything and the producer times out on publish calls, even after the server is back up.
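To make the expectation concrete, here is a minimal sketch of a retry loop with those numbers. It is not the client's actual implementation: the constants are the defaults as understood in this comment (10 attempts, 2 seconds apart), and `reconnect_window`, `reconnect`, and `simulate` are hypothetical helpers that use a fake clock and a fake server instead of a real NATS connection.

```python
import asyncio

# Assumed defaults, taken from the comment above, not from the library source.
MAX_RECONNECT_ATTEMPTS = 10
RECONNECT_TIME_WAIT = 2  # seconds


def reconnect_window(attempts=MAX_RECONNECT_ATTEMPTS, wait=RECONNECT_TIME_WAIT):
    """Longest outage, in seconds, that this retry schedule could ride out."""
    return attempts * wait


async def reconnect(try_connect, attempts=MAX_RECONNECT_ATTEMPTS,
                    wait=RECONNECT_TIME_WAIT, sleep=asyncio.sleep):
    """Wait, then retry `try_connect` until it succeeds or attempts run out.

    Returns the 1-based number of the attempt that succeeded, or None.
    """
    for attempt in range(1, attempts + 1):
        await sleep(wait)
        try:
            await try_connect()
        except ConnectionRefusedError:
            continue  # server still down, try again after the next wait
        return attempt
    return None


def simulate(down_for, attempts=MAX_RECONNECT_ATTEMPTS, wait=RECONNECT_TIME_WAIT):
    """Run reconnect() against a fake server that is down for `down_for` seconds."""
    now = 0  # fake clock, advanced only by fake_sleep

    async def fake_sleep(seconds):
        nonlocal now
        now += seconds

    async def try_connect():
        if now < down_for:
            raise ConnectionRefusedError(111, "Connect call failed")

    return asyncio.run(reconnect(try_connect, attempts, wait, sleep=fake_sleep))
```

Under these assumptions, `simulate(5)` succeeds on the third attempt, while `simulate(25)` exhausts all attempts and returns `None`, which is roughly the point where a client would give up and report the connection as closed.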

pvanderlinden commented 6 years ago

I'm looking into fixing this, as it blocks us from using NATS. My main question is what the logic is supposed to do, since I currently see several strange things:

I'm mainly asking, because I'm assuming the logic is supposed to be similar across the different clients.

wallyqs commented 6 years ago

Thanks for your patience on this issue. I revised the reconnection logic and found a few places where it was not working in the same way as in the Go client. I now have a branch where the behavior should be improved, if you want to take a look: https://github.com/nats-io/asyncio-nats/pull/67

More on how the reconnection logic ought to work, trying to cover some of the points above:

Thanks again for pinging on this, I will merge the branch soon and make a release containing the fixes.

wallyqs commented 6 years ago

@pvanderlinden behavior in master should be improved now. I will make a release this week that includes the fix.

pvanderlinden commented 6 years ago

Thanks, this all seems to work fine now (with the last commit on the master branch)!

wallyqs commented 6 years ago

@pvanderlinden Thanks for checking. I have just released this in v0.7.2 of the client.