shoenig / go-mqtt

A development fork of the Eclipse Paho Go MQTT client
http://git.eclipse.org/c/paho/org.eclipse.paho.mqtt.golang.git/
Eclipse Public License 1.0

Error handling #15

Closed: robert17 closed this issue 10 years ago

robert17 commented 10 years ago

Some thoughts and questions about shutdown and error handling for the client.

Current shutdown procedure (see the sketch after this list):

1. Set c.alive = false

  -> keepalive should stop based on this?

2. Push true to c.stopch

  -> alllogic selects from c.stopch and returns

3. Push true to c.stopch

  -> incoming won't process this yet as it is sitting in Conn.Read

4. Write disconnect message and close socket

  -> causes incoming to get an error

    -> incoming selects from c.stopch and returns

5. Close persistence
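
To make that concrete, here is a rough sketch of the sequence in code; the field names (alive, stopch, conn, persist) are taken from this discussion and may not match the actual struct:

```go
package mqtt

import "net"

// store is a stand-in for the persistence interface used here.
type store interface {
	Close()
}

type client struct {
	alive   bool
	stopch  chan bool // assumed buffered, so step 3 does not block
	conn    net.Conn
	persist store
}

// disconnect walks the five steps listed above.
func (c *client) disconnect() {
	// 1. Tell keepalive to stop.
	c.alive = false

	// 2. alllogic selects on stopch and returns.
	c.stopch <- true

	// 3. incoming won't see this yet; it is blocked in Conn.Read.
	c.stopch <- true

	// 4. Send DISCONNECT and close the socket, which errors out the
	//    blocked Read and lets incoming drain stopch and return.
	c.conn.Write([]byte{0xE0, 0x00}) // MQTT DISCONNECT fixed header
	c.conn.Close()

	// 5. Close persistence.
	c.persist.Close()
}
```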

Error shutdown procedure (if it is a connection-related error, anyway; see the sketch after the list):

1. Error occurs in incoming or outgoing

  -> if in incoming and we aren't currently disconnecting, break from the read loop and push to c.errors

  -> if in outgoing, push to c.errors

2. alllogic gets error

  -> if onConnectionLost is set, call it and return from alllogic

  -> if onConnectionLost is not set, panic...
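
Roughly, the error path in alllogic could look like this (a sketch using the names from this thread, not the actual code):

```go
package mqtt

type logicClient struct {
	errors           chan error
	stopch           chan bool
	onConnectionLost func(error)
}

// alllogic's loop: a pushed error either reaches the callback or, if no
// callback is set, panics (the current behaviour being questioned here).
func (c *logicClient) alllogic() {
	for {
		select {
		case <-c.stopch:
			return
		case err := <-c.errors:
			if c.onConnectionLost != nil {
				c.onConnectionLost(err)
				return
			}
			panic(err)
		}
	}
}
```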

Some questions:

Does persistence need to be closed somewhere in the error case? Right now, the persistence is created in NewClient and opened in Start.

What happens to ping in the error case?

If the error occurs in incoming, what happens to outgoing? If the error occurs in outgoing, what happens to incoming? One solution could be to call conn.Close() when we get an error: incoming, outgoing, and alllogic should all end, but ping may hang in this case.

If onConnectionLost is not set, should we have a way to pass errors to the application, instead of just panicking?

We should also think about the proper places to close channels in both normal and error shutdown scenarios. In the case of ping, if outgoing got an error and exited, then ping could hang trying to put a message onto oboundP.
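
One way to keep ping from hanging would be to guard the send with a select on a stop channel; a sketch, assuming ping has access to such a channel:

```go
package mqtt

import "time"

type pinger struct {
	oboundP chan []byte
	stopch  chan bool
}

// ping guards the send with a select, so a stopped outgoing goroutine
// cannot leave it blocked on oboundP forever.
func (p *pinger) ping(interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	pingreq := []byte{0xC0, 0x00} // MQTT PINGREQ fixed header
	for {
		select {
		case <-ticker.C:
			select {
			case p.oboundP <- pingreq:
			case <-p.stopch:
				return
			}
		case <-p.stopch:
			return
		}
	}
}
```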

shoenig commented 10 years ago

I would like to get rid of c.alive if we can. It's a race condition waiting to happen, and I don't think it solves a problem that can't be better solved through channels.
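
For reference, the usual race-free replacement for a shared bool is a channel that gets closed exactly once; a quick sketch (names are mine, not the client's):

```go
package mqtt

type stopper struct {
	stop chan struct{} // closed exactly once to signal shutdown, never written to
}

// stopping can be called from any goroutine without a data race, unlike
// reading and writing a plain bool such as c.alive.
func (s *stopper) stopping() bool {
	select {
	case <-s.stop:
		return true
	default:
		return false
	}
}
```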

In a connection error case, I would expect the persistence to remain intact, so that the client can reconnect and resume after a connection failure. Right now, calling Unopen() on the filestore leaves the files there, but in memstore the store gets deleted; the two behaviors need to be made consistent. I think we should add a Reset() method to the Store interface and put the wipe-the-store functionality in there. So yes, I think persistence should be closed in the error case, but closing should not include wiping the contents.
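
A sketch of what that Store change could look like; Reset() is the proposed addition, the other method names are just placeholders:

```go
package mqtt

// Store is persistence for in-flight messages. Close stops using the
// store but keeps its contents (on disk or in memory); Reset is the only
// method allowed to wipe them, so a reconnect can resume cleanly.
type Store interface {
	Open()
	Put(key string, msg []byte)
	Get(key string) []byte
	Del(key string)
	Close() // keep contents
	Reset() // wipe contents explicitly
}
```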

So basically our problem is that either incoming or outgoing could be the first to detect a connection error (although it seems the Go runtime doesn't surface a disconnect while blocked reading from the connection!), and we need to gracefully shut down all the running goroutines: incoming, outgoing, alllogic, and pinger. The ideal design pattern would be to just close channels on the sender end and detect the close on the receiver end (a nil/zero value with ok == false), in which case you would return from that goroutine. However, this is much easier said than done: the main problem is that a goroutine blocked trying to put a message on a channel nobody is receiving from will never observe anything at all.
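
A self-contained illustration of both halves of that, with made-up channel names: the receiver detects the close with the comma-ok form, while a sender blocked on a channel nobody reads never learns anything unless it also selects on a stop channel:

```go
package main

import "fmt"

func main() {
	msgs := make(chan string)
	done := make(chan struct{})

	// Receiver: returns as soon as the channel is closed.
	go func() {
		defer close(done)
		for {
			m, ok := <-msgs
			if !ok {
				fmt.Println("channel closed, receiver returning")
				return
			}
			fmt.Println("got", m)
		}
	}()

	msgs <- "hello"
	close(msgs) // sender side closes; receiver sees ok == false
	<-done

	// The problem case: nobody is receiving, so this send would block
	// forever; a select with a stop channel is one way out.
	stuck := make(chan string) // unbuffered, no receiver
	stop := make(chan struct{})
	close(stop)
	select {
	case stuck <- "ping":
	case <-stop:
		fmt.Println("gave up instead of blocking forever")
	}
}
```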

Perhaps there is a default function we can provide for onConnectionLost? Something like printing an error message to stderr, and then letting the client die gracefully.
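
Something along these lines, perhaps (DefaultConnectionLost is just a placeholder name):

```go
package mqtt

import (
	"log"
	"os"
)

var lostLog = log.New(os.Stderr, "mqtt: ", log.LstdFlags)

// DefaultConnectionLost could be installed when the application never
// sets onConnectionLost: log to stderr and let the client die gracefully
// instead of panicking.
func DefaultConnectionLost(err error) {
	lostLog.Printf("connection lost: %v", err)
}
```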

shoenig commented 10 years ago

At this point, what we need is a comprehensive SVT setup with which we can manipulate and monitor the broker, and ensure the client is behaving correctly.