Howto implement error handling?

andig commented 3 years ago

Once the CP has successfully started (error=nil) its not entirely clear how to track it's status or restart the client should it become disconnected?

UPDATE:

Here's an example where the CP closes when it's not happy with the CS message:

INFO[2020-11-02T17:16:31+01:00] ocpp OccurrenceConstraintViolation - Field Call.Payload.Value required but not found 
ERRO[2020-11-02T17:16:31+01:00] error while handling message: ocpp OccurrenceConstraintViolation - Field Call.Payload.Value required but not found 
ERRO[2020-11-02T17:16:31+01:00] error while reading from websocket: websocket: close 1011 (internal server error)

How would I handle that in my client code?

lorenzodonini commented 3 years ago

Right, so I haven't implemented automatic reconnect yet, but it's on the top of my ToDo list (along with completing v2.0.1).

I like the idea of having a seamless experience, where the dev doesn't have to worry about re-connecting or checking for connection network errors. Ideally you call:

err := chargePoint.Start(csUrl)

and get an err != nil only if the initial connection failed. Otherwise the function returns right away and your "main thread" is free to do what you want.

To answer your question (once the automatic reconnection is implemented) I see two options:

no error is thrown at all, because the connection will eventually be re-established (I would modify the error logs accordingly to be traces or warnings),
a callback is provided for at least notifying that there is currently no connection. This should only be informative though, without breaking the flow

Did I understand your point correctly? Or do you have other suggestions?

andig commented 3 years ago

I‘m wondering what to do in the current state. Having start return an error is fine, I can handle the retries. However- what happens with intermediate errors, e,g. Connection dropped at a later time? Does start return then? If yet it would probably more be a „run“ method than „start“?

IMHO start should indeed return immediately when the initial connection cannot be established as that might very well be a configuration error that should be handled. Later on an error channel or callback or similar could be an option (e.g. influxdb-client uses an channel).

andig commented 3 years ago

Btw, really appreciate your answers and the work you‘ve put into this library! I‘m trying to integrate it into https://github.com/andig/evcc, so the questions are coming from real-world use cases ;)

lorenzodonini commented 3 years ago

Thanks for checking this out then 😄 I'm very open to suggestions in general to make this a better library. I was also thinking of adding some hooks for metrics, logging, caching etc. later on, so now you've given me a valid reason.

lorenzodonini commented 3 years ago

Regarding your question: as I said there is no automatic reconnection right now, so you will have problems in real-world use-cases.

I'll work on adding that feature right away, so you can actually use the library as intended. Would you like a callback via channel like in the influxdb client?

andig commented 3 years ago

Regarding your question: as I said there is no automatic reconnection right now, so you will have problems in real-world use-cases.

Understood and I don't have any problem with that as long as Start will return when such an error happens. I couldn't figure out if that's the case. If the error is swallowed internally and Start continues running without caller being able to restart, that is indeed a problem.

I'll work on adding that feature right away, so you can actually use the library as intended. Would you like a callback via channel like in the influxdb client?

The influx case is targeted at logging, everything else is retry inside the influx client. We need a clean way not only to notify of error but also to return control to the caller. I could imagine something like an internal error channel that Start opens and then waits for any message and close when such comes- but I don't understand the architecture enough to validate it.

lorenzodonini commented 3 years ago

I will document the architecture at a later stage, but to put it very simply, the library has 3 layers:

websocket
ocpp-j
the actual ocpp message handling

Errors may originate on all three layers, but I would still group them in 2 main categories:

message-related errors (validation etc.), which are ALWAYS forwarded to the caller (either synchronously or asynchronously, depending on the chosen API flavor).
network errors -> These ones are swallowed by the lib right now. I can make a channel available for notifying the application about such errors, but in most cases the app wouldn't have to worry about those (because auto-reconnection would take care of it).

On Charge Points, the Start method returns an error ONLY if the initial connection fails. All further network errors are swallowed, but as mentioned above I can make events over channel happen easily.

Side note: There is one more possible error, which is MessageTypeNotSupported, caused by an endpoint sending a message which is not supported by the other endpoint. The caller will receive the error, but the endpoint throwing this error should IMHO not notify the application, because the application wouldn't care.

andig commented 3 years ago

On Charge Points, the Start method returns an error ONLY if the initial connection fails. All further network errors are swallowed, but as mentioned above I can make events over channel happen easily.

Ok, that's required. If Start does not return, we'll also need an exposed Reconnect method or- as you've planned- an internal recovery mechanism.

At the current status I could also Stop the CP which should cause Start to return? Anyway we'd need a way to get out of that Start once the network is dead.

lorenzodonini commented 3 years ago

I guess there was a misunderstanding, so let me clarify:

If initial connection succeeds, Start returns right away with a nil error. The internal read/write mechanism is run a in a dedicated goroutine. The calling thread is free to do what it wants. Further network errors (disconnections) are indeed swallowed and I would use an internal recovery mechanism to solve those.
If initial connection fails, Start returns with an error. No internal routines were started, so it's up to the application to call Start again. As you mentioned, typically this is due to config issues.

lorenzodonini commented 3 years ago

This is the described behavior: https://github.com/lorenzodonini/ocpp-go/blob/master/ws/websocket.go#L576

andig commented 3 years ago

Thanks for the clarification, especially about the Start behaviour. I've just found https://github.com/maurodelazeri/gorilla-reconnect which might help.

I think we can close this issue in favour of implementing a reconnect logic then?

lorenzodonini commented 3 years ago

Sure, I'll work on the auto-reconnect logic!

lorenzodonini / ocpp-go

Howto implement error handling? #38