lorenzodonini / ocpp-go

Open Charge Point Protocol implementation in Go
MIT License
262 stars 125 forks source link

Howto implement error handling? #38

Closed andig closed 3 years ago

andig commented 3 years ago

Once the CP has successfully started (error=nil) its not entirely clear how to track it's status or restart the client should it become disconnected?

UPDATE:

Here's an example where the CP closes when it's not happy with the CS message:

INFO[2020-11-02T17:16:31+01:00] ocpp OccurrenceConstraintViolation - Field Call.Payload.Value required but not found 
ERRO[2020-11-02T17:16:31+01:00] error while handling message: ocpp OccurrenceConstraintViolation - Field Call.Payload.Value required but not found 
ERRO[2020-11-02T17:16:31+01:00] error while reading from websocket: websocket: close 1011 (internal server error) 

How would I handle that in my client code?

lorenzodonini commented 3 years ago

Right, so I haven't implemented automatic reconnect yet, but it's on the top of my ToDo list (along with completing v2.0.1).

I like the idea of having a seamless experience, where the dev doesn't have to worry about re-connecting or checking for connection network errors. Ideally you call:

err := chargePoint.Start(csUrl)

and get an err != nil only if the initial connection failed. Otherwise the function returns right away and your "main thread" is free to do what you want.

To answer your question (once the automatic reconnection is implemented) I see two options:

Did I understand your point correctly? Or do you have other suggestions?

andig commented 3 years ago

I‘m wondering what to do in the current state. Having start return an error is fine, I can handle the retries. However- what happens with intermediate errors, e,g. Connection dropped at a later time? Does start return then? If yet it would probably more be a „run“ method than „start“?

IMHO start should indeed return immediately when the initial connection cannot be established as that might very well be a configuration error that should be handled. Later on an error channel or callback or similar could be an option (e.g. influxdb-client uses an channel).

andig commented 3 years ago

Btw, really appreciate your answers and the work you‘ve put into this library! I‘m trying to integrate it into https://github.com/andig/evcc, so the questions are coming from real-world use cases ;)

lorenzodonini commented 3 years ago

Thanks for checking this out then 😄 I'm very open to suggestions in general to make this a better library. I was also thinking of adding some hooks for metrics, logging, caching etc. later on, so now you've given me a valid reason.

lorenzodonini commented 3 years ago

Regarding your question: as I said there is no automatic reconnection right now, so you will have problems in real-world use-cases.

I'll work on adding that feature right away, so you can actually use the library as intended. Would you like a callback via channel like in the influxdb client?

andig commented 3 years ago

Regarding your question: as I said there is no automatic reconnection right now, so you will have problems in real-world use-cases.

Understood and I don't have any problem with that as long as Start will return when such an error happens. I couldn't figure out if that's the case. If the error is swallowed internally and Start continues running without caller being able to restart, that is indeed a problem.

I'll work on adding that feature right away, so you can actually use the library as intended. Would you like a callback via channel like in the influxdb client?

The influx case is targeted at logging, everything else is retry inside the influx client. We need a clean way not only to notify of error but also to return control to the caller. I could imagine something like an internal error channel that Start opens and then waits for any message and close when such comes- but I don't understand the architecture enough to validate it.

lorenzodonini commented 3 years ago

I will document the architecture at a later stage, but to put it very simply, the library has 3 layers:

  1. websocket
  2. ocpp-j
  3. the actual ocpp message handling

Errors may originate on all three layers, but I would still group them in 2 main categories:

On Charge Points, the Start method returns an error ONLY if the initial connection fails. All further network errors are swallowed, but as mentioned above I can make events over channel happen easily.

Side note: There is one more possible error, which is MessageTypeNotSupported, caused by an endpoint sending a message which is not supported by the other endpoint. The caller will receive the error, but the endpoint throwing this error should IMHO not notify the application, because the application wouldn't care.

andig commented 3 years ago

On Charge Points, the Start method returns an error ONLY if the initial connection fails. All further network errors are swallowed, but as mentioned above I can make events over channel happen easily.

Ok, that's required. If Start does not return, we'll also need an exposed Reconnect method or- as you've planned- an internal recovery mechanism.

At the current status I could also Stop the CP which should cause Start to return? Anyway we'd need a way to get out of that Start once the network is dead.

lorenzodonini commented 3 years ago

I guess there was a misunderstanding, so let me clarify:

lorenzodonini commented 3 years ago

This is the described behavior: https://github.com/lorenzodonini/ocpp-go/blob/master/ws/websocket.go#L576

andig commented 3 years ago

Thanks for the clarification, especially about the Start behaviour. I've just found https://github.com/maurodelazeri/gorilla-reconnect which might help.

I think we can close this issue in favour of implementing a reconnect logic then?

lorenzodonini commented 3 years ago

Sure, I'll work on the auto-reconnect logic!