lorenzodonini / ocpp-go

Open Charge Point Protocol implementation in Go
MIT License
273 stars 126 forks source link

ocpp16.ChargePointConnectionHandler unreliable or racy #292

Open andig opened 2 months ago

andig commented 2 months ago

OCPP version: [x] 1.6 [ ] 2.0.1

I'm submitting a ...

[x] bug report [ ] feature request

Current behavior: In https://github.com/evcc-io/evcc/issues/15890 we're seeing that ocpp16.ChargePointConnectionHandler does not seem to always get triggered when a charge point reconnects.

Expected behavior: Connect handler is always triggered after disconnect if a charge point re-establishes connection.

Instead, the order of disconnect and connect events seems arbitrary. Especially, there is no connect event after the last disconnect although the charge point is clearly still connected.

Notice that the connected/disconnected events are not paired in the log. If I should guess, it feels as if inflight sent messages might still trigger a disconnect even after the charge point has already reconnected?

Steps to reproduce: n/a

Related code:

The handler definition is quite straight forward:

// NewChargePoint implements ocpp16.ChargePointConnectionHandler
func (cs *CS) NewChargePoint(chargePoint ocpp16.ChargePointConnection) {
    cs.mu.Lock()
    defer cs.mu.Unlock()

    // check for configured charge point
    cp, ok := cs.cps[chargePoint.ID()]
    if ok {
        cs.log.DEBUG.Printf("charge point connected: %s", chargePoint.ID())
        cp.connect(true)
    }
}

// ChargePointDisconnected implements ocpp16.ChargePointConnectionHandler
func (cs *CS) ChargePointDisconnected(chargePoint ocpp16.ChargePointConnection) {
    cs.log.DEBUG.Printf("charge point disconnected: %s", chargePoint.ID())

    if cp, err := cs.ChargepointByID(chargePoint.ID()); err == nil {
        cp.connect(false)
    }
}

Other information:

Notice that after 2024/09/04 03:14:26 change point is still sending and we can send, but there is no connect handler triggered:

[ocpp  ] TRACE 2024/09/04 03:07:45 recv SN10052307203612: [2,"19429","DataTransfer",{"vendorId":"EN+","messageId":"cpChargingParameterRpt","data":"{\"duty\":0,\"cpVolt\":931,\"transactionId\":2,\"totalPowerFactor\":256,\"aPhasesPowerFactor\":0,\"bPhasesPowerFactor\":6,\"cPhasesPowerFactor\":0,\"ccVolt\":256,\"offset0\":2010,\"offset1\":2019,\"leakcurr\":0,\"AMBTemp\":83,\"lockStatus\":1,\"ctCurrent\":9}"}]
[ocpp  ] TRACE 2024/09/04 03:08:12 send SN10052307203612: [2,"1515171661","TriggerMessage",{"requestedMessage":"MeterValues","connectorId":1}]
[ocpp  ] TRACE 2024/09/04 03:08:20 send SN10052307203612: [3,"19429",{"status":"Accepted"}]
[ocpp  ] TRACE 2024/09/04 03:08:17 recv SN10052307203612: [2,"19430","MeterValues",{"connectorId":1,"meterValue":[{"timestamp":"2024-09-03T17:07:53Z","sampledValue":[{"measurand":"Current.Offered","unit":"A","value":"32.0","context":"Sample.Periodic"},{"measurand":"Current.Import","unit":"A","phase":"L1","value":"0.0"},{"measurand":"Voltage","unit":"V","phase":"L1","value":"247.0","context":"Sample.Periodic"},{"measurand":"Energy.Active.Import.Register","unit":"Wh","value":"18099.0","context":"Sample.Periodic"},{"measurand":"Power.Active.Import","unit":"W","value":"0.0","context":"Sample.Periodic"}]}],"transactionId":2}]
[ocpp  ] TRACE 2024/09/04 03:08:20 recv SN10052307203612: [2,"19431","MeterValues",{"connectorId":1,"meterValue":[{"timestamp":"2024-09-03T17:08:03Z","sampledValue":[{"measurand":"Current.Offered","unit":"A","value":"32.0","context":"Sample.Periodic"},{"measurand":"Current.Import","unit":"A","phase":"L1","value":"0.0"},{"measurand":"Voltage","unit":"V","phase":"L1","value":"247.0","context":"Sample.Periodic"},{"measurand":"Energy.Active.Import.Register","unit":"Wh","value":"18099.0","context":"Sample.Periodic"},{"measurand":"Power.Active.Import","unit":"W","value":"0.0","context":"Sample.Periodic"}]}],"transactionId":2}]
[ocpp  ] DEBUG 2024/09/04 03:08:42 charge point disconnected: SN10052307203612
[ocpp  ] ERROR 2024/09/04 03:08:43 error replying cp SN10052307203612 to request 19430: ocpp message (19430): GenericError - couldn't write to websocket. No socket with id SN10052307203612 is open
[ocpp  ] TRACE 2024/09/04 03:08:44 recv SN10052307203612: [2,"19432","MeterValues",{"connectorId":1,"meterValue":[{"timestamp":"2024-09-03T17:08:14Z","sampledValue":[{"measurand":"Current.Offered","unit":"A","value":"32.0","context":"Sample.Periodic"},{"measurand":"Current.Import","unit":"A","phase":"L1","value":"0.0"},{"measurand":"Voltage","unit":"V","phase":"L1","value":"247.0","context":"Sample.Periodic"},{"measurand":"Energy.Active.Import.Register","unit":"Wh","value":"18099.0","context":"Sample.Periodic"},{"measurand":"Power.Active.Import","unit":"W","value":"0.0","context":"Sample.Periodic"}]}],"transactionId":2}]
[ocpp  ] TRACE 2024/09/04 03:08:44 recv SN10052307203612: [2,"19433","Heartbeat",{}]
[ocpp  ] ERROR 2024/09/04 03:08:44 error replying cp SN10052307203612 to request 19431: ocpp message (19431): GenericError - couldn't write to websocket. No socket with id SN10052307203612 is open
[ocpp  ] ERROR 2024/09/04 03:08:45 error replying cp SN10052307203612 to request 19432: ocpp message (19432): GenericError - couldn't write to websocket. No socket with id SN10052307203612 is open
[ocpp  ] ERROR 2024/09/04 03:08:45 error replying cp SN10052307203612 to request 19433: ocpp message (19433): GenericError - couldn't write to websocket. No socket with id SN10052307203612 is open
[ocpp  ] DEBUG 2024/09/04 03:12:35 charge point connected: SN10052307203612
[ocpp  ] TRACE 2024/09/04 03:12:35 send SN10052307203612: [2,"2027923241","TriggerMessage",{"requestedMessage":"MeterValues","connectorId":1}]
[ocpp  ] DEBUG 2024/09/04 03:14:25 charge point connected: SN10052307203612
[ocpp  ] DEBUG 2024/09/04 03:14:26 charge point connected: SN10052307203612
[ocpp  ] DEBUG 2024/09/04 03:14:26 charge point connected: SN10052307203612
[ocpp  ] DEBUG 2024/09/04 03:14:26 charge point connected: SN10052307203612
[ocpp  ] DEBUG 2024/09/04 03:14:26 charge point disconnected: SN10052307203612
[ocpp  ] TRACE 2024/09/04 03:14:26 send SN10052307203612: [2,"650736667","TriggerMessage",{"requestedMessage":"MeterValues","connectorId":1}]
[ocpp  ] DEBUG 2024/09/04 03:14:26 charge point disconnected: SN10052307203612
[ocpp  ] DEBUG 2024/09/04 03:14:26 charge point disconnected: SN10052307203612
[ocpp  ] DEBUG 2024/09/04 03:14:26 charge point disconnected: SN10052307203612
[ocpp  ] TRACE 2024/09/04 03:14:26 recv SN10052307203612: [2,"19471","DataTransfer",{"vendorId":"EN+","messageId":"gatewayInfo","data":"{\"SN\":\"SN10052307203612\",\"fwVer\":\"V43.2.636\",\"gateCode\":107,\"measureType\":1,\"disconnectReason\":4,\"maxCurr\":320,\"phase\":1,\"sdCardStatus\":0}"}]
[ocpp  ] TRACE 2024/09/04 03:14:26 send SN10052307203612: [3,"19471",{"status":"Accepted"}]
[ocpp  ] TRACE 2024/09/04 03:14:27 recv SN10052307203612: [2,"offline-19470","MeterValues",{"connectorId":1,"meterValue":[{"timestamp":"2024-09-03T17:14:25Z","sampledValue":[{"measurand":"Current.Offered","unit":"A","value":"32.0","context":"Sample.Periodic"},{"measurand":"Current.Import","unit":"A","phase":"L1","value":"0.0"},{"measurand":"Voltage","unit":"V","phase":"L1","value":"247.0","context":"Sample.Periodic"},{"measurand":"Energy.Active.Import.Register","unit":"Wh","value":"18099.0","context":"Sample.Periodic"},{"measurand":"Power.Active.Import","unit":"W","value":"0.0","context":"Sample.Periodic"}]}],"transactionId":2}]
[ocpp  ] TRACE 2024/09/04 03:14:28 recv SN10052307203612: [3,"650736667",{"status":"Accepted"}]
[ocpp  ] TRACE 2024/09/04 03:14:28 recv SN10052307203612: [2,"19472","MeterValues",{"connectorId":1,"meterValue":[{"timestamp":"2024-09-03T17:14:26Z","sampledValue":[{"measurand":"Current.Offered","unit":"A","value":"32.0","context":"Trigger"},{"measurand":"Current.Import","unit":"A","phase":"L1","value":"0.0"},{"measurand":"Voltage","unit":"V","phase":"L1","value":"247.0","context":"Trigger"},{"measurand":"Energy.Active.Import.Register","unit":"Wh","value":"18099.0","context":"Trigger"},{"measurand":"Power.Active.Import","unit":"W","value":"0.0","context":"Trigger"}]}],"transactionId":2}]
[ocpp  ] TRACE 2024/09/04 03:14:28 send SN10052307203612: [3,"19472",{}]
[ocpp  ] ERROR 2024/09/04 03:14:28 no handler available for call of type TriggerMessage from client SN10052307203612 for request 650736667
[ocpp  ] TRACE 2024/09/04 03:14:28 send SN10052307203612: [2,"3288924852","TriggerMessage",{"requestedMessage":"MeterValues","connectorId":1}]
[ocpp  ] TRACE 2024/09/04 03:14:28 send SN10052307203612: [3,"offline-19470",{}]
[ocpp  ] TRACE 2024/09/04 03:14:29 recv SN10052307203612: [3,"3288924852",{"status":"Accepted"}]
[ocpp  ] ERROR 2024/09/04 03:14:29 no handler available for call of type TriggerMessage from client SN10052307203612 for request 3288924852
[ocpp  ] TRACE 2024/09/04 03:14:29 recv SN10052307203612: [2,"19473","MeterValues",{"connectorId":1,"meterValue":[{"timestamp":"2024-09-03T17:14:28Z","sampledValue":[{"measurand":"Current.Offered","unit":"A","value":"32.0","context":"Trigger"},{"measurand":"Current.Import","unit":"A","phase":"L1","value":"0.0"},{"measurand":"Voltage","unit":"V","phase":"L1","value":"247.0","context":"Trigger"},{"measurand":"Energy.Active.Import.Register","unit":"Wh","value":"18099.0","context":"Trigger"},{"measurand":"Power.Active.Import","unit":"W","value":"0.0","context":"Trigger"}]}],"transactionId":2}]
[ocpp  ] TRACE 2024/09/04 03:14:30 send SN10052307203612: [3,"19473",{}]
[ocpp  ] TRACE 2024/09/04 03:14:31 recv SN10052307203612: [2,"19474","DataTransfer",{"vendorId":"EN+","messageId":"cpChargingParameterRpt","data":"{\"duty\":0,\"cpVolt\":931,\"transactionId\":2,\"totalPowerFactor\":256,\"aPhasesPowerFactor\":0,\"bPhasesPowerFactor\":6,\"cPhasesPowerFactor\":0,\"ccVolt\":256,\"offset0\":2010,\"offset1\":2019,\"leakcurr\":0,\"AMBTemp\":82,\"lockStatus\":1,\"ctCurrent\":9}"}]
[ocpp  ] TRACE 2024/09/04 03:14:31 send SN10052307203612: [3,"19474",{"status":"Accepted"}]
[ocpp  ] TRACE 2024/09/04 03:14:37 recv SN10052307203612: [2,"19475","MeterValues",{"connectorId":1,"meterValue":[{"timestamp":"2024-09-03T17:14:34Z","sampledValue":[{"measurand":"Current.Offered","unit":"A","value":"32.0","context":"Sample.Periodic"},{"measurand":"Current.Import","unit":"A","phase":"L1","value":"0.0"},{"measurand":"Voltage","unit":"V","phase":"L1","value":"247.3","context":"Sample.Periodic"},{"measurand":"Energy.Active.Import.Register","unit":"Wh","value":"18099.0","context":"Sample.Periodic"},{"measurand":"Power.Active.Import","unit":"W","value":"0.0","context":"Sample.Periodic"}]}],"transactionId":2}]
[ocpp  ] TRACE 2024/09/04 03:14:37 recv SN10052307203612: [2,"19476","Heartbeat",{}]
[ocpp  ] TRACE 2024/09/04 03:14:37 send SN10052307203612: [3,"19475",{}]
lorenzodonini commented 2 months ago

The disconnect handler is invoked in one of the following cases:

In all these cases no new connection will be accepted, until the old one was properly cleaned up. It is theoretically possible that your scheduler takes away CPU during execution in ChargePointDisconnected and handles the new connection instead, leading to charge point connected being logged before the disconnection. But this seems statistically unlikely, especially if we're talking about remote connections.

Do you have logs enabled in the websocket layer? I would expect to see this printed closed connection to %s" before your ChargePointDisconnected method is invoked. If you could show me some full logs (debug logs also welcome) I might be able to pinpoint the issue.

jeffborg commented 2 months ago

@lorenzodonini I think that is exactly what happened that what is theoretically possible happened, it would have been a bit cpu constrained while all this was going on at 3am. Is there a way to enable the websocket layer logs for ocpp go library from inside evcc?

andig commented 2 months ago

Thats an evcc topic, not a library topic.

andig commented 2 months ago

all these cases no new connection will be accepted, until the old one was properly cleaned up. It is theoretically possible that your scheduler takes away CPU during execution in ChargePointDisconnected and handles the new connection instead, leading to charge point connected being logged before the disconnection.

I don‘t see how this should happen if ocpp-go serialises that?

lorenzodonini commented 1 month ago

I don‘t see how this should happen if ocpp-go serialises that?

I was referring to the callback. The operation order is:

  1. cleanup existing connection (map deletion)
  2. unlock mutex
  3. invoke callback -> this triggers your log in evcc

Any new connection is accepted on a dedicarted goroutine (that's how the http mux works), which at that point may run concurrently to your callback. So in theory CPU scheduler can cause some confusion, but again I do not believe this to be realistic.

Is there a way to enable the websocket layer logs for ocpp go library from inside evcc?

Yes, I suggest:

websocket.SetLogger(yourLogger) // Possibly with debug enabled (it will be spammy though)

See if this happens again and post the full logs here. If there really is a bug I didn't see yet, logs should make it more obvious.

andig commented 1 month ago

So in theory CPU scheduler can cause some confusion, but again I do not believe this to be realistic.

Unless the callback is run in a go routine I don't see either how that should happen. It's still a bit problematic if we could have racy client-side callback code that cannot clearly be serialised. An alternative might be an "online status" api for the CS?