lorenzodonini / ocpp-go

Open Charge Point Protocol implementation in Go
MIT License
258 stars 120 forks source link

CS confuses error responses between requests in case of timeout #294

Open andig opened 4 days ago

andig commented 4 days ago

OCPP version: [x] 1.6 [ ] 2.0.1

I'm submitting a ...

[x] bug report [ ] feature request

Current behavior: At https://github.com/evcc-io/evcc/issues/15991 we're experiencing an issue where a message timeout on one message is returned for a different message.

Expected behavior: When CP does not reply to a CS request, a following (or concurrent) CS request (that the CP replies to) should not receive an error result.

Consider this log:

enqueued CALL [3655400180, ChangeAvailability] for test-4
dispatched request 3655400180 for test-4
send test-4: [2,"3655400180","ChangeAvailability",{"connectorId":0,"type":"Operative"}]
started timeout timer for test-4
recv server: [2,"3655400180","ChangeAvailability",{"connectorId":0,"type":"Operative"}]
handling incoming CALL [3655400180, ChangeAvailability]
enqueued CALL [484949078, GetConfiguration] for test-4

Steps to reproduce:

Related code:

// send availability request and log error if cp doesn't reply
    if err := Instance().ChangeAvailabilityRequest(cp.ID(), 0, core.AvailabilityTypeOperative); err != nil {
        cp.log.DEBUG.Printf("failed configuring availability: %v", err)
    }

// send get configuration request - this finally errors with timeout which it MUST NOT
    resp, err := cp.GetConfiguration()
    if err != nil {
        return err
    }

Other information:

andig commented 4 days ago

Here's another log from testing. All timeouts are default except for dispatcher timeout set to 5s for quicker test results:

2024-09-14 12:42:04 enqueued CALL [2784858784, ChangeAvailability] for test-4
2024-09-14 12:42:04 dispatched request 2784858784 for test-4
2024-09-14 12:42:04 sent JSON message to test-4: [2,"2784858784","ChangeAvailability",{"connectorId":0,"type":"Operative"}]
2024-09-14 12:42:04 started timeout timer for test-4
2024-09-14 12:42:04 received JSON message from server: [2,"2784858784","ChangeAvailability",{"connectorId":0,"type":"Operative"}]
2024-09-14 12:42:04 handling incoming CALL [2784858784, ChangeAvailability]
2024-09-14 12:42:09 timeout for client test-4, canceling message
2024-09-14 12:42:09 completed request 2784858784 for test-4
2024-09-14 12:42:09 request 2784858784 for test-4 timed out
2024-09-14 12:42:09 test-4 ready to transmit again
2024-09-14 12:42:09 enqueued CALL [3284185250, GetConfiguration] for test-4
2024-09-14 12:42:09 dispatched request 3284185250 for test-4
2024-09-14 12:42:09 sent JSON message to test-4: [2,"3284185250","GetConfiguration",{}]
2024-09-14 12:42:09 started timeout timer for test-4
2024-09-14 12:42:14 timeout for client test-4, canceling message
2024-09-14 12:42:14 completed request 3284185250 for test-4
2024-09-14 12:42:14 request 3284185250 for test-4 timed out
2024-09-14 12:42:14 test-4 ready to transmit again

The client timing out is simulated by letting handler wait in a delay:

func (handler *ChargePointHandler) OnChangeAvailability(request *core.ChangeAvailabilityRequest) (confirmation *core.ChangeAvailabilityConfirmation, err error) {
    time.Sleep(time.Minute)
    return core.NewChangeAvailabilityConfirmation(core.AvailabilityStatusAccepted), nil
}

At this point I'm no longer sure if what I'm seeing here is really the real issue or if this is an artefact from testing setup. Log shows that GetConfiguration times out. However, the actual OnGetConfiguration handler of the charge point is never called. At this point I'm no longer sure if this is proper simulation of the actual charge point or if this specific behaviour is due to requests to charge point are being serialised in the library implementation.

andig commented 4 days ago

If the behaviour here is due to the simulated CP during testing, it would be great to have a different/better way of simulating CP timeouts. For example, the CP implementation could allow returning a sentinel error that is then not relayed to the CS.

andig commented 4 days ago

@lorenzodonini if you would consider accepting a bounty for resolving this issue it would be great if you could reach out to info@evcc.io. Thank you!

andig commented 4 days ago

Indeed it seems that this is a testing artefact. I have changed the CP handleIncomingRequest to simply return when it receives a ChangeAvailability:

2024-09-14 13:47:57 enqueued CALL [1904156176, ChangeAvailability] for test-4
2024-09-14 13:47:57 dispatched request 1904156176 for test-4
2024-09-14 13:47:57 sent JSON message to test-4: [2,"1904156176","ChangeAvailability",{"connectorId":0,"type":"Operative"}]
2024-09-14 13:47:57 started timeout timer for test-4
2024-09-14 13:47:57 received JSON message from server: [2,"1904156176","ChangeAvailability",{"connectorId":0,"type":"Operative"}]
2024-09-14 13:47:57 handling incoming CALL [1904156176, ChangeAvailability]
2024-09-14 13:48:02 timeout for client test-4, canceling message
2024-09-14 13:48:02 completed request 1904156176 for test-4
2024-09-14 13:48:02 request 1904156176 for test-4 timed out
2024-09-14 13:48:02 test-4 ready to transmit again
2024-09-14 13:48:02 enqueued CALL [3405965410, GetConfiguration] for test-4
2024-09-14 13:48:02 dispatched request 3405965410 for test-4
2024-09-14 13:48:02 sent JSON message to test-4: [2,"3405965410","GetConfiguration",{}]
2024-09-14 13:48:02 started timeout timer for test-4
2024-09-14 13:48:02 received JSON message from server: [2,"3405965410","GetConfiguration",{}]
2024-09-14 13:48:02 handling incoming CALL [3405965410, GetConfiguration]
2024-09-14 13:48:02 sent CALL RESULT [3405965410]
2024-09-14 13:48:02 sent JSON message to server: [3,"3405965410",{"configurationKey":[{"key":"AuthorizationKey","readonly":false},{"key":"NumberOfConnectors","readonly":false,"value":"1"},{"key":"ChargeProfileMaxStackLevel","readonly":false,"value":"1"},{"key":"ChargingScheduleMaxPeriods","readonly":false,"value":"1"},{"key":"MaxChargingProfilesInstalled","readonly":false,"value":"1"},{"key":"ChargingScheduleAllowedChargingRateUnit","readonly":false,"value":"1"},{"key":"MeterValuesSampledData","readonly":false,"value":"Power.Active.Import    

It's now obvious that the test continues without blocking.

andig commented 4 days ago

That said: it would be great if we had a means of simulating missing request acceptance or missing reply from either CS or CP. It seems this is currently not possible?

sc-atompower commented 2 days ago

Not totally sure if this is related but we are starting to see panics due to on checked casting of ocpp resonses.

It looks like something can happen and any call can get another calls response and panic.

Examples:

interface conversion: ocpp.Response is *core.HeartbeatConfirmation, not *core.StartTransactionConfirmation

panic: interface conversion: ocpp.Response is *core.HeartbeatConfirmation, not *core.MeterValuesConfirmation

@lorenzodonini this may interest you

lorenzodonini commented 2 days ago

@sc-atompower did this start at a specific point in time? Are you on master branch or latest release?

It's tricky to reproduce but I'll look into it, since this sounds like a major issue.

sc-atompower commented 1 day ago

@lorenzodonini it started as soon as we went from

v0.16.1-0.20230417193019-3ecaf00b33b6

to

github.com/lorenzodonini/ocpp-go v0.18.0

and it just happened again.