lorenzodonini / ocpp-go

Open Charge Point Protocol implementation in Go
MIT License
262 stars 125 forks source link

RemoteTriggers / GetConfiguration and ChangeConfiguration stop working after invoking the charging station after a while #66

Closed utsavanand2 closed 2 years ago

utsavanand2 commented 3 years ago

Hi @lorenzodonini , a big thank you to all of the contributors for building this library, I've been using your library and a forked version of it (specifically to get local times instead of UTC) for about a month now.

The Problem

I've been seeing an issue with remote triggers since the beginning where the charging station remains connected to my CSMS, sends regular StatusNotification and Heartbeats but after a while I am not able to trigger further TriggerMessages on the charging station. The only thing that solves the issue is if I restart my CSMS and the charging station connects back to it again, then I am able to start sending those TriggerMessages again with a response from the charging station.

Screenshot 2021-04-20 at 11 11 00 PM

I would highly appreciate if you can point me to a possible fix or file where I can look into and even possibly send a PR to fix this issue.

OCPP Version

1.6

lorenzodonini commented 3 years ago

Hey, I will need a few more info:

  1. Are you using the same lib both for CSMS and Charging Station?
  2. By cannot trigger you mean the function call hangs, or it never reaches the other endpoint?
    • if it does indeed reach the other endpoint and you are sure the charging station replies, I can investigate whether the reply might've gotten stuck somewhere
  3. Have you tried sending other messages apart from TriggerMessage?
    • this is just to double-check whether something is hanging
utsavanand2 commented 3 years ago

Hii @lorenzodonini. Thanks for such a quick response.

As for your queries:

  1. The CSMS is using this ocpp-go library, but I'm not aware of the the library used by the Charging Station, but I can be sure that its definitely not ocpp-go.
  2. The function call just hangs, so as a temporary fix I've added deadline context cancellation so it doesn't hang forever. Though I continue to receive other message types - Heartbeat, StatusNotification from the charging station.
  3. I have tried GetConfiguration and ChangeConfiguration calls and they hang as well in this state.

The charging station starts behaving normally and these issues are gone when I restart my CSMS and the Charging Station connects again.

utsavanand2 commented 3 years ago

Hi @lorenzodonini I hope you're doing great. I was wondering when can we get to see the next stable release of the latest changes that have been made after the last release.

lorenzodonini commented 3 years ago

Hey and sorry for the late reply. I've been trying to reproduce your issue, but no luck.

The RemoteTrigger works as any other message (the underlying dispatching mechanism doesn't care which message is being sent), so I would expect the library eventually hang regardless of the used message. Although I have seen some occasional race conditions (some of them have been fixed and some other and being fixed), I haven't been able to see the specific behavior you are describing.

I'm planning on releasing a new version in the next few days. It will contain some "breaking changes", meaning that you will have to update a couple fields in the callbacks. All functionality remains unchanged though, and no methods were removed.

ahmadfarisfs commented 3 years ago

Yeah, i'm having the same thing but its on RemoteStartTransaction (this function call sometimes stuck) and my workaround is to restart. But it's very hard to reproduce :(

Edit: maybe related to this PR ?

lorenzodonini commented 3 years ago

Yeah, i'm having the same thing but its on RemoteStartTransaction (this function call sometimes stuck) and my workaround is to restart. But it's very hard to reproduce :(

Edit: maybe related to this PR ?

I've merged that PR now. It covers only the case, in which the client disconnected using the Stop method, while you were sending a request (regardless of which type of request). Only in that case, the calling goroutine would hang.

Please keep in mind that the SendRequest method is synchronous blocking and will block, until a response or an error was received, so if the connection broke for other reasons, the client websocket will attempt to re-connect and send that request. As long as this request didn't explicitly succeed or fail (timeout counts as failure), the caller will be blocked.

If you need fully asynchronous behaviour, please refer to the SendRequestAsync api instead.

utsavanand2 commented 3 years ago

Hi @lorenzodonini ! Thanks for the improvements around the library and the new release! I was keen on knowing if the new release would resolve this issue, but we're getting the same behaviour from our charging station, especially when they are connected with Wifi (with around 90% error rate) and with ethernet its a lot better (around 5-10% error rate). So I used the library and created a charging station emulator locally running on my local machine, and connected it with our service that runs this library for the CSMS server. I responded to GetConfiguration/ChangeConfiguration requests 50% of the time intentionally and to my surprise we're not able to send out more requests once a single request fails, until the connection is reset. I've noted your recommendation to use SendRequestAsync to go around this issue, correct me if I'm wrong but aren't we using SendRequestAsync for GetConfiguration/ChangeConfiguration anyway?

Doing something nasty like this on the charging station to simulate a lost response? This results into something where no further requests from the charging station are processed.

func (handler *ChargePointHandler) OnGetConfiguration(request *core.GetConfigurationRequest) (confirmation *core.GetConfigurationConfirmation, err error) {
    var resultKeys []core.ConfigurationKey
    var unknownKeys []string
    for _, key := range request.Key {
        configKey, ok := handler.configuration[key]
        if !ok {
            unknownKeys = append(unknownKeys, *configKey.Value)
        } else {
            resultKeys = append(resultKeys, configKey)
        }
    }
    if len(request.Key) == 0 {
        // Return config for all keys
        for _, v := range handler.configuration {
            resultKeys = append(resultKeys, v)
        }
    }
    logDefault(request.GetFeatureName()).Infof("returning configuration for requested keys: %v", request.Key)
    conf := core.NewGetConfigurationConfirmation(resultKeys)
    conf.UnknownKey = unknownKeys
    log.Infof("counter val: %v", handler.counter)
    if handler.counter%2 != 0 {
        handler.counter += 1
        return conf, nil
    } else {
        handler.counter += 1
        return nil, nil
    }
}
lorenzodonini commented 3 years ago

Thanks @utsavanand2 for posting some more details and example code.

The issue you're facing is actually the central system not sending you any more requests, because it never receives any reply from the charge point to the GetConfiguration request. When you are returning both a nil confirmation and a nil error in the example callback, you are in fact not sending a reply to the central system. Good job spotting this, but please never return two nil values for any handler function, as no endpoint should behave like that.

Back to the issue, I can fix this with a response timeout on the central system itself. After said timeout, the central system will assume the message was lost, pop it from the queue and dispatch the next one. This should cover cases, where the charge point has poor connection and doesn't manage to get a response through the network in a "reasonable amount of time".

What is still confusing me, is why you get this error frequently in your systems, since you mentioned using WiFi and Ethernet. I would expect timeouts on cellular network, but not with your setup. Can you provide more info about your setup?

utsavanand2 commented 3 years ago

Thanks a lot @lorenzodonini ! Infact this is what I was suspecting as well, that we should never return both confirmation and the error as nil. Unfortunately I don't have any access to the firmware that our manufacturer produces, that is why I connected a raspberry pi and ran the charge point library to reproduce the issue. We have already raised an issue with the manufacturer and they are looking into why they are not able to send out messages on Wifi after 20-30 requests from the CSMS. But I would also really like to harden our approach on the ocpp library on the server side, so we don't have any questions or doubts on our side. I would be really happy to create a PR and help you out with this! Thanks again!

utsavanand2 commented 2 years ago

Closing this issue in favor of enhancements in the latest release :)