johanmeijer / grott

Growatt inverter monitor
https://github.com/johanmeijer/grott/wiki

Frequently seems blocked until restart #418

Open FreezyExp opened 1 year ago

FreezyExp commented 1 year ago

Grott frequently seems to get stuck, usually at the start of the day, right when the unit has just 'woken up'.

This setup runs grottserver.py in a Docker container to report to Home Assistant.

What can I do to debug the cause of the apparent halted state? Only restarting the Docker container seems to get it operational again. Would checking for firmware updates help at all?

When the system is working, it works just fine, so I am hesitant to update the firmware if there is no need to do so.

These are the last messages it posts in the logs:

     - Grottserver - 0104 data record received

     - Grottserver - Response: 

        \x00\x83\x00\x06\x00\x03\x01\x04\x47\x95\x51

     - Grottserver - Put response on queue:  172.27.0.1_48810  msg: 

         \x00\x83\x00\x06\x00\x03\x01\x04\x47\x95\x51

     - Grottserver - get response from queue:  172.27.0.1_48810 msg: 

         \x00\x83\x00\x06\x00\x03\x01\x04\x47\x95\x51

     - Growatt packet received:

          <socket.socket fd=17, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('172.27.0.2', 5279), raddr=('192.168.148.10', 1035)>

     - Data less then minimum record length, data not processed

DanielAtkinson commented 1 year ago

How long is the update time on the ShineWiFi device?

Have you enabled data buffering in the config?

johanmeijer commented 1 year ago

My experience is that the inverter does not always generate output directly after startup. It has just enough power to start up (and do some communication with the Growatt servers) but does not really deliver output to the grid yet. Especially in autumn/winter the startup can take a while.

Can you see what the inverter state is? If it is in 'Connecting' with a counter, it is trying to sync with the grid, but that does not always succeed (and the connect cycle starts again). It might even happen that it was already delivering, then stops again (because it is too dark) and starts again later.

I do not know if this is also true for the newer inverters, but my older inverters make a couple of loud "clicks" (relay sounds) when they connect to the grid.

FreezyExp commented 1 year ago

This setup uses a ShineLan-X adapter; its light is always on, even when the unit is sleeping (either blue or green). Next time it gets stuck while I am home, I will check the light on the adapter before restarting the Docker environment.

To clarify: only the Docker container running the Python scripts needs to be restarted for it to start working again.

My current suspicions are:

I am currently using an automation that checks, 30 minutes after sunrise, whether the unit is operational, since the panels at this location start producing around sunrise quite consistently. When I get the push notification on my phone, I restart the Docker container remotely.

This works around the issue for now, with only about a 45-60 minute loss of data (instead of the days it could take before I notice).

FreezyExp commented 1 year ago

OK, so when it is stuck the ShineLan-X is operational and responsive, so the issue must lie on the Docker/Python side of things.

Perhaps the socket should be re-created after a certain period?

I've set a 2-minute data polling time, which has reduced the likelihood of blocking, but it seems the code currently does not handle the ShineLan just switching off, possibly mid-stream.

FreezyExp commented 1 year ago

Going through the code, this is the closest point dealing with sockets where the last message is sent, so I slapped a timeout onto the ss (select) call.

In grottproxy.py :

    def main(self, conf):
        self.input_list.append(self.server)
        while 1:
            time.sleep(delay)
            ss = select.select
            # 300: timeout in seconds; since the default is 5 minutes, we can expect some data every 5 minutes, if not, retry later.
            inputready, outputready, exceptready = ss(self.input_list, [], [], 300)
            for self.s in inputready:
                # ...
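
For reference (this is standard library behaviour, not anything grott-specific): when the timeout expires without any socket activity, select.select() simply returns three empty lists, so the added timeout only helps if that case is detected and handled. A sketch, reusing the ss alias from the snippet above:

    # Sketch only: detect the 300 s timeout and skip the iteration.
    inputready, outputready, exceptready = ss(self.input_list, [], [], 300)
    if not (inputready or outputready or exceptready):
        print("\t - Grott - select() timed out, no data received for 300 seconds")
        continue  # nothing to process; all sockets are still in input_list
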
johanmeijer commented 1 year ago

OK, I have not seen this problem before. I also see you are using grottserver. What do you mean by this?

Which versions of grott/grottserver are you running?

FreezyExp commented 12 months ago

The 0.8 version, against which I also opened pull request #385. The fix I applied did not seem to work; I will check if I can find something else to try.

I will also attempt to add 'goodnight' and 'goodmorning' log messages, so that the state is clear in the logs. Perhaps when a socket timeout occurs (15 minutes) it could log an 'are you awake yet' message when opening a new socket.

I don't know if this makes sense, but I suspect the socket is the issue.
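
A rough illustration of the goodnight/goodmorning idea (my sketch, not the code from the branch linked below; the flag name and log texts are made up, and it assumes the select-based main loop and self.input_list shown earlier):

    # Hypothetical sketch: log the sleep/wake transitions exactly once.
    datalogger_sleeping = False
    while True:
        inputready, _, _ = select.select(self.input_list, [], [], 900)  # 15 minutes
        if not inputready:
            if not datalogger_sleeping:
                print("\t - Grott - goodnight, no data received for 15 minutes")
                datalogger_sleeping = True
            continue
        if datalogger_sleeping:
            print("\t - Grott - goodmorning, datalogger is sending data again")
            datalogger_sleeping = False
        # ... process inputready as usual ...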

FreezyExp commented 12 months ago

Ah, it seems I was quite behind on the branch; I will update and report back.

FreezyExp commented 12 months ago

I have updated my branch and added the goodnight and goodmorning logs; even with some additional exception handling it still shows the blocking behaviour.

After a goodnight it stays disconnected.

https://github.com/johanmeijer/grott/compare/Beta-(2.8.x)...FreezyExp:grott:Beta-(2.8.x)

Why would it stall out? Perhaps the socket cannot re-open for some reason?

FreezyExp commented 12 months ago

I have a hunch now: if the socket is unresponsive, the socket watched by ss (select) is never closed, so it restarts fine a couple of times and then eventually gets stuck.

I will add socket closing when there is no input data, as sketched below.
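
The sketch (my own, not the actual change; it reuses the names from the main-loop snippet earlier in the thread, i.e. self.input_list, self.server, self.s and the existing on_close(conf) cleanup):

    # Sketch: on a select() timeout, drop idle client connections
    # but keep the listening server socket.
    if not inputready:
        for self.s in list(self.input_list):
            if self.s is self.server:
                continue
            print("\t - Grott - no input data, closing idle connection")
            # on_close(conf) is expected to close the socket and remove it from input_list
            self.on_close(conf)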

johanmeijer commented 11 months ago

OK, I will look at it. It seems strange to me that you are the only one (at least that I have heard of) with this problem.

FreezyExp commented 11 months ago

I'm running Docker on Ubuntu with both the proxy and the server, without any internet connection. There might not be a lot of people running it like that.

My specific inverter or the ShineLan-X connector might also be contributing to the problem; their firmware might be outdated.

My current hunch is that select.select needs some resources to be released before a proper connection reset can occur. I am going to try another fix to the proxy socket to see if that stays stable.

FreezyExp commented 11 months ago

Something like this:

            # 300: timeout in seconds; since the default is 5 minutes, we can expect some data every 5 minutes, if not, retry later.
            inputready, outputready, exceptready = select.select(self.input_list, [], [], 300)

            # Handle the socket exceptions
            for self.s in exceptready:
                try:
                    print("\t - A socket exception was raised, closing the connection.")
                    self.on_close(conf)
                except socket.error as e:
                    print(f"\t - An error occurred while closing socket: {e}")

            for self.s in inputready:
                # ...
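
One caveat with relying on exceptready (my reading of the select() semantics, not something verified against this setup): a peer that closes the connection does not show up in exceptready; its socket simply becomes readable and recv() returns b''. A datalogger that disappears without closing (e.g. losing power) may not be reported at all until a TCP timeout, which is why the select() timeout itself remains useful. A dead-connection check on the read path could look roughly like this:

    # Sketch: detect a peer that closed the connection via the read path.
    for self.s in inputready:
        if self.s is self.server:
            self.on_accept(conf)  # hypothetical accept-handler name
            continue
        data = self.s.recv(2048)
        if not data:              # b'' means the peer closed the connection
            self.on_close(conf)
            continue
        # ... forward/process data as usual ...
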
FreezyExp commented 11 months ago

I think I am starting to understand the code a bit now.

In main, we add self.server to the input list, but self.server is never inspected for errors and I see no handling of an error state. Should that socket error out, the connection would need to be reset / recreated / handled somehow.

The input_list also contains the other sockets, which are established as the proxy gathers data and tries to connect to the server. Those sockets are inspected for health and re-connection.

So my code above would terminate the server socket and then never re-establish it, leaving it dangling forever. The same would happen if the server socket were to error out for some other reason: it would keep raising unhandled exceptions and never get back into the inputready list.

Going off of that, I will add a check to see if the socket with the exception is the server socket and if so close and re-establish it.
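
A sketch of what that check could look like (my own, not the commit linked below; the conf.grottip / conf.grottport names for the listener address are assumptions):

    # Sketch: if the listening server socket itself errors out, rebuild it.
    for self.s in exceptready:
        if self.s is self.server:
            print("\t - Grott - server socket error, re-creating the listener")
            self.input_list.remove(self.server)
            self.server.close()
            self.server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            self.server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            self.server.bind((conf.grottip, conf.grottport))  # assumed config attribute names
            self.server.listen(200)
            self.input_list.append(self.server)
        else:
            self.on_close(conf)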

FreezyExp commented 11 months ago

Here is my next attempt: https://github.com/johanmeijer/grott/commit/14b75873dbc4d35c8b5441c9f25a982cd224e009

johanmeijer commented 11 months ago

What bothers me a little bit is that grott has been running fine for several years, probably across hundreds (thousands) of instances, and I have never heard about this problem.

I am not saying there cannot be an error or omission in the code; I am certain there are bugs. But I would have expected more people to report this problem.

Is there a way I can (re)create and test this problem?

FreezyExp commented 11 months ago

I have no clue what I am doing incorrectly. The weird part is that it instantly recovers when I restart the Docker container.

I can add the docker-compose later as well as some more specifics.

Sadly the checks I added did not seem to catch the issue. It says goodnight and never wakes up again.

anoppe commented 11 months ago

The docker-compose.yml might be interesting to see indeed.

I'm curious whether it might have something to do with the available resources (CPU/mem/disk) on your system. Could you share the output of this command: docker stats? This will print some statistics for all running containers every second or so and might provide some useful insight into the system load. Also, when using containers you are limited by the amount of resources available on the (physical) system.
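
If it is easier to capture this from a script than from a terminal, a one-shot snapshot can be taken with the --no-stream flag; a small sketch (it assumes the container is named grott, as in the compose file posted below):

    import subprocess

    # Print a single resource-usage sample for the grott container
    # (docker stats --no-stream exits after one snapshot instead of streaming).
    result = subprocess.run(
        ["docker", "stats", "--no-stream", "grott"],
        capture_output=True, text=True, check=True,
    )
    print(result.stdout)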

FreezyExp commented 11 months ago

The system load is a non-issue (CPU runs at about 10%, memory at 20%); it might be the old network card I added to the server. Before that card I used an equally cruddy USB Ethernet adapter, but it had the same issue. So either both are poor quality / have bad drivers, or something else is amiss.

I am running about 15 Docker containers at the same time, but the ports are separated and Grott is the only one using the second network interface.

FreezyExp commented 11 months ago

version: '3.3'
services:
    grott:
        #network_mode: host
        build:
            context: ./dockerimage
            dockerfile: ./dockerfile
        image: ledidobe/grott
        container_name: grott
        restart: always
        ports:
            - "192.168.2.1:5279:5279"
            - "5781:5781"
            - "5782:5782"
            - "51:5279"
        volumes:
            - ./grott:/app/
            #- /opt/grott/grottstub.py:/app/grottext.py
        environment:
            - gblockcmd=False
            # mqtt
            - gnomqtt=True
            - gmqttip=localhost
            - gmqttauth=False
            - gmqttuser=growatt
            - gmqttpassword=growatt2020
            - gpvoutput=False
            # influx
            - ginflux=False
            - ginflux2=False
            - gifdbname=grottdb
            - gifip=localhost
            - gifport=8086
            - gifuser=grott
            - gifpassword=growatt2020
            - giftoken=influx_token
            - giforg=grottorg
            - gifbucket=grottdb
            # extensions
            - gextension=true
            - gextname=grott_ha
            - 'gextvar={"ha_mqtt_host": "192.168.2.1", "ha_mqtt_port": "1883", "ha_mqtt_user": "username", "ha_mqtt_password": "password"}'
            # timezone
            - TZ=Europe/Amsterdam
            # server
            - ggrowattip=192.168.2.1
            - ggrowattport=5781

./grott:/app/

contains https://github.com/FreezyExp/grott/tree/Beta-(2.8.x)

as well as grott_ha.py from the examples

FreezyExp commented 11 months ago

Not sure why I had port 51:5279, I must have forgotten to remove it while testing some stuff a while ago.

I will remove it, maybe it is the cause of my issue XD

FreezyExp commented 11 months ago

Well, that was not it either XD. Perhaps I shall just stop the Docker container at night and restart it an hour before sunrise. It does not resolve the cause, but at least it works around the issue.

johanmeijer commented 11 months ago

Strange; it seems that your Docker container falls asleep or something like that...

Can you try to contact the grottserver browser page (:5782), to see if it is still awake, before the restart?

What type of inverter/datalogger do you have? Does it perform a cold start in the morning (like my old -S series with a ShineWiFi) or does it stay connected and up the whole day?
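
For an automated version of that check, something along these lines could be scheduled shortly after sunrise (a sketch, not part of grott; it assumes the grottserver HTTP interface answers on port 5782, matching the port mapping in the compose file above, that the host is reachable at 192.168.2.1, and that the container is named grott):

    import subprocess
    import urllib.request

    # Probe the grottserver web page; if it does not answer, restart the container.
    GROTTSERVER_URL = "http://192.168.2.1:5782/"  # assumed address, adjust to your setup

    try:
        with urllib.request.urlopen(GROTTSERVER_URL, timeout=10) as response:
            print(f"grottserver answered with HTTP {response.status}")
    except Exception as exc:
        print(f"grottserver did not respond ({exc}), restarting the grott container")
        subprocess.run(["docker", "restart", "grott"], check=True)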

FreezyExp commented 11 months ago

The inverter seems to turn off completely during the night. I have now automated the Docker container to shut down during the night, as this works around the problem.

ShineLan-X: V1.0 version 3.6.0.2

From the grott logs: Possible Inverter serial RCL2CK70AT

It seems that neither grott nor the 'Growatt ShineLan-X Setting Center' mentions a firmware update for the inverter itself.

From the device interface: Model: S3EB08D00 T00P0FU01M002E Firmware: AK1.0

There is definitely a cold start, with an audible relay activating or deactivating. The inverter becomes unresponsive at night.

As stated before, the ShineLan-X does come back online and can be reached from my server. The light is green at night and blue during the day (I'm not 100% sure about all the possible states).

mvdkleijn commented 8 months ago

Just to chime in here, I also seem to be having this problem. At the very least it is very similar. I was actually considering doing a stop-start at midnight since the problem seems to occur only after running for 2-3 days.

My symptoms:

Running Grott 2.8.2 on Raspberry Pi 4 with Ubuntu 64-bit OS in a Docker Compose setup.