jmccrohan / pysolarmanv5

A Python module to interact with Solarman Data Logging Sticks
MIT License

Socket not closing on upgrade from v2.4.0 #39

Closed: connesha closed this issue 1 year ago

connesha commented 1 year ago

v2.4.0 did not have a disconnect method (it was added in v2.5.0)

Consider 2 separate clients using pysolarmanv5 to access the same datastick. Client_1 runs in a loop, polling a few registers every few seconds. Client_2 does some infrequent reads/writes. Both clients were previously on v2.4.0 and working well. On upgrading Client_2 to v2.5.0, it can be seen that the socket is not getting closed, as the debug logs show the following output continuously until Client_2 is terminated:

    DEBUG:pysolarmanv5.pysolarmanv5:[POLL-DISCARDED] RECD: a5 1f 00 10 15 00 2e 8f 0a ...
    DEBUG:pysolarmanv5.pysolarmanv5:[POLL-DISCARDED] RECD: a5 17 00 10 15 00 2f 8f 0a ...

Simplified Client_2:

from time import sleep
from pysolarmanv5 import PySolarmanV5

modbus = PySolarmanV5(
    "192.168.1.24", 123456789, port=8899, mb_slave_id=1, verbose=True
)
print(modbus.read_input_registers(register_addr=33022, quantity=1))
sleep(100)

Calling disconnect() in Client_2 appears to resolve the issue. However, this is a compatibility problem, as the method did not exist in previous versions and now appears to be required. I haven't looked into whether it's the frequent traffic from Client_1 that is keeping the Client_2 socket alive.
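One way for a caller to stay compatible with both v2.4.0 and v2.5.0 is to call disconnect() only when the installed version provides it. A minimal sketch of that idea (solarman_session and FakeClient are hypothetical names for illustration, not part of pysolarmanv5):

```python
import contextlib

@contextlib.contextmanager
def solarman_session(client):
    """Yield the client; on exit, call disconnect() if the installed
    pysolarmanv5 version provides it (v2.5.0+)."""
    try:
        yield client
    finally:
        disconnect = getattr(client, "disconnect", None)
        if callable(disconnect):
            disconnect()

# Demo with a stand-in object (a real PySolarmanV5 would need a
# reachable data logger on the network):
class FakeClient:
    def __init__(self):
        self.closed = False
    def disconnect(self):
        self.closed = True

c = FakeClient()
with solarman_session(c):
    pass
print(c.closed)  # True
```

With a real client this would be `with solarman_session(PySolarmanV5(...)) as modbus:`, and the socket is released on exit regardless of which library version is installed.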

Also note that disconnect() is neither described in the docs nor called in the sample.

thank you

githubDante commented 1 year ago

I don't see an issue here. The socket is not automatically closed in version 2.4.0 either. The data from other clients was buffered until one of the read/write methods was called (see #19). The debug output shows that data not wanted by Client_2 is discarded, i.e. Client_2 will receive exactly the data it requested.

The connection will be closed when the client is terminated, even if the disconnect method is not called.

jmccrohan commented 1 year ago

Hi @connesha,

Thanks for the bug report. It does appear based on the DEBUG logs provided that the v2.5.0 client is correctly discarding the spurious responses which were intended for the v2.4.0 polling client.

I would agree that this appears to be a regression between v2.4.0 and v2.5.0, given that you had to amend your script. The root cause is, of course, that the data loggers appear to send the reply for any client to all open connections. I hadn't quite considered your use case when doing my own pre-release testing.

I'm not sure it warrants a version bump to v3.0.0, but I have added the disconnect() method to the examples in e872326, and will add a note in the changelog prior to release.

jmccrohan commented 1 year ago

Hi @connesha,

I am summarising the details of our email discussion here for posterity;

The issue can be recreated using the example script below under the following four scenarios:

  1. As-is with v2.4.0
  2. As-is with v2.5.0 (disconnect() commented out)
  3. Uncomment disconnect(), with v2.5.0
  4. Comment out disconnect() again, add auto_reconnect=True to constructor, with v2.5.0

Scenarios 1 and 3 succeed; the others fail. Catching the exception and re-instantiating did not work.

from time import sleep
from pysolarmanv5 import PySolarmanV5

def main():

    for i in range(10):

        modbus = PySolarmanV5("192.168.1.100", 1234567890, verbose=True)
        retry_count = 0
        while True:
            try:
                val = modbus.read_input_registers(register_addr=33022, quantity=1)
                print(f"index {i} : {val}")
            except Exception as e:
                print(f"Exception on index {i}: {e}")
                if retry_count == 3:
                    raise
                else:
                    print(f"index {i} : retry {retry_count}")
                    retry_count += 1
                    sleep(1)
                    continue
            break

        #modbus.disconnect()
        sleep(2)

if __name__ == "__main__":
    main()

Thanks for this test case. I was able to recreate your results. Prior to #33, the socket was implicitly closed when the pysolarmanv5 object was dereferenced, so one could simply re-instantiate another instance and it worked. Following #33, the socket is still referenced by the worker thread, so when a new instance is created, the old object (and its socket) remain alive.
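The mechanism described above (a background thread keeping the object, and hence its socket, alive after the caller drops its reference) can be demonstrated in isolation with a small stand-in class, not the real library:

```python
import threading
import time
import weakref

class Worker:
    def __init__(self):
        # The thread's target is a bound method, so the running thread
        # holds a strong reference to self for as long as it runs.
        self._t = threading.Thread(target=self._run, daemon=True)
        self._t.start()

    def _run(self):
        time.sleep(1)  # stand-in for the socket-servicing loop

w = Worker()
ref = weakref.ref(w)
del w                      # the caller's reference is gone...
time.sleep(0.1)
print(ref() is not None)   # ...but the thread still keeps the object alive
```

This is why, post-#33, dereferencing the client no longer closes the socket: the worker thread itself is a live reference.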

It would appear that the Solarman data loggers permit six(?) simultaneous connections, so the above example failed consistently for me on the same iteration.

Depending on the behavior of the calling application, this is a breaking pysolarmanv5 change for those upgrading from v2.4.0 to v2.5.0. I am currently of the view to withdraw v2.5.0 and re-release as v3.0.0, highlighting the new disconnect() method.

githubDante commented 1 year ago

OK, I was wrong in my previous comment. The socket was automagically closed when the object was dereferenced before #33, and the same behavior can hardly be achieved with the background thread. A possible solution is for the thread to monitor the reference count of the object that spawned it, and to exit once the count drops below its initial value (the value of sys.getrefcount(self) when the thread was created). It's a bit hacky and I'm not sure what the consequences might be.

jmccrohan commented 1 year ago

I've just released v3.0.0. Thanks both for your help!