irods / python-irodsclient

A Python API for iRODS
Other
63 stars 73 forks source link

irods.connection pool management (active vs idle) can go haywire #562

Closed d-w-moore closed 2 months ago

d-w-moore commented 4 months ago

consider open_and_close_lots_of_data_objects_serially.py:

import logging
logger = logging.getLogger()

import irods.test.helpers

s = (irods.test.helpers).make_session()

def open_n_objects(n):
  try:
    for m in range(1,n+1):
     with s.data_objects.open('/tempZone/home/rods/{:03d}'.format(m),'w') as f:
       f.write(b'blah')
       pass
     logger.warning('# active = %d, # idle = %d',len(s.pool.active),len(s.pool.idle))
  finally:
    print(''' ******************\n ''' 'm =',m,
          '''\n ******************''')

open_n_objects(n = 120)

Ideally this opens data objects in serial fashion, writing a few bytes to, and then closing, each handle f implicitly (at the end of the with-block); and in so doing should also leave the data object's associated server connection (an irods.connection object) in a reusable disposition by moving it into the session's idle pool . Normal output is as follows:

user@7888352e30fd:~$ 
logout
root@7888352e30fd:/tmp# 
exit
(py3-2) daniel@precision-5530:~/prc.ghrunner$ irm -fr `iquest %s "select DATA_NAME where COLL_NAME = '/tempZone/home/rods'"`
(py3-2) daniel@precision-5530:~/prc.ghrunner$ ils
/tempZone/home/rods:
(py3-2) daniel@precision-5530:~/prc.ghrunner$ python ~/open_and_close_lots_of_data_objects_serially.py 
WARNING:root:# active = 0, # idle = 1
WARNING:root:# active = 0, # idle = 1 
[...MANY repeats of the identical line here, so ... editing out for brevity...]
WARNING:root:# active = 0, # idle = 1
WARNING:root:# active = 0, # idle = 1
 ******************
 m = 120 
 ******************

However, in commit d44a47ac026773daa880636caf425e546281a9f0, presently part of https://github.com/d-w-moore/python-irodsclient/tree/ghrunner-nohealthcheck-562.m-FREEZE this is the output we see (This is with the script above running in the python client container , after run_tests.sh has setup user's client environment) :

user@7888352e30fd:~$ python /tmp/open_and_close_lots_of_data_objects_serially.py 
WARNING:root:# active = 1, # idle = 0
WARNING:root:# active = 2, # idle = 0
WARNING:root:# active = 3, # idle = 0
WARNING:root:# active = 4, # idle = 0
WARNING:root:# active = 5, # idle = 0
WARNING:root:# active = 6, # idle = 0
WARNING:root:# active = 7, # idle = 0
WARNING:root:# active = 8, # idle = 0
WARNING:root:# active = 9, # idle = 0
WARNING:root:# active = 10, # idle = 0
WARNING:root:# active = 11, # idle = 0
WARNING:root:# active = 12, # idle = 0
WARNING:root:# active = 13, # idle = 0
WARNING:root:# active = 14, # idle = 0
WARNING:root:# active = 15, # idle = 0
WARNING:root:# active = 16, # idle = 0
WARNING:root:# active = 17, # idle = 0
WARNING:root:# active = 18, # idle = 0
WARNING:root:# active = 19, # idle = 0
WARNING:root:# active = 20, # idle = 0
WARNING:root:# active = 21, # idle = 0
WARNING:root:# active = 22, # idle = 0
WARNING:root:# active = 23, # idle = 0
WARNING:root:# active = 24, # idle = 0
WARNING:root:# active = 25, # idle = 0
WARNING:root:# active = 26, # idle = 0
WARNING:root:# active = 27, # idle = 0
WARNING:root:# active = 28, # idle = 0
WARNING:root:# active = 29, # idle = 0
WARNING:root:# active = 30, # idle = 0
WARNING:root:# active = 31, # idle = 0
WARNING:root:# active = 32, # idle = 0
WARNING:root:# active = 33, # idle = 0
WARNING:root:# active = 34, # idle = 0
WARNING:root:# active = 35, # idle = 0
WARNING:root:# active = 36, # idle = 0
WARNING:root:# active = 37, # idle = 0
WARNING:root:# active = 38, # idle = 0
WARNING:root:# active = 39, # idle = 0
WARNING:root:# active = 40, # idle = 0
WARNING:root:# active = 41, # idle = 0
WARNING:root:# active = 42, # idle = 0
WARNING:root:# active = 43, # idle = 0
WARNING:root:# active = 44, # idle = 0
WARNING:root:# active = 45, # idle = 0
WARNING:root:# active = 46, # idle = 0
WARNING:root:# active = 47, # idle = 0
WARNING:root:# active = 48, # idle = 0
WARNING:root:# active = 49, # idle = 0
WARNING:root:# active = 50, # idle = 0
WARNING:root:# active = 51, # idle = 0
WARNING:root:# active = 52, # idle = 0
WARNING:root:# active = 53, # idle = 0
WARNING:root:# active = 54, # idle = 0
WARNING:root:# active = 55, # idle = 0
WARNING:root:# active = 56, # idle = 0
WARNING:root:# active = 57, # idle = 0
WARNING:root:# active = 58, # idle = 0
WARNING:root:# active = 59, # idle = 0
WARNING:root:# active = 60, # idle = 0
WARNING:root:# active = 61, # idle = 0
WARNING:root:# active = 62, # idle = 0
WARNING:root:# active = 63, # idle = 0
WARNING:root:# active = 64, # idle = 0
WARNING:root:# active = 65, # idle = 0
WARNING:root:# active = 66, # idle = 0
WARNING:root:# active = 67, # idle = 0
WARNING:root:# active = 68, # idle = 0
WARNING:root:# active = 69, # idle = 0
WARNING:root:# active = 70, # idle = 0
WARNING:root:# active = 71, # idle = 0
WARNING:root:# active = 72, # idle = 0
WARNING:root:# active = 73, # idle = 0
WARNING:root:# active = 74, # idle = 0
WARNING:root:# active = 75, # idle = 0
WARNING:root:# active = 76, # idle = 0
WARNING:root:# active = 77, # idle = 0
WARNING:root:# active = 78, # idle = 0
WARNING:root:# active = 79, # idle = 0
WARNING:root:# active = 80, # idle = 0
WARNING:root:# active = 81, # idle = 0
WARNING:root:# active = 82, # idle = 0
 ******************
 m = 83 
 ******************
Traceback (most recent call last):
  File "/repo/irods/pool.py", line 63, in get_connection
    conn = self.idle.pop()
KeyError: 'pop from an empty set'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/open_and_close_lots_of_data_objects_serially.py", line 20, in <module>
    open_n_objects(n = 120)
  File "/tmp/open_and_close_lots_of_data_objects_serially.py", line 11, in open_n_objects
    with s.data_objects.open('/tempZone/home/rods/{:03d}'.format(m),'w') as f:
  File "/repo/irods/manager/data_object_manager.py", line 398, in open
    conn = self.sess.pool.get_connection()
  File "/repo/irods/pool.py", line 17, in method_
    ret = method(self,*s,**kw)
  File "/repo/irods/pool.py", line 79, in get_connection
    conn = Connection(self, self.account)
  File "/repo/irods/connection.py", line 75, in __init__
    auth_module.login(self)
  File "/repo/irods/auth/native.py", line 3, in login
    conn._login_native()
  File "/repo/irods/connection.py", line 615, in _login_native
    self.recv()
  File "/repo/irods/connection.py", line 151, in recv
    raise get_exception_by_code(msg.int_info, err_msg)
  File "/repo/irods/exception.py", line 171, in get_exception_by_code
    exc_class = iRODSExceptionMeta.codes[ rounded_code( code ) ]
KeyError: -167000
user@7888352e30fd:~$ 

In both cases above, I'm running with essentially PRC v2.0.0 and a 4.3.1 iRODS server on separate nodes. (In the latter case, where the issue and errors are seen, it's a docker-compose network connecting the two nodes, ie containers).

PRC should , according to a reasonable understanding, be running the same exact instructions, But in the error case, it seems connections are not finding their way back to the idle pool as they should.. They stay in the active category, and do not get re-used for data object "n+1" once data object "n" has been written finalized (ie closed).

d-w-moore commented 3 months ago

Minimal script to reproduce (on single node running both server & client). confirmed to reproduce issue on main branch commit 71d787fe1f79d81775d892c59f3e9a9f60262c78 (v2.0.1, latest current release as of now):

# before running:
#  iadmin mkresc a1 unixfilesystem  localhost:/tmp/a1
import irods.keywords as kw
import irods.test.helpers
s = (irods.test.helpers).make_session()
print ('BEFORE -- active = ',len(s.pool.active),'; idle = ',len(s.pool.idle))
with s.data_objects.open('/tempZone/home/rods/abc.dat', 'w',#allow_redirect = False,
                         **{kw.DEST_RESC_NAME_KW: 'a1'}) as f:
  pass
print ('AFTER -- active = ',len(s.pool.active),'; idle = ',len(s.pool.idle))

compare output with and without uncommenting extra allow_redirect = False parameter in open()

alanking commented 2 months ago

@d-w-moore - Please close if https://github.com/irods/python-irodsclient/pull/563 resolves this issue. Thanks!