irods / python-irodsclient

A Python API for iRODS

connection caching (just mysql?) #193

Open gscteam opened 4 years ago

gscteam commented 4 years ago

Hi @trel,

I am running into a connection caching issue: data_objects.get() always returns the same results even though the object has a new replica (verified with ils -L). If the session is recreated or cleanup is called, the correct, updated answer is returned.

It does not appear to be a Python issue; it seems to be a behavior of the iRODS server. Is there a way to change this behavior with a parameter? If there is, I could provide a PR for it.
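
For readers following along, the caller-side workaround described above (call cleanup, then query again) can be sketched roughly as follows. This is only an illustration, not the change from any PR: the helper name get_fresh_replicas is hypothetical, and the session argument is assumed to behave like an iRODSSession, offering cleanup() and data_objects.get() as mentioned in this thread.

```python
def get_fresh_replicas(session, path):
    """Return the replica list for `path` after dropping pooled connections.

    `session` is assumed to behave like irods.session.iRODSSession:
    it must provide cleanup() and data_objects.get(path).
    """
    # Close the pooled connections so the next query opens a new one.
    session.cleanup()
    # Re-fetch the data object; the replica list comes from this new query.
    obj = session.data_objects.get(path)
    return list(obj.replicas)
```

This trades a reconnect on every call for freshness, so it is a stopgap rather than a fix for whatever layer is holding the stale result.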

trel commented 4 years ago

I don't know if there is a parameter for connection refresh... Perhaps there's a way to just query the database again and get fresh object information?

gscteam commented 4 years ago

I have a PR for this. The workaround is resetting the connection inside the Python connection pool. The server/client socket should be able to get the latest information without resetting the connection, so it still seems this should be addressed server side.

d-w-moore commented 4 years ago

I tried to write a test around the broken behavior of session.data_objects.get, but as far as I can tell it always returns a data object d for which d.replicas correctly reflects the existing replicas.

@gscteam, could I get a guide on how to reproduce the issue?

trel commented 4 years ago

@d-w-moore can you reproduce it on the command line by using irepl out-of-band from the python calls? aka, start session, get replicas... run irepl elsewhere... get replicas again.
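
The check suggested here (get replicas, change the catalog out-of-band, get replicas again on the same session) could be scripted along these lines. It is a minimal sketch under assumptions: the function names are hypothetical, change_catalog is any callable the caller supplies, and replicas are assumed to expose a resource_name attribute as python-irodsclient replica objects do.

```python
def replica_resources(session, path):
    """Resource names the client currently reports for `path`."""
    obj = session.data_objects.get(path)
    return sorted(r.resource_name for r in obj.replicas)


def compare_before_after(session, path, change_catalog):
    """Query replicas, apply an out-of-band change, then query again.

    The same session is reused for both queries, which is the situation
    the report is about. Returns (before, after, changed).
    """
    before = replica_resources(session, path)
    change_catalog()  # e.g. run irepl from another shell, or wait for tiering
    after = replica_resources(session, path)
    return before, after, before != after
```

For example, change_catalog could shell out to irepl with subprocess.run, or simply block on input() while irepl is run by hand in another terminal. If the third return value is False even though ils -L shows a new replica, the stale result has been reproduced.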

gscteam commented 4 years ago

To reproduce:
1) Get tiered storage in place; tier0 and tier1 are enough.
2) iput a file into tier0 from the command line; in Python, do d = data_objects.get() and list the replicas.
3) Let the file migrate to tier1. In the same Python process, without recreating the session, do a get again; it shows the tier0 replica instead of the tier1 one.
This also happens with restaging.

d-w-moore commented 4 years ago

@trel Yes, I did try the out-of-band irepl/itrim approach, and Python always gave the correct replica list (once ses.data_objects.get() was called to "refresh" the object). The issue concerns the return value of that get(), so I'm just not understanding why the PR is needed. Tiering shouldn't be any different, either; it's doing a vanilla replication between resources, the only difference being that it happens in policy.

gscteam commented 4 years ago

It also happens when just putting a new object into iRODS: I put B2 in a compound resource using iput -K -R, and Python raised an error complaining that the object did not exist, until I restarted it. Note that we are using MariaDB as the DB backend, in case you are using Postgres. I am pretty sure this is not a DB issue, as a mysql client connected to it reports the change immediately:

In [7]: d =session.data_objects.get('/scZone/home/user/BARN.zip')

In [8]: d.replicas
Out[8]: [,
]

In [9]: d1 =session.data_objects.get('/scZone/home/user/B2.zip')
DataObjectDoesNotExist                    Traceback (most recent call last)
 in
----> 1 d1 =session.data_objects.get('/scZone/home/user/B2.zip')

/usr/lib/python3.8/site-packages/irods/manager/data_object_manager.py in get(self, path, file, **options)
     56         results = query.all() # get up to max_rows replicas
     57         if len(results) <= 0:
---> 58             raise ex.DataObjectDoesNotExist()
     59         return iRODSDataObject(self, parent, results)
     60

DataObjectDoesNotExist:

In [10]: exit

Python 3.8.1 (default, Jan 8 2020, 23:09:20)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.11.1 -- An enhanced Interactive Python. Type '?' for help.

IPython profile: irods

In [1]: d1 =session.data_objects.get('/scZone/home/user/B2.zip')

In [2]: d1.replicas
Out[2]: [, ]

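
The session above only recovers after restarting Python. A less drastic stopgap, sketched here under assumptions, is to retry the lookup once after cleanup when DataObjectDoesNotExist is raised. The helper name get_with_one_refresh is hypothetical, and the fallback exception class exists only so the sketch can run without python-irodsclient installed.

```python
try:
    # Real exception class, as raised in the traceback above.
    from irods.exception import DataObjectDoesNotExist
except ImportError:
    # Stand-in (assumption) for environments without python-irodsclient.
    class DataObjectDoesNotExist(Exception):
        pass


def get_with_one_refresh(session, path):
    """Fetch `path`; on a not-found error, reset connections and retry once."""
    try:
        return session.data_objects.get(path)
    except DataObjectDoesNotExist:
        # Drop pooled connections, then ask again on a new connection.
        session.cleanup()
        # A second failure propagates: the object is most likely really absent.
        return session.data_objects.get(path)
```

This does not help when a stale but non-empty replica list is returned, since no exception is raised in that case.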
d-w-moore commented 4 years ago

If we're getting a DataObjectDoesNotExist when one is clearly there, I'd suggest there may be more to this syndrome than a need to refresh the connection. I will try to reproduce this early next week. @gscteam, could I get the output of ilsresc -l from your iRODS system?

gscteam commented 4 years ago

Hi, here are the resources; the scicomp ones were for the tiering.

resource name: cacheResc
id: 10033
zone: scZone
type: unixfilesystem
class: cache
location: scicat1
vault: /irods/cacheResc
free space: 
free space time: : Never
status: 
info: 
comment: 
create time: 01579101099: 2020-01-15.09:11:39
modify time: 01579101126: 2020-01-15.09:12:06
context: 
parent: 10032
parent context: cache
----
resource name: demoResc
id: 10014
zone: scZone
type: unixfilesystem
class: cache
location: scicat1
vault: /var/lib/irods/Vault
free space: 
free space time: : Never
status: 
info: 
comment: 
create time: 01579032805: 2020-01-14.14:13:25
modify time: 01579032805: 2020-01-14.14:13:25
context: 
parent: 
parent context: 
----
resource name: irodscmpd1
id: 10032
zone: scZone
type: compound
class: cache
location: EMPTY_RESC_HOST
vault: EMPTY_RESC_PATH
free space: 
free space time: : Never
status: 
info: 
comment: 
create time: 01579101002: 2020-01-15.09:10:02
modify time: 01579101002: 2020-01-15.09:10:02
context: 
parent: 
parent context: 
----
resource name: s3Resc
id: 10036
zone: scZone
type: s3
class: cache
location: scicat1
vault: /irodscmpd1
free space: 
free space time: : Never
status: 
info: 
comment: 
create time: 01579103850: 2020-01-15.09:57:30
modify time: 01579103895: 2020-01-15.09:58:15
context: S3_DEFAULT_HOSTNAME=scs3.genusplc.com;S3_AUTH_FILE=/etc/irods/s3/p100_irods.keypair;S3_RETRY_COUNT=1;S3_WAIT_TIME_SEC=3;S3_PROTO=HTTP;ARCHIVE_NAMING_POLICY=consistent
parent: 10032
parent context: archive
----
resource name: scicomp_pfs
id: 10060
zone: scZone
type: unixfilesystem
class: cache
location: scicat1
vault: /mnt/irods_cache/scicomp
free space: 
free space time: : Never
status: 
info: 
comment: 
create time: 01579543973: 2020-01-20.12:12:53
modify time: 01579543973: 2020-01-20.12:12:53
context: 
parent: 
parent context: 
----
resource name: scicomp_s3
id: 10062
zone: scZone
type: s3
class: cache
location: scicat1
vault: /scicomp
free space: 
free space time: : Never
status: 
info: 
comment: 
create time: 01579543973: 2020-01-20.12:12:53
modify time: 01579543973: 2020-01-20.12:12:53
context: S3_DEFAULT_HOSTNAME=scs3.genusplc.com;S3_AUTH_FILE=/etc/irods/s3/p100_irods.keypair;S3_RETRY_COUNT=1;S3_WAIT_TIME_SEC=3;S3_PROTO=HTTP;ARCHIVE_NAMING_POLICY=consistent;HOST_MODE=cacheless_attached
parent: 
parent context: 

gscteam commented 4 years ago

I am closing this. We did more testing, and it appears to be related to caching in the ODBC driver.

trel commented 4 years ago

@gscteam Glad you fixed the issue... Is there anything else you can share about your debugging the scenario? What finally led to figuring this out?

gscteam commented 4 years ago

We were using our patched PRC version and, after multiple reconfigurations and tests, saw that the issue still exists with any ODBC driver. We will do more testing to see whether it is DB related or iRODS core related.

gscteam commented 4 years ago

@trel, so we got to the bottom of this (we think). We tried a basic setup with 2 unixfilesystem resources:

This is a very simple setup and easy to verify; the problem might be in the MySQL database plugin.

trel commented 4 years ago

The question now is whether this can be seen/demonstrated with a client other than Python: Java? C++?