Status: Open. gscteam opened this issue 4 years ago.
I don't know if there is a parameter for connection refresh... Perhaps there's a way to just query the database again and get fresh object information?
I have a PR for this; the workaround is to reset the connection inside the Python connection pool. The server/client socket should be able to get the latest information without resetting the connection, so it still seems this should be addressed on the server side.
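A minimal sketch of the client-side workaround, assuming python-irodsclient's iRODSSession.cleanup() (which disconnects the connections held in the session's pool; the original report says that calling it returns the updated answer) and the resource_name attribute on each entry of data_object.replicas. The helper names replica_resources and get_after_reset are hypothetical. The session is passed in rather than constructed, so the sketch makes no assumptions about connection settings.

```python
def replica_resources(data_object):
    """Sorted resource names of a data object's replicas."""
    return sorted(replica.resource_name for replica in data_object.replicas)


def get_after_reset(session, logical_path):
    """Hypothetical helper: re-fetch a data object over a new server connection.

    session.cleanup() disconnects the session's pooled connections, so the
    following get() has to open a fresh one. This only works around the
    stale answer on the client side; it does not change server behaviour.
    """
    session.cleanup()
    return session.data_objects.get(logical_path)


# Usage, assuming an already-authenticated iRODSSession named session:
#   d = get_after_reset(session, "/scZone/home/user/BARN.zip")
#   print(replica_resources(d))
```

Resetting the whole pool is heavy-handed for a single lookup, which is why the thread argues the real fix belongs on the server side.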
I tried to write a test around the broken behavior of session.data_objects.get, but as far as I can tell it always returns a data object d whose d.replicas correctly reflects the existing replicas.
@gscteam, could you share steps to reproduce the issue?
@d-w-moore, can you reproduce it by running irepl on the command line, out-of-band from the Python calls? That is: start a session, get the replicas, run irepl elsewhere, then get the replicas again.
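The out-of-band check described here can be sketched as follows, assuming the icommands are installed and already authenticated (iinit) against the same zone as the Python session, and that irepl -R with a destination resource name performs the replication. The function name out_of_band_check and its injectable run parameter are hypothetical; run defaults to subprocess.run so that irepl really executes in a separate client process.

```python
import subprocess


def out_of_band_check(session, logical_path, dest_resc, run=subprocess.run):
    """Compare the replica lists one session sees before and after an external irepl.

    Returns (before, after) as sorted lists of resource names. On a healthy
    system, after should include dest_resc; a stale answer leaves it equal
    to before.
    """
    def resources():
        obj = session.data_objects.get(logical_path)  # same session every time
        return sorted(r.resource_name for r in obj.replicas)

    before = resources()
    # Replicate with a separate client (icommands), not through the session.
    run(["irepl", "-R", dest_resc, logical_path], check=True)
    after = resources()
    return before, after
```

For example, out_of_band_check(session, "/scZone/home/user/BARN.zip", "resc2"), where "resc2" is a placeholder for a second unixfilesystem resource.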
To reproduce:
1) Set up tiered storage; tier0 and tier1 are enough.
2) iput a file into tier0 from the command line. In Python, do d = session.data_objects.get(...) and list the replicas.
3) Let the file migrate to tier1. In the same Python process, without recreating the session, do a get again: it still shows the tier0 replica instead of the tier1 replica.
The same happens on restaging.
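Steps 2 and 3 can be scripted roughly as below: one session, with the same get() repeated while the tiering policy migrates the object. This is a sketch; the function name watch_replicas is hypothetical, polls and interval are arbitrary placeholders (migration timing depends on the tiering configuration), and sleep is injectable only so the loop can be exercised without waiting.

```python
import time


def watch_replicas(session, logical_path, polls=5, interval=60.0, sleep=time.sleep):
    """Repeat data_objects.get() on one session and record each replica set.

    If the answers are stale, every recorded entry stays equal to the first
    one, even after ils -L shows the object on the other tier.
    """
    history = []
    for i in range(polls):
        obj = session.data_objects.get(logical_path)  # no new session, no cleanup()
        history.append(sorted(r.resource_name for r in obj.replicas))
        if i < polls - 1:
            sleep(interval)  # give the tiering policy time to move the object
    return history
```

Comparing the returned history with ils -L output taken at the same times shows whether the Python session lags behind the catalog.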
@trel Yes, I did try the out-of-band irepl/itrim approach, and Python always gave the correct replica list (once ses.data_objects.get() was called again to "refresh" the object). The issue concerns the return value of that get(), so I don't yet see why the PR is needed. Tiering shouldn't be any different, either: it does a plain replication between resources, the only difference being that it happens in policy.
It also happens when simply putting a new object into iRODS: I put B2 into a compound resource with iput -K -R, and the Python client raised an error saying the object did not exist until I restarted it. Note that we use MariaDB as the database backend, in case you are using PostgreSQL. I am fairly sure this is not a database issue, since a mysql client connected to it reports the change immediately:
In [7]: d =session.data_objects.get('/scZone/home/user/BARN.zip')
In [8]: d.replicas
Out[8]:
[
If we're getting a DataObjectDoesNotExist when the object is clearly there, I'd suggest there may be more to this than a need to refresh the connection. I will try to reproduce this early next week. @gscteam, could I get the output of ilsresc -l from your iRODS system?
Hi, here are the resources; the scicomp ones were used for the tiering.
resource name: cacheResc
id: 10033
zone: scZone
type: unixfilesystem
class: cache
location: scicat1
vault: /irods/cacheResc
free space:
free space time: : Never
status:
info:
comment:
create time: 01579101099: 2020-01-15.09:11:39
modify time: 01579101126: 2020-01-15.09:12:06
context:
parent: 10032
parent context: cache
----
resource name: demoResc
id: 10014
zone: scZone
type: unixfilesystem
class: cache
location: scicat1
vault: /var/lib/irods/Vault
free space:
free space time: : Never
status:
info:
comment:
create time: 01579032805: 2020-01-14.14:13:25
modify time: 01579032805: 2020-01-14.14:13:25
context:
parent:
parent context:
----
resource name: irodscmpd1
id: 10032
zone: scZone
type: compound
class: cache
location: EMPTY_RESC_HOST
vault: EMPTY_RESC_PATH
free space:
free space time: : Never
status:
info:
comment:
create time: 01579101002: 2020-01-15.09:10:02
modify time: 01579101002: 2020-01-15.09:10:02
context:
parent:
parent context:
----
resource name: s3Resc
id: 10036
zone: scZone
type: s3
class: cache
location: scicat1
vault: /irodscmpd1
free space:
free space time: : Never
status:
info:
comment:
create time: 01579103850: 2020-01-15.09:57:30
modify time: 01579103895: 2020-01-15.09:58:15
context: S3_DEFAULT_HOSTNAME=scs3.genusplc.com;S3_AUTH_FILE=/etc/irods/s3/p100_irods.keypair;S3_RETRY_COUNT=1;S3_WAIT_TIME_SEC=3;S3_PROTO=HTTP;ARCHIVE_NAMING_POLICY=consistent
parent: 10032
parent context: archive
----
resource name: scicomp_pfs
id: 10060
zone: scZone
type: unixfilesystem
class: cache
location: scicat1
vault: /mnt/irods_cache/scicomp
free space:
free space time: : Never
status:
info:
comment:
create time: 01579543973: 2020-01-20.12:12:53
modify time: 01579543973: 2020-01-20.12:12:53
context:
parent:
parent context:
----
resource name: scicomp_s3
id: 10062
zone: scZone
type: s3
class: cache
location: scicat1
vault: /scicomp
free space:
free space time: : Never
status:
info:
comment:
create time: 01579543973: 2020-01-20.12:12:53
modify time: 01579543973: 2020-01-20.12:12:53
context: S3_DEFAULT_HOSTNAME=scs3.genusplc.com;S3_AUTH_FILE=/etc/irods/s3/p100_irods.keypair;S3_RETRY_COUNT=1;S3_WAIT_TIME_SEC=3;S3_PROTO=HTTP;ARCHIVE_NAMING_POLICY=consistent;HOST_MODE=cacheless_attached
parent:
parent context:
I am closing this: after more testing, it appears to be related to caching in the ODBC driver.
@gscteam Glad you fixed the issue. Is there anything else you can share about how you debugged it? What finally led you to the cause?
We had been using our patched PRC version; after multiple reconfigurations and tests we saw that the issue still exists, with any ODBC driver. We will do more testing to see whether it is related to the database or to the iRODS core.
@trel, we think we got to the bottom of this. We tried a basic setup with two unixfilesystem resources:
This is a very simple setup that is easy to verify; the problem might be in the MySQL database plugin.
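One plausible mechanism for such a "cached" answer, offered as an assumption rather than anything established in this thread: if the server's database connection keeps a read transaction open, a database with snapshot reads (MySQL/MariaDB InnoDB defaults to the REPEATABLE READ isolation level) keeps answering from the snapshot taken at the first read, so replicas registered through other connections stay invisible until that transaction ends. The sketch reproduces the effect with Python's built-in sqlite3 in WAL mode as a stand-in; it involves no iRODS, MySQL, or ODBC code, and the replicas table and snapshot_demo function are invented for illustration.

```python
import os
import sqlite3
import tempfile


def snapshot_demo():
    """Show a long-lived read transaction missing a row committed by another connection."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "catalog.db")

        # Long-lived connection, standing in for a server-side database handle.
        # isolation_level=None: sqlite3 issues no implicit BEGIN, so we control transactions.
        server = sqlite3.connect(path, isolation_level=None)
        server.execute("PRAGMA journal_mode=WAL")  # readers get snapshots; writers are not blocked
        server.execute("CREATE TABLE replicas (obj TEXT, resc TEXT)")
        server.execute("INSERT INTO replicas VALUES ('BARN.zip', 'tier0')")

        # Separate connection, standing in for irepl or a tiering rule.
        other = sqlite3.connect(path, isolation_level=None)

        query = "SELECT resc FROM replicas WHERE obj = 'BARN.zip' ORDER BY resc"
        server.execute("BEGIN")  # open a read transaction...
        first = [row[0] for row in server.execute(query)]  # ...the snapshot is taken here

        other.execute("INSERT INTO replicas VALUES ('BARN.zip', 'tier1')")  # autocommitted

        stale = [row[0] for row in server.execute(query)]  # same transaction, same snapshot
        server.execute("COMMIT")  # end the read transaction
        fresh = [row[0] for row in server.execute(query)]  # a new snapshot sees the new row

        other.close()
        server.close()
        return first, stale, fresh


if __name__ == "__main__":
    print(snapshot_demo())
```

snapshot_demo() should return (['tier0'], ['tier0'], ['tier0', 'tier1']): the second read, still inside the open transaction, misses the committed tier1 row, and the read after COMMIT sees it. If something similar happens on the server, it would be consistent with reconnecting (and therefore starting a fresh transaction) returning the updated answer.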
The question now is whether this can be seen or demonstrated with a client other than Python: Java? C++?
Hi @trel,
I am running into a connection caching issue, where data_objects.get() always returns the same result even though the object has a new replica (verified with ils -L). If the session is recreated or cleanup() is called, the correct, updated answer is returned.
It does not appear to be a Python issue; it seems to be a behavior of the iRODS server. Is there a parameter to change this behavior? If there is, I could provide a PR for it.