Closed mstfdkmn closed 3 months ago
Every PRC session object manages its own pool of connections. Connections are reused to improve performance. Anytime a connection is reused, you'll see that same PID on the server side.
@mstfdkmn Does that answer your question?
Yes, thanks.
Hi @korydraughn, one more topic that I need your verification on. while downloading large files (size is bigger than 32MB), I am seeing pep_api_data_obj_open_post
is invoked as number of times as existing threads. In the download of a 100MB file, generally three threads exist and because of that pep_api_data_obj_open_post
is triggered three times. However regardless of how many times the object is opened or how many times pep_api_data_obj_open_post
is triggered pep_api_data_obj_close_post
is invoked only one time.
Is this related to your explanation above? That is, it is normal to see the "close pep" is triggered only one time because there is only one connection/process set but since the same connection is reused for threads to read the object we see "open pep" is triggered more than one.
Not quite.
If I want to download a data object in parallel, I'd use N
threads, each with their own iRODS connection. Each of those threads would call open, seek, read, close. So I'd expect the number of calls to open and close to be identical.
Perhaps pep_api_replica_close_post
is being invoked instead of pep_api_data_obj_close_post
for some of the close operations?
but since the same connection is reused for threads to read the object we see "open pep" is triggered more than one.
So you've identified that a single iRODS connection is being reused across the same download threads?
If that's true, then there must be a lock involved to synchronize access to the iRODS connection.
@d-w-moore What do you think?
Perhaps pep_api_replica_close_post is being invoked instead of pep_api_data_obj_close_post for some of the close operations?
Yes, I can verify that pep_api_replica_close_post
is triggered for the threads other than main thread that is part of the process. So, if N3
threads are used, N1
invokes pep_api_data_obj_open_post
and pep_api_data_obj_close_post
, and N2 and N3 invoke pep_api_data_obj_open_post
and pep_api_replica_close_post
. And N1 triggers pep_api_data_obj_close_post
as the last one. If I try to reflect my observation, that is;
N1 obj_open/obj_read/obj_read...
N2 obj_open/obj_read/obj_read.../replica_close
N3 obj_open/obj_read/obj_read.../replica_close
N1 obj_close
So you've identified that a single iRODS connection is being reused across the same download threads?
Actually that is how I understood from your reply above. -->Every PRC session object manages its own pool of connections. Connections are reused to improve performance
Ah, everything makes sense now.
The calls to replica_close
are to prevent the close operations from attempting to update the catalog. The normal original close operation will always attempt to update the catalog. This only matters when writing in parallel.
For reading, the use of replica_close
is unnecessary, but not incorrect by any means.
Btw, in order to document the difference between clients that use the same apis here I am sharing my findings that shows how gocommands (go-irodsclient) work:
pep_api_data_obj_open_post
and pep_api_data_obj_close_post
are invokes as number of times as thread numbers defined,pep_api_replica_close_post
is never invoked.Yep, that's fine. That will work as long as everything is coordinated correctly. However, choosing to NOT use the rc_replica_open
and rc_replica_close
APIs does mean some performance is lost because the iRODS server has to do more work for the rcDataObjClose
API calls and there's more network traffic.
python-irodsclient 2.0.1 irods 4.3.2 https://github.com/irods/irods_rule_engine_plugin_audit_amqp
I am not sure I am reporting on the correct repository (this might go to the irods or the audit plugin repo).
When I am interacting with irods through the PRC, in the audit plugin log I am observing the followings:
with iRODSSession(irods_env_file=env_file) as session: session.data_objects.create("/u0137480/home/mustafa/myfile_prc_new.txt") with iRODSSession(irods_env_file=env_file) as session: session.data_objects.put("test.txt", "/u0137480/home/mustafa/test.txt") with iRODSSession(irods_env_file=env_file) as session: session.data_objects.unlink("/u0137480/home/mustafa/myfile_prc_new.txt", force=True)
with iRODSSession(irods_env_file=env_file) as session: session.data_objects.put("100MB.bin", "/u0137480/home/mustafa/100MB.bin")