Open mstfdkmn opened 3 months ago
The original logs in that thread (https://github.com/irods/irods_client_globus_connector/issues/84#issuecomment-2043145273) do show two CLOSEs for the same PID... which is surprising.
Tested this again. I am observing that pep_api_data_obj_close_post is invoked only once for transfers done without checksum enabled. For transfers with checksum enabled, pep_api_data_obj_close_post fires first after the data is written/closed in iRODS, and a second time when the Globus connection completes (checksum completed). So, hypothetically: might it be that checksumming opens and closes the object again?
Ah, good observation. I think your assessment is correct. Checksums are calculated in this project via open/read/close, as seen here: https://github.com/irods/irods_client_globus_connector/blob/6be085cad8530f8ec5c29ca1006c90a3d1633917/DSI/globus_gridftp_server_iRODS.cpp#L1647-L1683
Took that snippet from this issue, which is related: https://github.com/irods/irods_client_globus_connector/issues/102
Does that fully explain this, then?
This explains the additional OPEN as well. Yes?
I think it does, yes. I defer to @JustinKyleJames
Yes, that would explain it.
Excellent. Should we link this issue in the README update in #107 as well? Feels like it is explained in there.
I think so.
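To make the explanation above concrete, here is a minimal sketch of why a checksummed transfer produces two close events. It is not the connector's code and does not use the iRODS API: `FakeObjectStore`, `_Handle`, and `transfer` are hypothetical names, and the `close_events` list merely stands in for whatever pep_api_data_obj_close_post would observe. The only behavior taken from the thread is that the checksum is computed by a separate open/read/close after the write has been closed.

```python
import hashlib


class FakeObjectStore:
    """Hypothetical in-memory stand-in for a data store (not the iRODS API)."""

    def __init__(self):
        self.data = {}
        # Each entry mimics one firing of a "close" policy hook.
        self.close_events = []

    def open(self, path, mode):
        return _Handle(self, path, mode)


class _Handle:
    def __init__(self, store, path, mode):
        self.store, self.path, self.mode = store, path, mode
        self.pos = 0
        if mode == "w":
            store.data[path] = b""

    def write(self, chunk):
        self.store.data[self.path] += chunk

    def read(self, size):
        chunk = self.store.data[self.path][self.pos:self.pos + size]
        self.pos += len(chunk)
        return chunk

    def close(self):
        # Record the close, as a close hook would see it.
        self.store.close_events.append((self.path, self.mode))


def transfer(store, path, payload, checksum=False):
    """Write the payload; optionally re-open and re-read it to checksum."""
    handle = store.open(path, "w")
    handle.write(payload)
    handle.close()  # first close event (always)

    if not checksum:
        return None

    digest = hashlib.sha256()
    handle = store.open(path, "r")
    while chunk := handle.read(4):
        digest.update(chunk)
    handle.close()  # second close event (checksum only)
    return digest.hexdigest()
```

In this sketch a policy that should run only once per upload could, for example, distinguish the two closes by the open mode. Whether the equivalent information is available to a real PEP rule is not established in this thread.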
So, if I followed everything correctly, it is not possible to read the data for checksumming during the initial open/write/close when checksum is enabled. Is this correct? I ask because, if it were possible, the policy would always fire only once; with checksum enabled it would just fire slightly later (after the whole upload completes).
It might be possible but I am not sure we get the bytes in the exact sequence from the client. In addition the writing is fanned out to multiple threads. I would have to investigate whether it is possible under those circumstances to calculate the checksum on the fly.
We should investigate whether there's a checksum algorithm that works for out-of-order reads/writes.
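One family of schemes that tolerates out-of-order arrival hashes each block independently and then combines the block digests in offset order. The sketch below illustrates that idea only. `BlockwiseChecksum` is a hypothetical name, it assumes writes arrive as whole blocks aligned to a fixed block size, and the resulting value is not the plain SHA-256 or MD5 of the file, so it would not match a checksum that a client computes over the byte stream.

```python
import hashlib


class BlockwiseChecksum:
    """Hash-of-block-hashes; blocks may be added in any order (illustrative)."""

    def __init__(self, block_size):
        self.block_size = block_size
        self.block_digests = {}  # block index -> SHA-256 digest of that block

    def add_block(self, offset, data):
        # Assumption: each write is one whole block at an aligned offset.
        if offset % self.block_size != 0:
            raise ValueError("offset must be aligned to block_size")
        if len(data) > self.block_size:
            raise ValueError("block larger than block_size")
        index = offset // self.block_size
        self.block_digests[index] = hashlib.sha256(data).digest()

    def hexdigest(self):
        # All blocks from index 0 upward must be present before finalizing.
        count = len(self.block_digests)
        if sorted(self.block_digests) != list(range(count)):
            raise ValueError("missing blocks")
        outer = hashlib.sha256()
        for index in range(count):
            outer.update(self.block_digests[index])
        return outer.hexdigest()
```

Because each block is hashed on its own, writer threads could feed blocks in whatever order they finish. The trade-off is that both ends of a transfer would have to agree on the block size and on this combined digest format.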
We have a PEP rule that should be triggered by the Globus writes in order to implement a policy (extract metadata and attach it to the object in a specific schema). What I observe is that pep_api_data_obj_close_post is triggered twice when a 3.2 GB file is transferred to iRODS. We have a flat resource hierarchy, by the way. Looks related to this. So the close PEP is somehow called twice.