irods / irods_client_globus_connector

The iRODS Globus Connector

Transferring an object to/in iRODS throws `-808000` and `-358000` #65

Closed mstfdkmn closed 6 months ago

mstfdkmn commented 1 year ago

Bug Report

iRODS Version, OS and Version

4.3.0 almalinux8

What did you try to do?

Expected behavior

Transfers to be completed successfully and without any errors in the iRODS logs.

Observed behavior (including steps to reproduce, if applicable)

Transfers complete successfully (data objects have the & status in iRODS), but an error appears in the iRODS logs:

[2023-06-01T14:25:26.454Z][icts-p-cloud-rdm-hev-2] irods stdout | {"log_category":"legacy","log_facility":"local0","log_level":"error","log_message":"[rsDataObjOpen_impl:904] - [OBJ_PATH_DOES_NOT_EXIST: Data object or replica does not exist [error_code=-808000, path=/ghum/home/u0137480/my_test_file.txt].\n\n] [error_code=[-358000], path=[/ghum/home/u0137480/my_test_file.txt], hierarchy=[]","request_api_name":"DATA_OBJ_OPEN_AN","request_api_number":602,"request_api_version":"d","request_client_user":"u0137480","request_host":"127.0.0.1","request_proxy_user":"globus","request_release_version":"rods4.3.0","server_host":"ghum.irods.icts.kuleuven.be","server_pid":167725,"server_timestamp":"2023-06-01T14:25:26.454Z","server_type":"agent"}
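For anyone triaging similar entries, here is a minimal sketch of pulling the error codes and request details out of a line like the one above. It assumes that everything after the first ` | ` is the JSON payload and that error codes always appear as `error_code=` followed by an optionally bracketed integer. The `summarize` helper is a hypothetical name, not part of iRODS or the connector, and the sample line is an abridged copy of the one in this report.

```python
import json
import re

# Abridged copy of the log line from this report (some JSON fields omitted).
LINE = (
    r'[2023-06-01T14:25:26.454Z][icts-p-cloud-rdm-hev-2] irods stdout | '
    r'{"log_category":"legacy","log_level":"error",'
    r'"log_message":"[rsDataObjOpen_impl:904] - [OBJ_PATH_DOES_NOT_EXIST: '
    r'Data object or replica does not exist [error_code=-808000, '
    r'path=/ghum/home/u0137480/my_test_file.txt].\n\n] [error_code=[-358000], '
    r'path=[/ghum/home/u0137480/my_test_file.txt], hierarchy=[]",'
    r'"request_api_name":"DATA_OBJ_OPEN_AN","request_client_user":"u0137480",'
    r'"request_proxy_user":"globus"}'
)


def summarize(line: str) -> dict:
    """Hypothetical helper: extract error codes and request info from one line."""
    # Assumption: the text after the first " | " is the JSON payload.
    _, payload = line.split(" | ", 1)
    entry = json.loads(payload)
    # Codes appear as error_code=-808000 or error_code=[-358000].
    codes = [
        int(code)
        for code in re.findall(r"error_code=\[?(-?\d+)", entry["log_message"])
    ]
    return {
        "codes": codes,
        "api": entry["request_api_name"],
        "client_user": entry["request_client_user"],
        "proxy_user": entry["request_proxy_user"],
    }


summary = summarize(LINE)
print(summary)
```

Grouping entries by `proxy_user` and `api` in this way may help separate connector-related noise from other errors in the same log.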

If I interpret it correctly, on each transfer we see errors because of this.

Could you let us know whether this is something that we should be worried about?

alanking commented 1 year ago

I don't know anything about Globus, but I'll add that the error message appearing in the log is coming from these lines: https://github.com/irods/irods/blob/f6eb6c72786288878706e2562a370b91b7d0802e/server/api/src/rsDataObjOpen.cpp#L767-L783

Something happened when attempting to resolve a resource hierarchy for the given operation. It seems like the Globus connector may not be using the right open flags or something for certain situations. Probably warrants investigation.

mstfdkmn commented 7 months ago

Hi @JustinKyleJames, would it be possible to prioritize this? We do a lot of transfers to iRODS via the connector, and because every file transfer to iRODS throws this error, it is hard to keep a clean iRODS log. Thanks.

korydraughn commented 7 months ago

Yes, we'll put eyes on it.

JustinKyleJames commented 6 months ago

I have not been able to reproduce this problem with globus-url-copy or with ftp. However, I think I know the root of the problem.

When the plugin's globus_l_gfs_iRODS_recv method gets called, the open flag is first set to O_WRONLY. Then, if the truncate option is set, the O_CREAT and O_TRUNC flags are added.

With the ways I know to test transfers, the truncate flag is always set so the O_CREAT flag is always set. I am guessing that when this error occurs the truncate flag is not set so O_CREAT does not get set. If the file does not exist we get an error.

I did verify that the error is generated if I update the code to not set O_CREAT.

I think the solution is to set the open mode for the first thread to both O_CREAT and O_WRONLY. I may have to get @mstfdkmn to help test this.
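The flag logic described above can be illustrated with plain POSIX-style file opens. This is only a sketch under stated assumptions: it uses Python's `os.open` on a local temporary directory rather than the iRODS API, and `build_open_flags` and `try_open` are hypothetical helpers, not the connector's actual code. It shows that opening a nonexistent file with O_WRONLY alone fails, while adding O_CREAT succeeds.

```python
import errno
import os
import tempfile


def build_open_flags(truncate: bool, always_create: bool) -> int:
    """Hypothetical helper mirroring the flag logic described in this thread."""
    flags = os.O_WRONLY
    if always_create:
        # Proposed change: always allow the object to be created.
        flags |= os.O_CREAT
    if truncate:
        # Existing behavior: truncate implies create.
        flags |= os.O_CREAT | os.O_TRUNC
    return flags


def try_open(path: str, flags: int) -> str:
    """Open path with the given flags; return "ok" or the errno name."""
    try:
        fd = os.open(path, flags, 0o600)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    return "ok"


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "my_test_file.txt")
    # Truncate not set and no O_CREAT: the file does not exist yet, so this fails.
    before_fix = try_open(target, build_open_flags(truncate=False, always_create=False))
    # With O_CREAT always set, the same open succeeds and creates the file.
    after_fix = try_open(target, build_open_flags(truncate=False, always_create=True))

print(before_fix, after_fix)
```

The iRODS server resolves a resource hierarchy rather than calling `open` directly, so the analogy is loose, but the missing-O_CREAT failure mode has the same shape.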

korydraughn commented 6 months ago

@mstfdkmn Are you open to testing the solution in PR #99 (in a testing environment)?

mstfdkmn commented 6 months ago

Yes, we are planning to test it soon (we don't have a test endpoint for the Globus connector, so it might take some time to work through possible challenges).

korydraughn commented 6 months ago

No problem. Let us know how it goes.

trel commented 6 months ago

@mstfdkmn a fix has been merged. closing so we can get this into the next release.

if it's a problem still / again, please open a new issue and reference this one.

mstfdkmn commented 6 months ago

Great! Thanks. We will definitely let you know if needed.

mstfdkmn commented 6 months ago

It seems this fix didn't resolve the issue. We built the connector from source and integrated it into our production endpoints, and we still see the errors mentioned above in our iRODS logs.

mstfdkmn commented 6 months ago

Sorry, that was a false alarm! I tested against different zones' endpoints and I don't see any errors anymore. My initial test transfer seems to have coincided with an ongoing transfer (I guess old processes are not refreshed or cleaned up when the connector build is changed), and I think that is why I believed I was seeing the same errors again.

korydraughn commented 6 months ago

So you're saying the change seems to have resolved the issue?

mstfdkmn commented 6 months ago

Yes. But there is something else I don't understand: I still see the same errors for some transfers that I didn't initiate, and I am guessing these are old transfers. If I learn more, I will let you know.