irods / irods

Open Source Data Management Software
https://irods.org
BSD 3-Clause "New" or "Revised" License

Some clients encounter errors when querying a path that includes spaces, a single quotation mark, and the word ‘and’ #8033

Open mstfdkmn opened 4 days ago

mstfdkmn commented 4 days ago

Bug Report

iRODS Version, OS and Version

iRODS 4.3.2 on AlmaLinux 9, iCommands 4.3.2

iRODS 4.3.3 on AlmaLinux 9, iCommands 4.3.3, irods_client_globus_connector 4.3.3.0

What did you try to do?

We wanted to sync a dataset that includes spaces, single quotation marks, and the word ‘and’ in the path.

Expected behavior

We expect the transfer to complete without any issues. Presumably, the connector (and some other clients) cannot translate the path correctly.

Observed behavior (including steps to reproduce, if applicable)

The transfer repeatedly encounters the following errors.

The full path (queried for its checksum) shown in the screenshot is: /set/home/SusMat/0archive/Michaël Lalanne-Tisné/Michael/Backup/Docs/Administration and info/Demande d'autorisation de déplacement - original.docx

Client side: [screenshot not included]

Errors in the iRODS logs:

[2024-11-15T10:07:44.692Z][icts-p-cloud-rdm-hev-1] {"processname": "irods", "groupname": "irods", "pid": "95", "channel": "stdout", "irods": {"log_category": "server", "log_level": "critical", "log_message": " 0# stacktrace_signal_handler in /lib/libirods_server.so.4.3.3\n 1# 0x00007F4BBEC0B6F0 in /lib64/libc.so.6\n 2# 0x00007F4BBBA33003 in /lib64/libzstd.so.1\n 3# 0x00007F4BBBA35326 in /lib64/libzstd.so.1\n 4# 0x00007F4BBBA2B3EF in /lib64/libzstd.so.1\n 5# 0x00007F4BBB9A882D in /lib64/libcrypt.so.2\n 6# 0x00007F4BBBA24C17 in /lib64/libzstd.so.1\n 7# irods::error irods::plugin_base::call<GenQueryInp*, GenQueryOut*>(RsComm*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, boost::shared_ptr<irods::first_class_object>, GenQueryInp*, GenQueryOut*)::'lambda'(irods::plugin_context&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*, GenQueryInp*, GenQueryOut*)::operator()(irods::plugin_context&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*, GenQueryInp*, GenQueryOut*) const in /lib/libirods_server.so.4.3.3\n 8# std::__1::__function::__func<irods::error irods::plugin_base::call<GenQueryInp*, GenQueryOut*>(RsComm*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, boost::shared_ptr<irods::first_class_object>, GenQueryInp*, GenQueryOut*)::'lambda'(irods::plugin_context&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*, GenQueryInp*, GenQueryOut*), std::__1::allocator<irods::error irods::plugin_base::call<GenQueryInp*, GenQueryOut*>(RsComm*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, boost::shared_ptr<irods::first_class_object>, GenQueryInp*, GenQueryOut*)::'lambda'(irods::plugin_context&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*, GenQueryInp*, GenQueryOut*)>, irods::error 
(irods::plugin_context&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*, GenQueryInp*, GenQueryOut*)>::operator()(irods::plugin_context&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*&&, GenQueryInp*&&, GenQueryOut*&&) in /lib/libirods_server.so.4.3.3\n 9# irods::error irods::plugin_base::call<GenQueryInp*, GenQueryOut*>(RsComm*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, boost::shared_ptr<irods::first_class_object>, GenQueryInp*, GenQueryOut*) in /lib/libirods_server.so.4.3.3\n10# chlGenQuery(GenQueryInp, GenQueryOut*) in /lib/libirods_server.so.4.3.3\n11# _rsGenQuery(RsComm*, GenQueryInp*, GenQueryOut**) in /lib/libirods_server.so.4.3.3\n12# rsGenQuery(RsComm*, GenQueryInp*, GenQueryOut**) in /lib/libirods_server.so.4.3.3\n13# irods::api_call_adaptor<GenQueryInp*, GenQueryOut**>::operator()(irods::plugin_context&, RsComm*, GenQueryInp*, GenQueryOut**) in /lib/libirods_server.so.4.3.3\n14# std::__1::__function::__func<irods::api_call_adaptor<GenQueryInp*, GenQueryOut**>, std::__1::allocator<irods::api_call_adaptor<GenQueryInp*, GenQueryOut**> >, irods::error (irods::plugin_context&, RsComm*, GenQueryInp*, GenQueryOut**)>::operator()(irods::plugin_context&, RsComm*&&, GenQueryInp*&&, GenQueryOut**&&) in /lib/libirods_server.so.4.3.3\n15# int irods::api_entry::call_handler<GenQueryInp*, GenQueryOut**>(RsComm*, GenQueryInp*, GenQueryOut**) in /lib/libirods_server.so.4.3.3\n16# rsApiHandler(RsComm*, int, BytesBuf*, BytesBuf*) in /lib/libirods_server.so.4.3.3\n17# readAndProcClientMsg(RsComm*, int) in /lib/libirods_server.so.4.3.3\n18# agentMain(RsComm*) in /lib/libirods_server.so.4.3.3\n19# runIrodsAgentFactory(sockaddr_un) in /lib/libirods_server.so.4.3.3\n20# main::$_5::operator()() const in /usr/sbin/irodsServer\n21# main in /usr/sbin/irodsServer\n22# 0x00007F4BBEBF6590 in /lib64/libc.so.6\n23# __libc_start_main in 
/lib64/libc.so.6\n24# _start in /usr/sbin/irodsServer\n", "server_host": "set.irods.icts.kuleuven.be", "server_pid": 95, "server_timestamp": "2024-11-15T10:07:44.691Z", "server_type": "server", "server_zone": "set", "stacktrace_agent_pid": "3154035", "stacktrace_timestamp_epoch_milliseconds": "638", "stacktrace_timestamp_epoch_seconds": "1731665258", "stacktrace_timestamp_utc": "2024-11-15T10:07:38.638Z"}}
[2024-11-15T10:07:58.421Z][icts-p-cloud-rdm-hev-1] {"processname": "irods", "groupname": "irods", "pid": "95", "channel": "stdout", "irods": {"log_category": "legacy", "log_level": "error", "log_message": "[-]\t/irods_source/server/core/src/rsApiHandler.cpp:565:int readAndProcClientMsg(rsComm_t *, int) :  status [SYS_HEADER_READ_LEN_ERR]  errno [] -- message [failed to call 'read header']\n\t[-]\t/irods_source/lib/core/src/sockComm.cpp:229:irods::error readMsgHeader(irods::network_object_ptr, msgHeader_t *, struct timeval *) :  status [SYS_HEADER_READ_LEN_ERR]  errno [] -- message [failed to call 'read header']\n\t\t[-]\t/irods_source/plugins/network/src/tcp.cpp:190:irods::error tcp_read_msg_header(irods::plugin_context &, void *, struct timeval *) :  status [SYS_HEADER_READ_LEN_ERR]  errno [] -- message [only read [0] of [4]]\n\n", "request_api_name": "OBJ_STAT_AN", "request_api_number": 633, "request_api_version": "d", "request_client_user": "u0092172", "request_host": "127.0.0.1", "request_proxy_user": "globus", "request_release_version": "rods4.3.3", "server_host": "set.irods.icts.kuleuven.be", "server_pid": 3154170, "server_timestamp": "2024-11-15T10:07:58.421Z", "server_type": "agent", "server_zone": "set"}}
[2024-11-15T10:07:58.422Z][icts-p-cloud-rdm-hev-1] {"processname": "irods", "groupname": "irods", "pid": "95", "channel": "stdout", "irods": {"log_category": "agent", "log_level": "error", "log_message": "Agent [3154170] exiting with status = -4000", "request_api_name": "OBJ_STAT_AN", "request_api_number": 633, "request_api_version": "d", "request_client_user": "u0092172", "request_host": "127.0.0.1", "request_proxy_user": "globus", "request_release_version": "rods4.3.3", "server_host": "set.irods.icts.kuleuven.be", "server_pid": 3154170, "server_timestamp": "2024-11-15T10:07:58.422Z", "server_type": "agent", "server_zone": "set"}}
[2024-11-15T10:07:58.436Z][icts-p-cloud-rdm-hev-1] {"processname": "irods", "groupname": "irods", "pid": "95", "channel": "stdout", "irods": {"log_category": "agent_factory", "log_level": "error", "log_message": "Agent process [3154170] exited with status [1].", "server_host": "set.irods.icts.kuleuven.be", "server_pid": 116, "server_timestamp": "2024-11-15T10:07:58.436Z", "server_type": "agent_factory", "server_zone": "set"}}

To reproduce this issue with iCommands:

imkdir "colA and colB"
icd "colA and colB"
touch "myfile'"
iput "myfile'"
ils "myfile'"

Querying the path with ils fails:

[irods@icts-t-dev-u0137480-irods ~]$ ils "/test/home/rods/colA and colB/myfile'"
iRODS exception occurred: [INPUT_ARG_NOT_WELL_FORMED_ERR: query fill failed for [select DATA_ID where DATA_NAME = 'myfile'' and COLL_NAME = '/test/home/rods/colA and colB']

]

Another look:

[irods@icts-t-dev-u0137480-irods ~]$ ipwd
/test/home/rods/colA and colB
[irods@icts-t-dev-u0137480-irods ~]$ ils
/test/home/rods/colA and colB:
  myfile'
[irods@icts-t-dev-u0137480-irods ~]$ ils "myfile'"
iRODS exception occurred: [INPUT_ARG_NOT_WELL_FORMED_ERR: query fill failed for [select DATA_ID where DATA_NAME = 'myfile'' and COLL_NAME = '/test/home/rods/colA and colB']

]

Without "and", it works (a different collection whose name contains spaces and which holds the same object):

[irods@icts-t-dev-u0137480-irods ~]$ ils "/test/home/rods/my cool coll/myfile'"
  /test/home/rods/my cool coll/myfile'

The equivalent iquest query also works:

[irods@icts-t-dev-u0137480-irods ~]$ iquest "select DATA_NAME where COLL_NAME = '/test/home/rods/colA and colB' and DATA_NAME = 'myfile''"
DATA_NAME = myfile'
------------------------------------------------------------

We suspect the Globus problem lies in how the query string (or the query itself) is constructed in the following code: https://github.com/irods/irods_client_globus_connector/blob/main/DSI/globus_gridftp_server_iRODS.cpp#L1572-L1578
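A simplified model of a condition splitter can help explain why all three ingredients, and even the order of the conditions, matter. The sketch below assumes (this is an assumption for illustration, not taken from the iRODS source) that the GenQuery1 parser cuts the condition string at the first " and " whose preceding text contains an even number of single quotes; `split_conditions` is a hypothetical name. Under that assumption, the stray quote in `myfile'` makes the correct split point look unbalanced, so the splitter moves on to the " and " inside `colA and colB`.

```python
# Simplified, HYPOTHETICAL model of a GenQuery1-style condition splitter.
# Assumption (not from the iRODS source): the parser scans for " and " and
# accepts the first occurrence whose prefix holds an even number of single
# quotes, i.e. the prefix "looks like" a complete, quoted condition.


def split_conditions(cond: str) -> list[str]:
    """Split a condition string into individual conditions on ' and '."""
    sep = " and "
    parts: list[str] = []
    rest = cond
    while True:
        cut = -1
        start = 0
        # Find the first " and " that is preceded by balanced quotes.
        while (i := rest.find(sep, start)) != -1:
            if rest[:i].count("'") % 2 == 0:
                cut = i
                break
            start = i + 1  # this " and " seems to sit inside quotes; keep looking
        if cut == -1:
            parts.append(rest)
            return parts
        parts.append(rest[:cut])
        rest = rest[cut + len(sep):]


if __name__ == "__main__":
    # Condition from the failing ils call: DATA_NAME comes first.
    ils_cond = ("DATA_NAME = 'myfile'' and "
                "COLL_NAME = '/test/home/rods/colA and colB'")
    # Condition from the working iquest call: COLL_NAME comes first.
    iquest_cond = ("COLL_NAME = '/test/home/rods/colA and colB' and "
                   "DATA_NAME = 'myfile''")
    for cond in (ils_cond, iquest_cond):
        print(split_conditions(cond))
```

In this model, the ils condition is cut inside `colA and colB`, leaving a stray `colB'` fragment, while the iquest ordering (COLL_NAME first) yields the two intended conditions, which would be consistent with the observations above. Applied to the iput condition reported further below, the model cuts inside `name and id` and produces a fragment beginning with `id'`, which is not a column name. Without the quote, the first " and " is already balanced; without " and " inside a value, there is no wrong split point to pick.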

alanking commented 4 days ago

I believe this is related to, if not a subset of, the following issues:
https://github.com/irods/irods/issues/3902
https://github.com/irods/irods/issues/4983
https://github.com/irods/irods/issues/5992
https://github.com/irods/irods/issues/7302

A similar issue was also reported (by Ingrid!) in the Globus Connector before and was fixed: https://github.com/irods/irods_client_globus_connector/issues/101
But apparently that fix did not cover this situation...

So this situation is only triggered when all three conditions are met? That is, the path must include at least one space, a single quotation mark, and the word "and"? Or is the problem present with only a subset of these?

We have a fix for the overall GenQuery1 parser problem in this PR: #7819. I would bet that this issue is resolved by that change. We have held off on introducing it to the stable branch because it will affect all queries inside and outside of the system and could introduce unforeseen instability.

mstfdkmn commented 1 day ago

> A similar issue was also reported (by Ingrid!) in the Globus Connector before and was fixed: https://github.com/irods/irods_client_globus_connector/issues/101
> But apparently did not fix this situation...

Yes, they might be similar, but it seems the error code on the client side is different.

> So this situation is only triggered when all three conditions are met? That is, the path must include at least one space, a single quotation mark, and the word "and"? Or is the problem present with only a subset of these?

Yes, that is correct: the problem is only triggered when all three conditions are met. It does not occur without ‘and’ (with the other two present), without a single quotation mark (with the other two present), or without a space. Please see the new test results below (from a different client machine and iRODS server):

root@eb8c3a92a56e:/# ils "myfile'"
  /icts/home/u0137480/colA plus colB/myfile'

root@eb8c3a92a56e:/# ils myfile
  /icts/home/u0137480/colA and colB/myfile

root@eb8c3a92a56e:/# ils "myfile'"
  /icts/home/u0137480/test/myfile'
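Until a fix is available, a site preparing a sync may want to flag affected paths beforehand. The following is a minimal sketch, assuming the trigger is exactly the combination confirmed above (a single quotation mark plus the word "and" surrounded by spaces); the function names are hypothetical and not part of any iRODS or Globus tool. Only lowercase "and" was tested in this report, so other casings are not checked.

```python
from pathlib import Path


def matches_reported_pattern(path: str) -> bool:
    """True if `path` contains both ingredients reported in issue #8033.

    Assumption: the trigger is a single quote plus " and " (the spaces
    around "and" cover the space condition). Only lowercase "and" was
    reported, so "AND" or "And" are deliberately not matched here.
    """
    return "'" in path and " and " in path


def find_risky_paths(root: str) -> list[str]:
    """Return paths under `root` (relative to it) that match the pattern.

    Only the part below `root` is checked; the destination collection
    name on the iRODS side should be checked separately.
    """
    base = Path(root)
    return sorted(
        str(p.relative_to(base))
        for p in base.rglob("*")
        if matches_reported_pattern(str(p.relative_to(base)))
    )
```

For example, calling `find_risky_paths` on a local dataset directory lists entries such as `colA and colB/myfile'`, while `colA plus colB/myfile'` and `colA and colB/myfile` are not flagged, mirroring the test results above.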

But please note that when I try to reproduce the issue with a single file name that meets all three conditions, ‘iput’ fails with a client error too. Steps:

root@eb8c3a92a56e:/# ipwd
/icts/home/u0137480/test
touch "myfile' name and id"
root@eb8c3a92a56e:/# iput "myfile' name and id"
remote addresses: 10.114.48.180 ERROR: putUtil: put error for /icts/home/u0137480/test/myfile' name and id, status = -1107000 status = -1107000 NO_COLUMN_NAME_FOUND

iRODS logs:

[2024-11-18T10:01:46.293Z][icts-t-cloud-rdm-hev-1] {"processname": "irods", "groupname": "irods", "pid": "96", "channel": "stdout", "irods": {"log_category": "legacy", "log_level": "error", "log_message": "[create_new_replica:536] - [NO_COLUMN_NAME_FOUND: query fill failed for [select DATA_ID, DATA_REPL_NUM where COLL_NAME = '/icts/home/u0137480/test' and DATA_NAME = 'myfile' name and id' and DATA_RESC_HIER = 'default;netapp']\n\n] [error_code=[-1107000], path=[/icts/home/u0137480/test/myfile' name and id], hierarchy=[default;netapp]", "request_api_name": "DATA_OBJ_PUT_AN", "request_api_number": 606, "request_api_version": "d", "request_client_user": "u0137480", "request_host": "127.0.0.1", "request_proxy_user": "u0137480", "request_release_version": "rods4.3.3", "server_host": "icts.irods.t.icts.kuleuven.be", "server_pid": 293740, "server_timestamp": "2024-11-18T10:01:46.292Z", "server_type": "agent", "server_zone": "icts"}}
[2024-11-18T10:01:46.294Z][icts-t-cloud-rdm-hev-1] {"processname": "irods", "groupname": "irods", "pid": "96", "channel": "stdout", "irods": {"log_category": "legacy", "log_level": "error", "log_message": "[single_buffer_put:267] - failed to open data object [error_code=[-1107000], path=[/icts/home/u0137480/test/myfile' name and id]]", "request_api_name": "DATA_OBJ_PUT_AN", "request_api_number": 606, "request_api_version": "d", "request_client_user": "u0137480", "request_host": "127.0.0.1", "request_proxy_user": "u0137480", "request_release_version": "rods4.3.3", "server_host": "icts.irods.t.icts.kuleuven.be", "server_pid": 293740, "server_timestamp": "2024-11-18T10:01:46.292Z", "server_type": "agent", "server_zone": "icts"}}

The file is nevertheless uploaded. Listing the file itself (rather than its parent collection) fails:

root@eb8c3a92a56e:/# ils "myfile' name and id"
iRODS exception occurred: [NO_COLUMN_NAME_FOUND: query fill failed for [select DATA_USER_NAME, DATA_ZONE_NAME, DATA_ACCESS_NAME, USER_TYPE where COLL_NAME = '/icts/home/u0137480/test' and DATA_NAME = 'myfile' name and id' and DATA_TOKEN_NAMESPACE = 'access_type']

]

If I try to re-upload, the iput command hangs:

root@eb8c3a92a56e:/# iput -f "myfile' name and id"