irods / irods_rule_engine_plugin_python

BSD 3-Clause "New" or "Revised" License
10 stars 14 forks source link

Enable rule engine on Ubuntu 22 and IRODS 4.3.1 #200

Open DOC-MEX opened 3 months ago

DOC-MEX commented 3 months ago

I'm trying to enable the python rule engine on a fresh installation of Irods 4.3.1 in an Ubuntu 22 machine. I am using the core.py.templete as my core.py, the moment I enable the python engine in the server_config.json file. I get these errors. They stop as soon as I remove the python rule engine stanza from the config file.

 {"log_category":"legacy","log_level":"error","log_message":"_rcConnect: connectToRhost error, server on irods-development-2024:1247 is probably down status = -4000 SYS_HEADER_READ_LEN_ERR","server_host":"irods-development-2024","server_pid":1956231,"server_timestamp":"2024-04-10T10:31:58.554Z","server_type":"server","server_zone":"zone_C"}
 {"log_category":"agent_factory","log_level":"error","log_message":"Agent process [1967232] exited with status [11].","server_host":"irods-development-2024","server_pid":1956232,"server_timestamp":"2024-04-10T10:31:58.570Z","server_type":"agent_factory","server_zone":"zone_C"}
 {"log_category":"server","log_level":"error","log_message":"Caught exception in migrate_delay_server(): iRODS Exception:\n    file: /irods_source/lib/core/src/client_connection.cpp\n    function: void irods::experimental::client_connection::only_connect(const std::string &, const int, const irods::experimental::fully_qualified_username &)\n    line: 188\n    code: -305000 (USER_SOCK_CONNECT_ERR)\n    message:\n        Connect error\nstack trace:\n--------------\n 0# irods::stacktrace::dump() const in /lib/libirods_common.so.4.3.1\n 1# irods::exception::assemble_full_display_what() const in /lib/libirods_common.so.4.3.1\n 2# irods::exception::what() const in /lib/libirods_common.so.4.3.1\n 3# std::__1::__function::__func<initServerMain(RsComm*, bool, bool)::$_13, std::__1::allocator<initServerMain(RsComm*, bool, bool)::$_13>, void ()>::operator()() at rodsServer.cpp:?\n 4# irods::experimental::cron::cron_task::operator()() in /usr/sbin/irodsServer\n 5# irods::experimental::cron::cron::run() in /usr/sbin/irodsServer\n 6# void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, main::$_10> >(void*) at rodsServer.cpp:?\n 7# 0x00007FE19B5CFAC3 in /lib/x86_64-linux-gnu/libc.so.6\n 8# 0x00007FE19B661850 in /lib/x86_64-linux-gnu/libc.so.6\n\n","server_host":"irods-development-2024","server_pid":1956231,"server_timestamp":"2024-04-10T10:31:58.640Z","server_type":"server","server_zone":"zone_C"}
 {"log_category":"server","log_level":"critical","log_message":" 0# stacktrace_signal_handler in /lib/libirods_server.so.4.3.1\n 1# 0x00007FE19B57D520 in /lib/x86_64-linux-gnu/libc.so.6\n 2# std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::basic_string(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) in /opt/irods-externals**/clang-runtime13.0.0-0**/lib/libc++.so.1\n 3# 0x00007FE1970D6B41\n 4# 0x00007FE1970C8764\n 5# 0x00007FE1966AB64C\n 6# 0x00007FE19711A080\n 7# irods::pluggable_rule_engine<std::__1::tuple<> >::start_operation(std::__1::tuple<>&) in /lib**/libirods_server.so.4.3.1**\n 8# runIrodsAgentFactory(sockaddr_un) in /lib/libirods_server.so.4.3.1\n 9# main::$_5::operator()() const at rodsServer.cpp:?\n10# main in /usr/sbin/irodsServer\n11# 0x00007FE19B564D90 in /lib/x86_64-linux-gnu/libc.so.6\n12# __libc_start_main in /lib/x86_64-linux-gnu/libc.so.6\n13# _start in /usr/sbin/irodsServer\n","server_host":"irods-development-2024","server_pid":1956231,"server_timestamp":"2024-04-10T10:32:00.833Z","server_type":"server","server_zone":"zone_C","stacktrace_agent_pid":"1967232","stacktrace_timestamp_epoch_milliseconds":"552","stacktrace_timestamp_epoch_seconds":"1712745118","stacktrace_timestamp_utc":"2024-04-10T10:31:58.552Z"}

any idea what could the problem? I installed the python rule engine using apt-get. Could it be a compatibility issue related to the newest IRODS version and the the python rule engine?

trel commented 3 months ago

You have only one machine? irods-development-2024:1247?

I wonder if you've found an inconsistency with core.py.template and the default core.re... We'll try to reproduce.

DOC-MEX commented 3 months ago

I have test it successfully on another machine with Ubuntu 20, this is the first time I try on an Ubuntu 22.

I did change update core.py since this time I have used a different name for my resource, in core.re I have this:

acSetRescSchemeForCreate {msiSetDefaultResc("grassrootsResc","null"); }

So in core.py I also used that name on this function:

def acSetRescSchemeForCreate(rule_args, callback, rei):
    with open('../../etc/irods/server_config.json') as f:
        server_config_dict = json.load(f)
    callback.msiSetDefaultResc(server_config_dict.get('default_resource_name', 'grassrootsResc'), 'null')

I also used a simpler core.py from the beginners tutorial which only contains this:

#core.py
import sys
import exifread

from genquery import *

def pep_api_data_obj_put_post(rule_args, callback, rei):
    import os

    data_obj_inp = rule_args[2]
    obj_path = str(data_obj_inp.objPath)
    resc_hier = str(data_obj_inp.condInput['resc_hier'])
    query_condition_string = f'COLL_NAME = \'{os.path.dirname(obj_path)}\' and '  \
                             f'DATA_NAME = \'{os.path.basename(obj_path)}\' and ' \
                             f'DATA_RESC_HIER = \'{resc_hier}\''

    phypath = list(Query(callback, 'DATA_PATH', query_condition_string))[0]

    exiflist = []
    with open(phypath, 'rb') as f:
        tags = exifread.process_file(f, details=False)
        for (k, v) in tags.items():
            if k not in ('JPEGThumbnail', 'TIFFThumbnail', 'Filename', 'EXIF MakerNote'):
                exifpair = '{0}={1}'.format(k, v)
                exiflist.append(exifpair)
    exifstring = '%'.join(exiflist)
    callback.add_metadata_to_objpath(exifstring, obj_path, '-d')
    callback.writeLine('serverLog', 'PYTHON - pep_api_data_obj_put_post() complete')

But It also triggers the same error.

trel commented 3 months ago

I was able to stand up 4.3.1 with PREP and core.py.template as core.py without issue.

You can look at $ iadmin get_delay_server_info and $ iadmin h set_delay_server to see if those are somehow related.

alanking commented 3 months ago

"Agent process [1967232] exited with status [11]."

Please confirm that your core.py file is based on the /etc/irods/core.py.template shipped with the plugin package OR that you have the irods_rule_engine_plugin-cpp_default_policy REP configured.

So, you will either need an item that looks like this in your configured rule engine plugins:

            {
                "instance_name": "irods_rule_engine_plugin-cpp_default_policy-instance",
                "plugin_name": "irods_rule_engine_plugin-cpp_default_policy",
                "plugin_specific_configuration": {}
            }

...or, you will need to ensure that every required static PEP is implemented in core.py.

Some background: The irods_rule_engine_plugin-cpp_default_policy REP (or, "cpp backstop" as we call it) makes sure that all of the static PEPs used internally in the server are implemented. If some static PEP is not implemented in any of the configured rulebases, you can see a signal 11.

DOC-MEX commented 2 months ago

Yes, I am testing with the core.py.template that came with the installation of the plugin (everything I have installed has been using apt-get) as my core.py without any changes. The moment I add the stanza to the server_config file

{
    "instance_name": "irods_rule_engine_plugin-python-instance",
    "plugin_name": "irods_rule_engine_plugin-python",
    "plugin_specific_configuration": {}
}

But that immediately triggers an exit with status 11 and SYS_HEADER_READ_LEN_ERR error.

This is the output of _get_delay_serverinfo and _iadmin h set_delayserver and the error logs.

ienv
irods_version - 4.3.1
irods_session_environment_file - /home/olverad/.irods/irods_environment.json.2050314
irods_environment_file - /home/olverad/.irods/irods_environment.json
irods_host - irods-development-2024
irods_port - 1247
irods_zone_name - zone_C
irods_user_name - test
olverad@irods-development-2024:/etc/irods$ iadmin get_delay_server_info
{
    "leader": "irods-development-2024",
    "successor": ""
}

olverad@irods-development-2024:/etc/irods$ iadmin h set_delay_server
set_delay_server HOSTNAME

Set the delay server for the local zone in R_GRID_CONFIGURATION.

The hostname entered will be saved as the 'successor'.

Each iRODS server will periodically check the catalog to determine
if it should promote itself to be the delay server for the local zone.

This mechanism allows for graceful delay server migration without downtime.

Error logs:

{"log_category":"legacy","log_level":"error","log_message":"_rcConnect: connectToRhost error, server on irods-development-2024:1247 is probably down status = -**4000 SYS_HEADER_READ_LEN_ERR**","server_host":"irods-development-2024","server_pid":2051529,"server_timestamp":"2024-04-11T07:29:11.191Z","server_type":"server","server_zone":"zone_C"}
{"log_category":"agent_factory","log_level":"error","log_message":"Agent process [2051627] **exited with status [11].**","server_host":"irods-development-2024","server_pid":2051530,"server_timestamp":"2024-04-11T07:29:11.211Z","server_type":"agent_factory","server_zone":"zone_C"}
{"log_category":"server","log_level":"error","log_message":"Caught exception in migrate_delay_server(): iRODS Exception:\n    file: /irods_source/lib/core/src/client_connection.cpp\n    function: void irods::experimental::client_connection::only_connect(const std::string &, const int, const irods::experimental::fully_qualified_username &)\n    line: 188\n    code: -305000 (USER_SOCK_CONNECT_ERR)\n    message:\n        Connect error\nstack trace:\n--------------\n 0# irods::stacktrace::dump() const in /lib/libirods_common.so.4.3.1\n 1# irods::exception::assemble_full_display_what() const in /lib/libirods_common.so.4.3.1\n 2# irods::exception::what() const in /lib/libirods_common.so.4.3.1\n 3# std::__1::__function::__func<initServerMain(RsComm*, bool, bool)::$_13, std::__1::allocator<initServerMain(RsComm*, bool, bool)::$_13>, void ()>::operator()() at rodsServer.cpp:?\n 4# irods::experimental::cron::cron_task::operator()() in /usr/sbin/irodsServer\n 5# irods::experimental::cron::cron::run() in /usr/sbin/irodsServer\n 6# void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, main::$_10> >(void*) at rodsServer.cpp:?\n 7# 0x00007FF50E2B8AC3 in /lib/x86_64-linux-gnu/libc.so.6\n 8# 0x00007FF50E34A850 in /lib/x86_64-linux-gnu/libc.so.6\n\n","server_host":"irods-development-2024","server_pid":2051529,"server_timestamp":"2024-04-11T07:29:11.269Z","server_type":"server","server_zone":"zone_C"}
DOC-MEX commented 2 months ago

I have reinstalled irods on an Ubuntu 20 machine. This time I have used all the default values (zone and resource names, etc) But now I am having the same problem when trying to enable the python rule engine! The moment I add it the the server_config file, I get this errors:

{"log_category":"legacy","log_level":"error","log_message":"_rcConnect: connectToRhost error, server on grassroots-frontend-test:1247 is probably down status = -4000 SYS_HEADER_READ_LEN_ERR","server_host":"grassroots-frontend-test","server_pid":3262840,"server_timestamp":"2024-04-12T15:45:57.460Z","server_type":"server","server_zone":"tempZone"} {"log_category":"agent_factory","log_level":"error","log_message":"Agent process [3263253] exited with status [11].","server_host":"grassroots-frontend-test","server_pid":3262841,"server_timestamp":"2024-04-12T15:45:57.469Z","server_type":"agent_factory","server_zone":"tempZone"} {"log_category":"server","log_level":"error","log_message":"Caught exception in migrate_delay_server(): iRODS Exception:\n file: /irods_source/lib/core/src/client_connection.cpp\n function: void irods::experimental::client_connection::only_connect(const std::string &, const int, const irods::experimental::fully_qualified_username &)\n line: 188\n code: -305000 (USER_SOCK_CONNECT_ERR)\n message:\n Connect error\nstack trace:\n--------------\n 0# irods::stacktrace::dump() const in /lib/libirods_common.so.4.3.1\n 1# irods::exception::assemble_full_display_what() const in /lib/libirods_common.so.4.3.1\n 2# irods::exception::what() const in /lib/libirods_common.so.4.3.1\n 3# std::1::function::func<initServerMain(RsComm, bool, bool)::$_13, std::__1::allocator<initServerMain(RsComm, bool, bool)::$_13>, void ()>::operator()() at rodsServer.cpp:?\n 4# irods::experimental::cron::cron_task::operator()() in /usr/sbin/irodsServer\n 5# irods::experimental::cron::cron::run() in /usr/sbin/irodsServer\n 6# void* std::1::thread_proxy<std::1::tuple<std::1::unique_ptr<std::1::thread_struct, std::1::default_delete >, main::$_10> >(void*) at rodsServer.cpp:?\n 7# 0x00007F03E0D99609 in /lib/x86_64-linux-gnu/libpthread.so.0\n 8# clone in /lib/x86_64-linux-gnu/libc.so.6\n\n","server_host":"grassroots-frontend-test","server_pid":3262840,"server_timestamp":"2024-04-12T15:45:57.522Z","server_type":"server","server_zone":"tempZone"}

and this when I try to ils command:

remote addresses: 127.0.1.1 ERROR: [-] /irods_source/lib/core/src/sockComm.cpp:863:int connectToRhost(rcComm_t *, int, int) : status [SYS_HEADER_READ_LEN_ERR] errno [] -- message [failed to call 'read header'] [-] /irods_source/lib/core/src/sockComm.cpp:611:irods::error readVersion(irods::network_object_ptr, version_t *) : status [SYS_HEADER_READ_LEN_ERR] errno [] -- message [failed to call 'read header'] [-] /irods_source/lib/core/src/sockComm.cpp:229:irods::error readMsgHeader(irods::network_object_ptr, msgHeader_t , struct timeval ) : status [SYS_HEADER_READ_LEN_ERR] errno [] -- message [failed to call 'read header'] [-] /irods_source/plugins/network/src/tcp.cpp:190:irods::error tcp_read_msg_header(irods::plugin_context &, void , struct timeval *) : status [SYS_HEADER_READ_LEN_ERR] errno [] -- message [only read [0] of [4]]

ERROR: _rcConnect: connectToRhost error, server on grassroots-frontend-test:1247 is probably down status = -4000 SYS_HEADER_READ_LEN_ERR

And when I remove it from the server_config file, everything is back to normal:

sudo su irods -c "/var/lib/irods/irodsctl status" irodsServer : Process 3262840 Process 3262841 irodsDelayServer : Process 3262858 grassroots-frontend-test:~$ ienv irods_version - 4.3.1 irods_session_environment_file - /home/olverad/.irods/irods_environment.json.3262346 irods_environment_file - /home/olverad/.irods/irods_environment.json irods_host - grassroots-frontend-test irods_port - 1247 irods_zone_name - tempZone irods_user_name - rods

grassroots-frontend-test:~$ ils /tempZone/home/rods:

I have no idea where to start looking for a possible problem, I thought it was because I was using Ubuntu 22, but now it also fails in Ubuntu 20.

DOC-MEX commented 2 months ago

@trel could you confirm what are the basics steps just for enabling the python rule engine on a fresh IRODS installation?

  1. install it: sudo apt-get -y install irods-rule-engine-plugin-python python3-exif
  2. sudo cp core.py.template core.py
  3. Edit rule_engine list in server_config.json

That should be all, correct? I have reinstalled irods 4.3.1 in an Ubuntu 20 machine, with all the default values, and as you can see it works perfectly fine.

sudo su irods -c "/var/lib/irods/irodsctl status" irodsServer : Process 2736 Process 2737 irodsDelayServer : Process 2754

olverac@opendata-20:~$ hostname opendata-20

olverac@opendata-20:~$ iadmin get_delay_server_info { "leader": "opendata-20", "successor": "" }

olverac@opendata-20:~$ ienv irods_version - 4.3.1 irods_session_environment_file - /home/olverac/.irods/irods_environment.json.17899 irods_environment_file - /home/olverac/.irods/irods_environment.json irods_host - opendata-20 irods_port - 1247 irods_zone_name - tempZone irods_user_name - rods

However, the moment I add the stanza to the config file: rule_engines": [ { "instance_name": "irods_rule_engine_plugin-python-instance", "plugin_name": "irods_rule_engine_plugin-python", "plugin_specific_configuration": {} }, {

It triggers this error:

{"log_category":"legacy","log_level":"error","log_message":"_rcConnect: connectToRhost error, server on opendata-20:1247 is probably down status = -4000 SYS_HEADER_READ_LEN_ERR","server_host":"opendata-20","server_pid":2736,"server_timestamp":"2024-04-17T13:27:18.618Z","server_type":"server","server_zone":"tempZone"}

{"log_category":"agent_factory","log_level":"error","log_message":"Agent process [18273] exited with status [11].","server_host":"opendata-20","server_pid":2737,"server_timestamp":"2024-04-17T13:27:18.638Z","server_type":"agent_factory","server_zone":"tempZone"}

{"log_category":"server","log_level":"error","log_message":"Caught exception in migrate_delay_server(): iRODS Exception:\n file: /irods_source/lib/core/src/client_connection.cpp\n function: void irods::experimental::client_connection::only_connect(const std::string &, const int, const irods::experimental::fully_qualified_username &)\n line: 188\n code: -305000 (USER_SOCK_CONNECT_ERR)\n message:\n Connect error\nstack trace:\n--------------\n 0# irods::stacktrace::dump() const in /lib/libirods_common.so.4.3.1\n 1# irods::exception::assemble_full_display_what() const in /lib/libirods_common.so.4.3.1\n 2# irods::exception::what() const in /lib/libirods_common.so.4.3.1\n 3# 0x000000000044C167 in /usr/sbin/irodsServer\n 4# irods::experimental::cron::cron_task::operator()() in /usr/sbin/irodsServer\n 5# irods::experimental::cron::cron::run() in /usr/sbin/irodsServer\n 6# 0x000000000044AF21 in /usr/sbin/irodsServer\n 7# 0x00007F582755C609 in /lib/x86_64-linux-gnu/libpthread.so.0\n 8# clone in /lib/x86_64-linux-gnu/libc.so.6\n\n","server_host":"opendata-20","server_pid":2736,"server_timestamp":"2024-04-17T13:27:18.647Z","server_type":"server","server_zone":"tempZone"}

What other configuration might I be missing?

korydraughn commented 2 months ago

Using the irods_testing_environment, I was able to stand up a new iRODS 4.3.1 server with the python REP and an empty core.py file in an Ubuntu 22.04 container without issue. The logs are quiet.

Please post the following.

If the config files contain sensitive information, please replace it or mask it.

DOC-MEX commented 2 months ago

Hi Kory here are the requested files:

server_config.json irods_environment.json error_logs.json

the value returned by hostname in the terminal is opendata-20

DOC-MEX commented 2 months ago

Another question, why I am able to use irods if I get this output:

sudo systemctl status irods ● irods.service - LSB: Manages the iRODS service Loaded: loaded (/etc/init.d/irods; generated) Active: failed (Result: exit-code) since Thu 2024-04-18 11:57:11 UTC; 48s ago Docs: man:systemd-sysv-generator(8) Process: 103984 ExecStart=/etc/init.d/irods start (code=exited, status=1/FAILURE)

Apr 18 11:57:10 opendata-20 systemd[1]: Starting LSB: Manages the iRODS service... Apr 18 11:57:10 opendata-20 su[103998]: (to irods) root on none Apr 18 11:57:10 opendata-20 su[103998]: pam_unix(su-l:session): session opened for user irods by (uid> Apr 18 11:57:11 opendata-20 su[103998]: pam_unix(su-l:session): session closed for user irods Apr 18 11:57:11 opendata-20 systemd[1]: irods.service: Control process exited, code=exited, status=1/> Apr 18 11:57:11 opendata-20 systemd[1]: irods.service: Failed with result 'exit-code'. Apr 18 11:57:11 opendata-20 systemd[1]: Failed to start LSB: Manages the iRODS service.

but this one seems OK sudo su irods -c "/var/lib/irods/irodsctl status" irodsServer : Process 2736 Process 2737 irodsDelayServer : Process 2754

but I cannot restart the irods service: sudo systemctl start irods Job for irods.service failed because the control process exited with error code.

korydraughn commented 2 months ago

iRODS doesn't ship an official systemd unit file yet. I recommend using the irodsctl script directly. You can't go wrong or get confused using that.

The log file only contains the error that you reported in this issue. Can you include the activity beginning with the initial launch of the server upto the failures (with and without the python REP)? I want to see everything leading up to (and including) the errors.

DOC-MEX commented 2 months ago

Here are the logs when I stop irodsctl, then start it and when I add the python rule engine stanza

full_error_logs.json

korydraughn commented 2 months ago

The irods_environment.json file you provided is missing several pieces of information. Is that the file used by the iRODS service account (i.e. /var/lib/irods/.irods/irods_environment.json)?

korydraughn commented 2 months ago

Also, please list the contents of the /etc/irods directory and /etc/irods/core.py.

I don't see the python REP listed in the server_config.json file you posted. Please post the server_config.json which causes the error.

Have you tried standing up things on a different computer? VM? Container?

DOC-MEX commented 2 months ago

sorry I gave you the wrong irods_environment.json, this is the one!

sudo cat .irods/irods_environment.json { "irods_client_server_negotiation": "request_server_negotiation", "irods_client_server_policy": "CS_NEG_REFUSE", "irods_connection_pool_refresh_time_in_seconds": 300, "irods_cwd": "/tempZone/home/rods", "irods_default_hash_scheme": "SHA256", "irods_default_number_of_transfer_threads": 4, "irods_default_resource": "demoResc", "irods_encryption_algorithm": "AES-256-CBC", "irods_encryption_key_size": 32, "irods_encryption_num_hash_rounds": 16, "irods_encryption_salt_size": 8, "irods_home": "/tempZone/home/rods", "irods_host": "opendata-20", "irods_match_hash_policy": "compatible", "irods_maximum_size_for_single_buffer_in_megabytes": 32, "irods_port": 1247, "irods_server_control_plane_encryption_algorithm": "AES-256-CBC", "irods_server_control_plane_encryption_num_hash_rounds": 16, "irods_server_control_plane_key": "32_byte_server_control_plane_key", "irods_server_control_plane_port": 1248, "irods_transfer_buffer_size_for_parallel_transfer_in_megabytes": 4, "irods_user_name": "rods", "irods_zone_name": "tempZone", "schema_name": "service_account_environment", "schema_version": "v4" }

The stanza I add in server_config.json that triggers the error is this one

"rule_engines": [ { "instance_name": "irods_rule_engine_plugin-python-instance", "plugin_name": "irods_rule_engine_plugin-python", "plugin_specific_configuration": {} }, {

the content of /etc/irods : /etc/irods$ ls core.dvm core.py core.re server_config.json session_vars.py core.fnm core.py.template genquery.py service_account.config

DOC-MEX commented 2 months ago

and this is the core.py file I am using:

import datetime
import json

def writeStringToCharArray(s, char_array):
    for i in range(0, len(s)):
        char_array[i] = s[i]

def pythonRuleEnginePluginTest(rule_args, callback, rei):
    with open('/tmp/from_core_py.txt', 'a') as f:
        f.write(str(datetime.datetime.now()))
        f.write('\n')
        c = 0
        for arg in rule_args:
            f.write('\t')
            f.write(str(c))
            f.write(' : ')
            f.write(str(arg))
            f.write('\n')
            c = c +1
    callback.writeLine('serverLog', 'Printed to server log from python rule engine')

def acPreConnect(rule_args, callback, rei):
    rule_args[0] = 'CS_NEG_REFUSE'

def acCreateUser(rule_args, callback, rei):
    ret = callback.msiCreateUser()

    if not ret['status']:
        callback.msiRollback()
        return ret

    ret = acCreateDefaultCollections(rule_args, callback, rei)

    if not ret['status']:
        callback.msiRollback()
        return ret

    ret = callback.msiAddUserToGroup('public')

    if not ret['status']:
        callback.msiRollback()
        return ret

    callback.msiCommit()

def acCreateDefaultCollections(rule_args, callback, rei):
    ret = acCreateUserZoneCollections(rule_args, callback, rei)
    return ret

def acCreateUserZoneCollections(rule_args, callback, rei):
    rodsZoneProxy = str(rei.uoip.rodsZone)
    otherUserName = str(rei.uoio.userName)
    homeStr = '/'.join(['', rodsZoneProxy, 'home'])
    trashStr = '/'.join(['', rodsZoneProxy, 'trash', 'home'])

    homeDict = [homeStr, otherUserName]
    trashDict = [trashStr, otherUserName]

    ret = acCreateCollByAdmin(homeDict, callback, rei)
    ret = acCreateCollByAdmin(trashDict, callback, rei)
    return ret

def acCreateCollByAdmin(rule_args, callback, rei):
    ret = callback.msiCreateCollByAdmin(rule_args[0], rule_args[1])
    return ret

def acDeleteUser(rule_args, callback, rei):
    ret = callback.acDeleteDefaultCollections()

    if not ret['status']:
        callback.msiRollback()
        return ret

    ret = callback.msiDeleteUser()

    if not ret['status']:
        callback.msiRollback()
        return ret

    callback.msiCommit()

def acDeleteDefaultCollections(rule_args, callback, rei):
    return acDeleteUserZoneCollections(rule_args, callback, rei)

def acDeleteUserZoneCollections(rule_args, callback, rei):
    rodsZoneProxy = str(rei.uoip.rodsZone)
    otherUserName = str(rei.uoio.userName)
    homeStr = '/'.join(['', rodsZoneProxy, 'home'])
    trashStr = '/'.join(['', rodsZoneProxy, 'trash', 'home'])

    homeDict = [homeStr, otherUserName]
    trashDict = [trashStr, otherUserName]

    ret = acDeleteCollByAdminIfPresent(homeDict, callback, rei)
    ret = acDeleteCollByAdminIfPresent(trashDict, callback, rei)
    return ret

def acDeleteCollByAdminIfPresent(rule_args, callback, rei):
    ret = callback.msiDeleteCollByAdmin(rule_args[0], rule_args[1])

    if not ret['status']:
        if not ret['code'] == -808000:
            callback.failmsg(ret['code'], 'error deleting collection')
    return ret

def acDeleteCollByAdmin(rule_args, callback, rei):
    callback.msiDeleteCollByAdmin(rule_args[0], rule_args[1])

def acRenameLocalZone(rule_args, callback, rei):
    coll_name = '/' + rule_args[0]

    ret = callback.msiRenameCollection(coll_name, rule_args[1])

    if not ret['status']:
        callback.msiRollback()
        return ret

    ret = callback.msiRenameLocalZone(rule_args[0], rule_args[1])

    if not ret['status']:
        callback.msiRollback()
        return ret

    callback.msiCommit()

def acGetUserByDN(rule_args, callback, rei):
    pass

def acAclPolicy(rule_args, callback, rei):
    callback.msiAclPolicy('STRICT')

def acTicketPolicy(rule_args, callback, rei):
    pass

def acCheckPasswordStrength(rule_args, callback, rei):
    pass

def acSetRescSchemeForCreate(rule_args, callback, rei):
    with open('../../etc/irods/server_config.json') as f:
        server_config_dict = json.load(f)
    callback.msiSetDefaultResc(server_config_dict.get('default_resource_name', 'demoResc'), 'null')

def acSetRescSchemeForRepl(rule_args, callback, rei):
    with open('../../etc/irods/server_config.json') as f:
        server_config_dict = json.load(f)
    callback.msiSetDefaultResc(server_config_dict.get('default_resource_name', 'demoResc'), 'null')

def acPreprocForDataObjOpen(rule_args, callback, rei):
    pass

def acSetMultiReplPerResc(rule_args, callback, rei):
    pass

def acPostProcForPut(rule_args, callback, rei):
    #pythonRuleEnginePluginTest(rule_args, callback, rei)
    pass

def acPostProcForCopy(rule_args, callback, rei):
    pass

def acPostProcForFilePathReg(rule_args, callback, rei):
    pass

def acPostProcForCreate(rule_args, callback, rei):
    pass

def acPostProcForOpen(rule_args, callback, rei):
    pass

def acPostProcForPhymv(rule_args, callback, rei):
    pass

def acPostProcForRepl(rule_args, callback, rei):
    pass

def acSetNumThreads(rule_args, callback, rei):
    callback.msiSetNumThreads('default', 'default', 'default')

def acDataDeletePolicy(rule_args, callback, rei):
    pass

def acPostProcForDelete(rule_args, callback, rei):
    pass

def acSetChkFilePathPerm(rule_args, callback, rei):
    callback.msiSetChkFilePathPerm('disallowPathReg')

def acTrashPolicy(rule_args, callback, rei):
    pass

def acSetPublicUserPolicy(rule_args, callback, rei):
    pass

def acChkHostAccessControl(rule_args, callback, rei):
    pass

def acSetVaultPathPolicy(rule_args, callback, rei):
    callback.msiSetGraftPathScheme('no', '1')

def acSetReServerNumProc(rule_args, callback, rei):
    callback.msiSetReServerNumProc('default')

def acPreprocForCollCreate(rule_args, callback, rei):
    pass

def acPostProcForCollCreate(rule_args, callback, rei):
    pass

def acPreprocForRmColl(rule_args, callback, rei):
    pass

def acPostProcForRmColl(rule_args, callback, rei):
    pass

def acPreProcForModifyUser(rule_args, callback, rei):
    pass

def acPostProcForModifyUser(rule_args, callback, rei):
    pass

def acPreProcForModifyAVUMetadata(rule_args, callback, rei):
    pass

def acPostProcForModifyAVUMetadata(rule_args, callback, rei):
    pass

def acPreProcForCreateUser(rule_args, callback, rei):
    pass

def acPostProcForCreateUser(rule_args, callback, rei):
    pass

def acPreProcForDeleteUser(rule_args, callback, rei):
    pass

def acPostProcForDeleteUser(rule_args, callback, rei):
    pass

def acPreProcForCreateResource(rule_args, callback, rei):
    pass

def acPostProcForCreateResource(rule_args, callback, rei):
    pass

def acPreProcForCreateToken(rule_args, callback, rei):
    pass

def acPostProcForCreateToken(rule_args, callback, rei):
    pass

def acPreProcForModifyUserGroup(rule_args, callback, rei):
    pass

def acPostProcForModifyUserGroup(rule_args, callback, rei):
    pass

def acPreProcForDeleteResource(rule_args, callback, rei):
    pass

def acPostProcForDeleteResource(rule_args, callback, rei):
    pass

def acPreProcForDeleteToken(rule_args, callback, rei):
    pass

def acPostProcForDeleteToken(rule_args, callback, rei):
    pass

def acPreProcForModifyResource(rule_args, callback, rei):
    pass

def acPostProcForModifyResource(rule_args, callback, rei):
    pass

def acPreProcForModifyCollMeta(rule_args, callback, rei):
    pass

def acPostProcForModifyCollMeta(rule_args, callback, rei):
    pass

def acPreProcForModifyDataObjMeta(rule_args, callback, rei):
    pass

def acPostProcForModifyDataObjMeta(rule_args, callback, rei):
    pass

def acPreProcForModifyAccessControl(rule_args, callback, rei):
    pass

def acPostProcForModifyAccessControl(rule_args, callback, rei):
    pass

def acPreProcForObjRename(rule_args, callback, rei):
    pass

def acPostProcForObjRename(rule_args, callback, rei):
    pass

def acPreProcForGenQuery(rule_args, callback, rei):
    pass

def acPostProcForGenQuery(rule_args, callback, rei):
    pass

def acRescQuotaPolicy(rule_args, callback, rei):
    callback.msiSetRescQuotaPolicy('off')

def acBulkPutPostProcPolicy(rule_args, callback, rei):
    callback.msiSetBulkPutPostProcPolicy('off')

def acPostProcForTarFileReg(rule_args, callback, rei):
    pass

def acPostProcForDataObjWrite(rule_args, callback, rei):
    pass

def acPostProcForDataObjRead(rule_args, callback, rei):
    pass

def acPreProcForExecCmd(rule_args, callback, rei):
    pass

def acPreProcForServerPortal(rule_args, callback, rei):
    pass

def acPostProcForServerPortal(rule_args, callback, rei):
    pass

def acPostProcForDataCopyReceived(rule_args, callback, rei):
    pass
DOC-MEX commented 2 months ago

I have tried on 2 VMs with Ubuntu 20 and Ubuntu 22 and I get the same error. However, on previous installation a few months ago, it works with no problem. The problem seems to occur only on new installations.

korydraughn commented 2 months ago

Ok. Nothing stands out to me in your server_config.json or irods_environment.json file upon initial inspection.

Where did you add the Python REP in the rule engines stanza? Is it first in the list? Are the removing the iRODS rule language REP and default cpp REP or leaving them in? Have you seen the issue with an empty core.py?

DOC-MEX commented 2 months ago

Yes, I add the python REP first on the list, like this:

"rule_engines": [
                {
                "instance_name": "irods_rule_engine_plugin-python-instance",
                "plugin_name": "irods_rule_engine_plugin-python",
                "plugin_specific_configuration": {}
            },
            {
                "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
                "plugin_name": "irods_rule_engine_plugin-irods_rule_language",
                "plugin_specific_configuration": {
                    "re_data_variable_mapping_set": [
                        "core"
                    ],
                    "re_function_name_mapping_set": [
                        "core"
                    ],
                    "re_rulebase_set": [
                        "core",
                        "training"
                    ],
                    "regexes_for_supported_peps": [
                        "ac[^ ]*",
                        "msi[^ ]*",
                        "[^ ]*pep_[^ ]*_(pre|post|except|finally)"
                    ]
                },
                "shared_memory_instance": "irods_rule_language_rule_engine"
            },
            {
                "instance_name": "irods_rule_engine_plugin-cpp_default_policy-instance",
                "plugin_name": "irods_rule_engine_plugin-cpp_default_policy",
                "plugin_specific_configuration": {}
            }
        ]
    },

then, to stop to the error messages and bring back irods service I only remove the Python REP and leave of the rest rule_engines as they were by default.

I get same error messages if I use an empty core.py

 {"log_category":"delay_server","log_level":"error","log_message":"iRODS Exception:\n    file: /irods_source/lib/core/src/client_connection.cpp\n    function: void irods::experimental::client_connection::only_connect(const std::string &, const int, const irods::experimental::fully_qualified_username &)\n    line: 188\n    code: -305000 (USER_SOCK_CONNECT_ERR)\n    message:\n        Connect error\nstack trace:\n--------------\n 0# irods::stacktrace::dump() const in /lib/libirods_common.so.4.3.1\n 1# irods::exception::assemble_full_display_what() const in /lib/libirods_common.so.4.3.1\n 2# irods::exception::what() const in /lib/libirods_common.so.4.3.1\n 3# main in irodsDelayServer\n 4# __libc_start_main in /lib/x86_64-linux-gnu/libc.so.6\n 5# _start in irodsDelayServer\n\n","server_host":"opendata-20","server_pid":106891,"server_timestamp":"2024-04-18T14:59:54.952Z","server_type":"delay_server","server_zone":"tempZone"}
 {"log_category":"agent_factory","log_level":"error","log_message":"Agent process [116483] exited with status [11].","server_host":"opendata-20","server_pid":106874,"server_timestamp":"2024-04-18T14:59:54.968Z","server_type":"agent_factory","server_zone":"tempZone"}
 {"log_category":"server","log_level":"critical","log_message":" 0# stacktrace_signal_handler in /lib/libirods_server.so.4.3.1\n 1# 0x00007FFA8721D420 in /lib/x86_64-linux-gnu/libpthread.so.0\n 2# std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::basic_string(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) in /opt/irods-externals/clang-runtime13.0.0-0/lib/libc++.so.1\n 3# 0x00007FFA82C75B41\n 4# 0x00007FFA82C67764\n 5# 0x00007FFA822DE64C\n 6# 0x00007FFA82CB9080\n 7# irods::pluggable_rule_engine<std::__1::tuple<> >::start_operation(std::__1::tuple<>&) in /lib/libirods_server.so.4.3.1\n 8# runIrodsAgentFactory(sockaddr_un) in /lib/libirods_server.so.4.3.1\n 9# 0x00000000004429C6 in /usr/sbin/irodsServer\n10# main in /usr/sbin/irodsServer\n11# __libc_start_main in /lib/x86_64-linux-gnu/libc.so.6\n12# _start in /usr/sbin/irodsServer\n","server_host":"opendata-20","server_pid":106873,"server_timestamp":"2024-04-18T14:59:57.742Z","server_type":"server","server_zone":"tempZone","stacktrace_agent_pid":"116483","stacktrace_timestamp_epoch_milliseconds":"938","stacktrace_timestamp_epoch_seconds":"1713452394","stacktrace_timestamp_utc":"2024-04-18T14:59:54.938Z"}
alanking commented 2 months ago

Here are a few other things we can check...

Ensure that the iRODS Linux service account is able to read the core.py file. I noticed that you said you use sudo cp when copying the template file. Here's the output on an instance of iRODS 4.3.1 on an Ubuntu 22 container that I'm using (I think I did a sudo cp as well, evidenced by the owner of core.py):

$ ls -l /etc/irods
total 116
drwxrwxr-x 2 irods irods  4096 Feb 28 17:35 __pycache__
-rw-r--r-- 1 irods irods  5026 Feb 28 17:33 core.dvm
-rw-r--r-- 1 irods irods   831 Feb 28 17:33 core.fnm
-rw-r--r-- 1 root  root     29 Feb 28 17:35 core.py
-rw-r--r-- 1 irods irods  8606 Oct  5  2023 core.py.template
-rw-r--r-- 1 irods irods 33386 Feb 28 17:33 core.re
-rw-r--r-- 1 irods irods 17656 Oct  5  2023 genquery.py
-rw------- 1 irods irods  4736 Feb 28 17:35 server_config.json
-rw-r--r-- 1 irods irods    64 Feb 28 17:33 service_account.config
-rw-r--r-- 1 irods irods  4876 Nov 29  2018 session_vars.py

Another thing I just noticed is that your listing of /etc/irods was missing genquery.py, which is supposed to be packaged with Python REP: https://github.com/irods/irods_rule_engine_plugin_python/blob/ad836782d6bac3eec4276aa6c649c12de42dc3b9/CMakeLists.txt#L258-L264

This made me wonder the packages are being installed correctly. Can you confirm that the Python REP shared objects exist and have the correct permissions? Here's an example from an installation I have on an Ubuntu 22 container running iRODS 4.3.1...

$ ls -l /usr/lib/irods/plugins/rule_engines/
total 16052
-rw-r--r-- 1 root root  1137192 Oct 20 19:59 libirods_rule_engine_plugin-cpp_default_policy.so
-rw-r--r-- 1 root root  1590200 Oct 20 19:59 libirods_rule_engine_plugin-irods_rule_language.so
-rw-r--r-- 1 root root   528560 Oct 20 19:59 libirods_rule_engine_plugin-passthrough.so
-rw-r--r-- 1 root root 13169144 Oct 20 20:58 libirods_rule_engine_plugin-python.so

If all of this seems okay on your end, then we may have to start digging into how these VMs are being set up. You said it used to work a few months ago, but now it doesn't for new installations. We will need to think about what may have changed between now and then.

DOC-MEX commented 2 months ago
ls -l /usr/lib/irods/plugins/rule_engines/
total 16052
-rw-r--r-- 1 root root  1137192 Oct 20 19:59 libirods_rule_engine_plugin-cpp_default_policy.so
-rw-r--r-- 1 root root  1590200 Oct 20 19:59 libirods_rule_engine_plugin-irods_rule_language.so
-rw-r--r-- 1 root root   528560 Oct 20 19:59 libirods_rule_engine_plugin-passthrough.so
-rw-r--r-- 1 root root 13169144 Oct 20 20:58 libirods_rule_engine_plugin-python.so

And actually the file genquery.py is not missing, it was added when I installed the python rule engine. However, in /etc/irods there were some files that belong to root. I have changed all ownership to the irods user, this is what I have now:

$ ls -tr -all
total 112
-rw-r--r--   1 irods irods  4876 Nov 29  2018 session_vars.py
-rw-r--r--   1 irods irods 17656 Oct  5  2023 genquery.py
-rw-r--r--   1 irods irods  8606 Oct  5  2023 core.py.template
-rw-r--r--   1 irods irods    64 Apr 17 09:42 service_account.config
-rw-r--r--   1 irods irods   831 Apr 17 09:43 core.fnm
-rw-r--r--   1 irods irods  5026 Apr 17 09:43 core.dvm
-rw-r--r--   1 irods irods   154 Apr 18 16:03 example.re
-rw-r--r--   1 irods irods 33490 Apr 18 16:22 core.re
-rw-r--r--   1 irods irods     0 Apr 19 07:55 core.py
drwxr-xr-x 106 root  root   4096 Apr 19 08:07 ..
-rw-------   1 irods irods  4792 Apr 19 08:15 server_config.json
drwxr-xr-x   2 irods irods  4096 Apr 19 08:19 .

I have checked VM with the previous installation where the python rule engine works and I can see that all files in /etc/irods/ belong to the irods user as well.

/etc/irods$ ls -tr -all
total 144
-rw-r--r--   1 irods irods  4876 Nov 29  2018 session_vars.py
-rw-r--r--   1 irods irods    64 Sep  6  2023 service_account.config
-rw-r--r--   1 irods irods   831 Sep  6  2023 core.fnm
-rw-r--r--   1 irods irods  5026 Sep  6  2023 core.dvm
-rw-r--r--   1 irods irods 33378 Sep  6  2023 core.re
-rw-r--r--   1 irods irods   243 Sep 18  2023 training.re
-rw-r--r--   1 irods irods   154 Sep 18  2023 example.re
-rw-r--r--   1 irods irods 12325 Sep 25  2023 core.py
-rw-r--r--   1 irods irods 17656 Oct  5  2023 genquery.py
-rw-r--r--   1 irods irods  8606 Oct  5  2023 core.py.template
drwxrwxr-x   2 irods irods  4096 Feb  7 15:55 __pycache__
-rw-------   1 irods irods  5081 Mar 19 14:30 server_config.json
drwxr-xr-x 122 root  root  12288 Apr 17 06:28 ..
drwxr-xr-x   3 irods irods  4096 Apr 19 08:18 .

Unfortunately, I get the same errors when I add python stanza, even when I use an empty core.py

 {"log_category":"legacy","log_level":"error","log_message":"_rcConnect: connectToRhost error, server on opendata-20:1247 is probably down status = -4000 SYS_HEADER_READ_LEN_ERR","server_host":"opendata-20","server_pid":183861,"server_timestamp":"2024-04-19T08:15:03.337Z","server_type":"server","server_zone":"tempZone"}
 {"log_category":"agent_factory","log_level":"error","log_message":"Agent process [184391] exited with status [11].","server_host":"opendata-20","server_pid":183862,"server_timestamp":"2024-04-19T08:15:03.352Z","server_type":"agent_factory","server_zone":"tempZone"}

I wonder what other permission and folders should I check? or a simple test to know that the python rule engine was installed correctly and has everything it needs, maybe it is something related to the the python interpreter?

alanking commented 2 months ago

Oh, my apologies. Now I see genquery.py in your output from before. Sorry :)

Try removing the training rulebase from the iRODS rule language REP. Here's what your rule_engines configuration should look like verbatim:

        "rule_engines": [
            {
                "instance_name": "irods_rule_engine_plugin-python-instance",
                "plugin_name": "irods_rule_engine_plugin-python",
                "plugin_specific_configuration": {
                }
            },
            {
                "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
                "plugin_name": "irods_rule_engine_plugin-irods_rule_language",
                "plugin_specific_configuration": {
                    "re_data_variable_mapping_set": [
                        "core"
                    ],
                    "re_function_name_mapping_set": [
                        "core"
                    ],
                    "re_rulebase_set": [
                        "core"
                    ],
                    "regexes_for_supported_peps": [
                        "ac[^ ]*",
                        "msi[^ ]*",
                        "[^ ]*pep_[^ ]*_(pre|post|except|finally)"
                    ]
                },
                "shared_memory_instance": "irods_rule_language_rule_engine"
            },
            {
                "instance_name": "irods_rule_engine_plugin-cpp_default_policy-instance",
                "plugin_name": "irods_rule_engine_plugin-cpp_default_policy",
                "plugin_specific_configuration": {}
            }
        ]

That is truly the default, vanilla, not-particularly-useful configuration. From there, we can try adding some functionality if that holds stable.

An additional thing to try so that we can feel out the edges of the problem: Remove /etc/irods/core.py while leaving the Python REP configured as shown above (again, verbatim) and try to run ils. You should get the following error:

$ ils
remote addresses: 127.0.0.1 ERROR: connectToRhost: error returned from host localhost status = -38000 SYS_AGENT_INIT_ERR
 ERROR: _rcConnect: connectToRhost error, server on localhost:1247 is probably down status = -38000 SYS_AGENT_INIT_ERR

We can also try to increase the log_level of agent, server, and rule_engine to debug or even trace and see if the server log has any additional information before the crash occurs. I don't expect much, though.

DOC-MEX commented 2 months ago

These are the logs that I get when I increase the level to "debug" and enabled the python rule engine without having the /etc/irods/core.py

{"log_category":"agent","log_level":"debug","log_message":"Unpacking byte buffer based on packing instruction [GridConfigurationInp_PI]","request_api_name":"","request_api_number":20009,"request_api_version":"d","request_client_user":"rods","request_host":"127.0.0.1","request_proxy_user":"rods","request_release_version":"rods4.3.1","server_host":"opendata-20","server_pid":79543,"server_timestamp":"2024-04-25T09:18:39.353Z","server_type":"agent","server_zone":"earlhamZone"}
 {"log_category":"agent","log_level":"debug","log_message":"[close_all_l1_descriptors:949] Closing all L1 descriptors ...","request_api_name":"","request_api_number":20009,"request_api_version":"d","request_client_user":"rods","request_host":"127.0.0.1","request_proxy_user":"rods","request_release_version":"rods4.3.1","server_host":"opendata-20","server_pid":79543,"server_timestamp":"2024-04-25T09:18:39.354Z","server_type":"agent","server_zone":"earlhamZone"}
 {"log_category":"agent","log_level":"debug","log_message":"Agent [79543] exiting with status = 0","request_api_name":"","request_api_number":20009,"request_api_version":"d","request_client_user":"rods","request_host":"127.0.0.1","request_proxy_user":"rods","request_release_version":"rods4.3.1","server_host":"opendata-20","server_pid":79543,"server_timestamp":"2024-04-25T09:18:39.355Z","server_type":"agent","server_zone":"earlhamZone"}
 {"dynamic_pep_rule_regex":"[^ ]*pep_[^ ]*_(pre|post)","instance_name":"irods_rule_engine_plugin-python-instance","log_category":"rule_engine","log_level":"debug","log_message":"No regexes found in server_config for Python RE - using default regexes","rule_engine_plugin":"python","server_host":"opendata-20","server_pid":79547,"server_timestamp":"2024-04-25T09:18:44.472Z","server_type":"agent","server_zone":"earlhamZone","static_pep_rule_regex":"ac[^ ]*"}
 {"log_category":"legacy","log_level":"error","log_message":"_rcConnect: connectToRhost error, server on opendata-20:1247 is probably down status = -4000 SYS_HEADER_READ_LEN_ERR","server_host":"opendata-20","server_pid":4162,"server_timestamp":"2024-04-25T09:18:44.479Z","server_type":"server","server_zone":"earlhamZone"}
 {"log_category":"agent_factory","log_level":"error","log_message":"Agent process [79547] exited with status [11].","server_host":"opendata-20","server_pid":4163,"server_timestamp":"2024-04-25T09:18:44.500Z","server_type":"agent_factory","server_zone":"earlhamZone"}
 {"log_category":"server","log_level":"error","log_message":"Caught exception in migrate_delay_server(): iRODS Exception:\n    file: /irods_source/lib/core/src/client_connection.cpp\n    function: void irods::experimental::client_connection::only_connect(const std::string &, const int, const irods::experimental::fully_qualified_username &)\n    line: 188\n    code: -305000 (USER_SOCK_CONNECT_ERR)\n    message:\n        Connect error\nstack trace:\n--------------\n 0# irods::stacktrace::dump() const in /lib/libirods_common.so.4.3.1\n 1# irods::exception::assemble_full_display_what() const in /lib/libirods_common.so.4.3.1\n 2# irods::exception::what() const in /lib/libirods_common.so.4.3.1\n 3# std::__1::__function::__func<initServerMain(RsComm*, bool, bool)::$_13, std::__1::allocator<initServerMain(RsComm*, bool, bool)::$_13>, void ()>::operator()() at rodsServer.cpp:?\n 4# irods::experimental::cron::cron_task::operator()() in /usr/sbin/irodsServer\n 5# irods::experimental::cron::cron::run() in /usr/sbin/irodsServer\n 6# void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, main::$_10> >(void*) at rodsServer.cpp:?\n 7# 0x00007F1A2F87A609 in /lib/x86_64-linux-gnu/libpthread.so.0\n 8# clone in /lib/x86_64-linux-gnu/libc.so.6\n\n","server_host":"opendata-20","server_pid":4162,"server_timestamp":"2024-04-25T09:18:44.543Z","server_type":"server","server_zone":"earlhamZone"}
 {"dynamic_pep_rule_regex":"[^ ]*pep_[^ ]*_(pre|post)","instance_name":"irods_rule_engine_plugin-python-instance","log_category":"rule_engine","log_level":"debug","log_message":"No regexes found in server_config for Python RE - using default regexes","rule_engine_plugin":"python","server_host":"opendata-20","server_pid":79561,"server_timestamp":"2024-04-25T09:18:49.649Z","server_type":"agent","server_zone":"earlhamZone","static_pep_rule_regex":"ac[^ ]*"}
 {"log_category":"legacy","log_level":"error","log_message":"_rcConnect: connectToRhost error, server on opendata-20:1247 is probably down status = -4000 SYS_HEADER_READ_LEN_ERR","server_host":"opendata-20","server_pid":4162,"server_timestamp":"2024-04-25T09:18:49.655Z","server_type":"server","server_zone":"earlhamZone"}
 {"log_category":"agent_factory","log_level":"error","log_message":"Agent process [79561] exited with status [11].","server_host":"opendata-20","server_pid":4163,"server_timestamp":"2024-04-25T09:18:49.684Z","server_type":"agent_factory","server_zone":"earlhamZone"}
 {"log_category":"server","log_level":"error","log_message":"Caught exception in migrate_delay_server(): iRODS Exception:\n    file: /irods_source/lib/core/src/client_connection.cpp\n    function: void irods::experimental::client_connection::only_connect(const std::string &, const int, const irods::experimental::fully_qualified_username &)\n    line: 188\n    code: -305000 (USER_SOCK_CONNECT_ERR)\n    message:\n        Connect error\nstack trace:\n--------------\n 0# irods::stacktrace::dump() const in /lib/libirods_common.so.4.3.1\n 1# irods::exception::assemble_full_display_what() const in /lib/libirods_common.so.4.3.1\n 2# irods::exception::what() const in /lib/libirods_common.so.4.3.1\n 3# std::__1::__function::__func<initServerMain(RsComm*, bool, bool)::$_13, std::__1::allocator<initServerMain(RsComm*, bool, bool)::$_13>, void ()>::operator()() at rodsServer.cpp:?\n 4# irods::experimental::cron::cron_task::operator()() in /usr/sbin/irodsServer\n 5# irods::experimental::cron::cron::run() in /usr/sbin/irodsServer\n 6# void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, main::$_10> >(void*) at rodsServer.cpp:?\n 7# 0x00007F1A2F87A609 in /lib/x86_64-linux-gnu/libpthread.so.0\n 8# clone in /lib/x86_64-linux-gnu/libc.so.6\n\n","server_host":"opendata-20","server_pid":4162,"server_timestamp":"2024-04-25T09:18:49.724Z","server_type":"server","server_zone":"earlhamZone"}
 {"log_category":"server","log_level":"critical","log_message":" 0# stacktrace_signal_handler in /lib/libirods_server.so.4.3.1\n 1# 0x00007F1A2F886420 in /lib/x86_64-linux-gnu/libpthread.so.0\n 2# std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::basic_string(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) in /opt/irods-externals/clang-runtime13.0.0-0/lib/libc++.so.1\n 3# 0x00007F1A2B2DEB41\n 4# 0x00007F1A2B2D0764\n 5# 0x00007F1A2A94764C\n 6# 0x00007F1A2B322080\n 7# irods::pluggable_rule_engine<std::__1::tuple<> >::start_operation(std::__1::tuple<>&) in /lib/libirods_server.so.4.3.1\n 8# runIrodsAgentFactory(sockaddr_un) in /lib/libirods_server.so.4.3.1\n 9# main::$_5::operator()() const at rodsServer.cpp:?\n10# main in /usr/sbin/irodsServer\n11# __libc_start_main in /lib/x86_64-linux-gnu/libc.so.6\n12# _start in /usr/sbin/irodsServer\n","server_host":"opendata-20","server_pid":4162,"server_timestamp":"2024-04-25T09:18:51.475Z","server_type":"server","server_zone":"earlhamZone","stacktrace_agent_pid":"79561","stacktrace_timestamp_epoch_milliseconds":"654","stacktrace_timestamp_epoch_seconds":"1714036729","stacktrace_timestamp_utc":"2024-04-25T09:18:49.654Z"}
 {"log_category":"server","log_level":"critical","log_message":" 0# stacktrace_signal_handler in /lib/libirods_server.so.4.3.1\n 1# 0x00007F1A2F886420 in /lib/x86_64-linux-gnu/libpthread.so.0\n 2# std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::basic_string(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) in /opt/irods-externals/clang-runtime13.0.0-0/lib/libc++.so.1\n 3# 0x00007F1A2B2DEB41\n 4# 0x00007F1A2B2D0764\n 5# 0x00007F1A2A94764C\n 6# 0x00007F1A2B322080\n 7# irods::pluggable_rule_engine<std::__1::tuple<> >::start_operation(std::__1::tuple<>&) in /lib/libirods_server.so.4.3.1\n 8# runIrodsAgentFactory(sockaddr_un) in /lib/libirods_server.so.4.3.1\n 9# main::$_5::operator()() const at rodsServer.cpp:?\n10# main in /usr/sbin/irodsServer\n11# __libc_start_main in /lib/x86_64-linux-gnu/libc.so.6\n12# _start in /usr/sbin/irodsServer\n","server_host":"opendata-20","server_pid":4162,"server_timestamp":"2024-04-25T09:18:51.585Z","server_type":"server","server_zone":"earlhamZone","stacktrace_agent_pid":"79547","stacktrace_timestamp_epoch_milliseconds":"477","stacktrace_timestamp_epoch_seconds":"1714036724","stacktrace_timestamp_utc":"2024-04-25T09:18:44.477Z"}
 {"log_category":"agent","log_level":"debug","log_message":"Unpacking byte buffer based on packing instruction [BinBytesBuf_PI]","request_api_name":"","request_api_number":110000,"request_api_version":"d","request_client_user":"rods","request_host":"127.0.0.1","request_proxy_user":"rods","request_release_version":"rods4.3.1","server_host":"opendata-20","server_pid":79613,"server_timestamp":"2024-04-25T09:18:54.926Z","server_type":"agent","server_zone":"earlhamZone"}
 {"log_category":"agent","log_level":"debug","log_message":"Unpacking byte buffer based on packing instruction [GridConfigurationInp_PI]","request_api_name":"","request_api_number":20009,"request_api_version":"d","request_client_user":"rods","request_host":"127.0.0.1","request_proxy_user":"rods","request_release_version":"rods4.3.1","server_host":"opendata-20","server_pid":79613,"server_timestamp":"2024-04-25T09:18:54.953Z","server_type":"agent","server_zone":"earlhamZone"}
 {"log_category":"agent","log_level":"debug","log_message":"[close_all_l1_descriptors:949] Closing all L1 descriptors ...","request_api_name":"","request_api_number":20009,"request_api_version":"d","request_client_user":"rods","request_host":"127.0.0.1","request_proxy_user":"rods","request_release_version":"rods4.3.1","server_host":"opendata-20","server_pid":79613,"server_timestamp":"2024-04-25T09:18:54.954Z","server_type":"agent","server_zone":"earlhamZone"}
DOC-MEX commented 2 months ago

Changing the log level of agent to "debug" keeps printing these logs:

{"log_category":"agent","log_level":"debug","log_message":"Unpacking byte buffer based on packing instruction [BinBytesBuf_PI]","request_api_name":"","request_api_number":110000,"request_api_version":"d","request_client_user":"rods","request_host":"127.0.0.1","request_proxy_user":"rods","request_release_version":"rods4.3.1","server_host":"opendata-20","server_pid":79613,"server_timestamp":"2024-04-25T09:18:54.926Z","server_type":"agent","server_zone":"earlhamZone"}
 {"log_category":"agent","log_level":"debug","log_message":"Unpacking byte buffer based on packing instruction [GridConfigurationInp_PI]","request_api_name":"","request_api_number":20009,"request_api_version":"d","request_client_user":"rods","request_host":"127.0.0.1","request_proxy_user":"rods","request_release_version":"rods4.3.1","server_host":"opendata-20","server_pid":79613,"server_timestamp":"2024-04-25T09:18:54.953Z","server_type":"agent","server_zone":"earlhamZone"}
 {"log_category":"agent","log_level":"debug","log_message":"[close_all_l1_descriptors:949] Closing all L1 descriptors ...","request_api_name":"","request_api_number":20009,"request_api_version":"d","request_client_user":"rods","request_host":"127.0.0.1","request_proxy_user":"rods","request_release_version":"rods4.3.1","server_host":"opendata-20","server_pid":79613,"server_timestamp":"2024-04-25T09:18:54.954Z","server_type":"agent","server_zone":"earlhamZone"}

So I keep it to agent in info level.
These are the logs when I copy the core.py.template to core.py. As you can see, the errors are still the same

{"dynamic_pep_rule_regex":"[^ ]*pep_[^ ]*_(pre|post)","instance_name":"irods_rule_engine_plugin-python-instance","log_category":"rule_engine","log_level":"debug","log_message":"No regexes found in server_config for Python RE - using default regexes","rule_engine_plugin":"python","server_host":"opendata-20","server_pid":83698,"server_timestamp":"2024-04-25T10:21:18.224Z","server_type":"agent","server_zone":"earlhamZone","static_pep_rule_regex":"ac[^ ]*"}
 {"log_category":"agent_factory","log_level":"error","log_message":"Agent process [83698] exited with status [11].","server_host":"opendata-20","server_pid":4163,"server_timestamp":"2024-04-25T10:21:18.246Z","server_type":"agent_factory","server_zone":"earlhamZone"}
 {"log_category":"delay_server","log_level":"error","log_message":"iRODS Exception:\n    file: /irods_source/lib/core/src/client_connection.cpp\n    function: void irods::experimental::client_connection::only_connect(const std::string &, const int, const irods::experimental::fully_qualified_username &)\n    line: 188\n    code: -305000 (USER_SOCK_CONNECT_ERR)\n    message:\n        Connect error\nstack trace:\n--------------\n 0# irods::stacktrace::dump() const in /lib/libirods_common.so.4.3.1\n 1# irods::exception::assemble_full_display_what() const in /lib/libirods_common.so.4.3.1\n 2# irods::exception::what() const in /lib/libirods_common.so.4.3.1\n 3# main in irodsDelayServer\n 4# __libc_start_main in /lib/x86_64-linux-gnu/libc.so.6\n 5# _start in irodsDelayServer\n\n","server_host":"opendata-20","server_pid":4181,"server_timestamp":"2024-04-25T10:21:18.271Z","server_type":"delay_server","server_zone":"earlhamZone"}
 {"dynamic_pep_rule_regex":"[^ ]*pep_[^ ]*_(pre|post)","instance_name":"irods_rule_engine_plugin-python-instance","log_category":"rule_engine","log_level":"debug","log_message":"No regexes found in server_config for Python RE - using default regexes","rule_engine_plugin":"python","server_host":"opendata-20","server_pid":83706,"server_timestamp":"2024-04-25T10:21:18.961Z","server_type":"agent","server_zone":"earlhamZone","static_pep_rule_regex":"ac[^ ]*"}
 {"log_category":"legacy","log_level":"error","log_message":"_rcConnect: connectToRhost error, server on opendata-20:1247 is probably down status = -4000 SYS_HEADER_READ_LEN_ERR","server_host":"opendata-20","server_pid":4162,"server_timestamp":"2024-04-25T10:21:18.968Z","server_type":"server","server_zone":"earlhamZone"}
 {"log_category":"agent_factory","log_level":"error","log_message":"Agent process [83706] exited with status [11].","server_host":"opendata-20","server_pid":4163,"server_timestamp":"2024-04-25T10:21:18.990Z","server_type":"agent_factory","server_zone":"earlhamZone"}
 {"log_category":"server","log_level":"error","log_message":"Caught exception in migrate_delay_server(): iRODS Exception:\n    file: /irods_source/lib/core/src/client_connection.cpp\n    function: void irods::experimental::client_connection::only_connect(const std::string &, const int, const irods::experimental::fully_qualified_username &)\n    line: 188\n    code: -305000 (USER_SOCK_CONNECT_ERR)\n    message:\n        Connect error\nstack trace:\n--------------\n 0# irods::stacktrace::dump() const in /lib/libirods_common.so.4.3.1\n 1# irods::exception::assemble_full_display_what() const in /lib/libirods_common.so.4.3.1\n 2# irods::exception::what() const in /lib/libirods_common.so.4.3.1\n 3# std::__1::__function::__func<initServerMain(RsComm*, bool, bool)::$_13, std::__1::allocator<initServerMain(RsComm*, bool, bool)::$_13>, void ()>::operator()() at rodsServer.cpp:?\n 4# irods::experimental::cron::cron_task::operator()() in /usr/sbin/irodsServer\n 5# irods::experimental::cron::cron::run() in /usr/sbin/irodsServer\n 6# void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, main::$_10> >(void*) at rodsServer.cpp:?\n 7# 0x00007F1A2F87A609 in /lib/x86_64-linux-gnu/libpthread.so.0\n 8# clone in /lib/x86_64-linux-gnu/libc.so.6\n\n","server_host":"opendata-20","server_pid":4162,"server_timestamp":"2024-04-25T10:21:19.032Z","server_type":"server","server_zone":"earlhamZone"}
 {"log_category":"server","log_level":"critical","log_message":" 0# stacktrace_signal_handler in /lib/libirods_server.so.4.3.1\n 1# 0x00007F1A2F886420 in /lib/x86_64-linux-gnu/libpthread.so.0\n 2# std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::basic_string(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) in /opt/irods-externals/clang-runtime13.0.0-0/lib/libc++.so.1\n 3# 0x00007F1A2B2DEB41\n 4# 0x00007F1A2B2D0764\n 5# 0x00007F1A2A94764C\n 6# 0x00007F1A2B322080\n 7# irods::pluggable_rule_engine<std::__1::tuple<> >::start_operation(std::__1::tuple<>&) in /lib/libirods_server.so.4.3.1\n 8# runIrodsAgentFactory(sockaddr_un) in /lib/libirods_server.so.4.3.1\n 9# main::$_5::operator()() const at rodsServer.cpp:?\n10# main in /usr/sbin/irodsServer\n11# __libc_start_main in /lib/x86_64-linux-gnu/libc.so.6\n12# _start in /usr/sbin/irodsServer\n","server_host":"opendata-20","server_pid":4162,"server_timestamp":"2024-04-25T10:21:20.392Z","server_type":"server","server_zone":"earlhamZone","stacktrace_agent_pid":"83698","stacktrace_timestamp_epoch_milliseconds":"231","stacktrace_timestamp_epoch_seconds":"1714040478","stacktrace_timestamp_utc":"2024-04-25T10:21:18.231Z"}
 {"log_category":"server","log_level":"critical","log_message":" 0# stacktrace_signal_handler in /lib/libirods_server.so.4.3.1\n 1# 0x00007F1A2F886420 in /lib/x86_64-linux-gnu/libpthread.so.0\n 2# std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::basic_string(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) in /opt/irods-externals/clang-runtime13.0.0-0/lib/libc++.so.1\n 3# 0x00007F1A2B2DEB41\n 4# 0x00007F1A2B2D0764\n 5# 0x00007F1A2A94764C\n 6# 0x00007F1A2B322080\n 7# irods::pluggable_rule_engine<std::__1::tuple<> >::start_operation(std::__1::tuple<>&) in /lib/libirods_server.so.4.3.1\n 8# runIrodsAgentFactory(sockaddr_un) in /lib/libirods_server.so.4.3.1\n 9# main::$_5::operator()() const at rodsServer.cpp:?\n10# main in /usr/sbin/irodsServer\n11# __libc_start_main in /lib/x86_64-linux-gnu/libc.so.6\n12# _start in /usr/sbin/irodsServer\n","server_host":"opendata-20","server_pid":4162,"server_timestamp":"2024-04-25T10:21:20.535Z","server_type":"server","server_zone":"earlhamZone","stacktrace_agent_pid":"83706","stacktrace_timestamp_epoch_milliseconds":"966","stacktrace_timestamp_epoch_seconds":"1714040478","stacktrace_timestamp_utc":"2024-04-25T10:21:18.966Z"}