Documentation for `on()` needs clarification

korydraughn commented 10 months ago

[x] clarify on() (irods_docs)
[ ] add note about if() to "Rule Condition" section (irods_docs)

Bug Report

iRODS Version, OS and Version

iRODS 4.3.0

What did you try to do?

The following rule code works in 4.2.12, but not in 4.3.0.

pep_api_data_obj_put_post(*INSTANCE_NAME, *COMM, *DATAOBJINP, *BUFFER, *PORTAL_OPR_OUT)
{
    on (*DATAOBJINP.destRescName like "ufs0") {
        *obj_path = *DATAOBJINP.obj_path;
        writeLine('serverLog', 'Entering pep_api_data_obj_put_post for *obj_path');
    }
}

Observed behavior (including steps to reproduce, if applicable)

Case 1

Here is a PUT without specifying a target resource.

$ echo here is a file. > foo
$ iput foo
remote addresses: X.X.X.X ERROR: putUtil: put error for /tempZone/home/kory/foo, status = -1101000 status = -1101000 RULE_FAILED_ERR

The iRODS server log file.

{                                                                                                                                                                                             
  "log_category": "legacy",                                                                                                                                                                   
  "log_level": "error",                                                                                                                                                                       
  "log_message": "unmatched key destRescName\nline 629, col 8, rule base core\n    ON( *DATAOBJINP.destRescName like \"ufs0\" ) {\n        ^\n\n",                                            
  "request_api_name": "DATA_OBJ_PUT_AN",                                                                                                                                                      
  "request_api_number": 606,                                                                                                                                                                  
  "request_api_version": "d",                                                                  
  "request_client_user": "kory",                                                               
  "request_host": "[REDACTED]",                                                               
  "request_proxy_user": "kory",                                                                
  "request_release_version": "rods4.3.0",                                                      
  "server_host": "[REDACTED]",                                                                     
  "server_pid": 1560602,                                                                       
  "server_timestamp": "2023-08-17T12:48:11.985Z",
  "server_type": "agent",                                                                      
  "server_zone": "tempZone"                                                                    
}                                                                                              
{                                                                                              
  "log_category": "rule_engine",                                                               
  "log_level": "debug",                                                                        
  "log_message": "post-pep rule [pep_api_data_obj_put_post] failed with error code [-1101000]",
  "request_api_name": "DATA_OBJ_PUT_AN",                                                       
  "request_api_number": 606,                                                                   
  "request_api_version": "d",                                                                  
  "request_client_user": "kory",                                                               
  "request_host": "[REDACTED]",                                                               
  "request_proxy_user": "kory",                                                                
  "request_release_version": "rods4.3.0",                                                      
  "server_host": "[REDACTED]",                                                                     
  "server_pid": 1560602,                                                                       
  "server_timestamp": "2023-08-17T12:48:11.985Z",
  "server_type": "agent",                                                                      
  "server_zone": "tempZone"                                                                    
}

Case 2

And here is a PUT that targets a resource which does NOT match the one defined in the PEP.

$ echo here is a file. > foo
$ iput foo -R demoResc
remote addresses: X.X.X.X ERROR: putUtil: put error for /tempZone/home/kory/foo, status = -1101000 status = -1101000 RULE_FAILED_ERR

The iRODS server log file.

{                                                                                              
  "log_category": "legacy",                                                                    
  "log_level": "error",                                                                        
  "log_message": "",                                                                           
  "request_api_name": "DATA_OBJ_PUT_AN",                                                       
  "request_api_number": 606,                                                                   
  "request_api_version": "d",                                                                  
  "request_client_user": "kory",                                                               
  "request_host": "[REDACTED]",                                                               
  "request_proxy_user": "kory",                                                                
  "request_release_version": "rods4.3.0",                                                      
  "server_host": "[REDACTED]",                                                                     
  "server_pid": 1560795,                                                                       
  "server_timestamp": "2023-08-17T12:48:45.459Z",
  "server_type": "agent",                                                                      
  "server_zone": "tempZone"                                                                    
}                                                                                              
{                                                                                              
  "log_category": "rule_engine",                                                               
  "log_level": "debug",                                                                        
  "log_message": "post-pep rule [pep_api_data_obj_put_post] failed with error code [-1101000]",
  "request_api_name": "DATA_OBJ_PUT_AN",                                                       
  "request_api_number": 606,                                                                   
  "request_api_version": "d",                                                                  
  "request_client_user": "kory",                                                               
  "request_host": "[REDACTED]",                                                               
  "request_proxy_user": "kory",                                                                
  "request_release_version": "rods4.3.0",                                                      
  "server_host": "[REDACTED]",                                                                     
  "server_pid": 1560795,                                                                       
  "server_timestamp": "2023-08-17T12:48:45.460Z",
  "server_type": "agent",                                                                      
  "server_zone": "tempZone"                                                                    
}

Case 3

Here is a PUT that targets the resource defined in the PEP.

$ echo here is a file. > foo
$ iput foo -R ufs0

This works as expected. No errors triggered, therefore the log file is quiet.

Originally reported at https://groups.google.com/g/iROD-Chat/c/42wjPvcu-sg.

alanking commented 10 months ago

Have we seen how this behaves in main/4-3-stable/soon-to-be-4.3.1? Just want to make sure it's not already fixed.

korydraughn commented 10 months ago

I reproduced the issue using a build that is close to 4.3.1.

trel commented 10 months ago

does using the if branching logic rather than on make this behave as expected?

korydraughn commented 10 months ago

It doesn't.

trel commented 10 months ago

okay. interesting.

wondering if it ever worked as expected - or maybe our error stack is doing a better job at getting the message back to the user/client. pretty sure we haven't touched the actual rule language logic.

korydraughn commented 10 months ago

Yeah. I feel our error handling may just be better than before.

FifthPotato commented 1 week ago

Back into the NREP code once more... This is caused by a combination of a few things:

Case 1: This is due to the fact that dataObjInp doesn't actually have a destRescName member by itself: it's part of a KeyValuePair that may or may not include it. One potential fix would be to add the default resource to the map (in irods_re_serialization.cpp) as destRescName always-- it breaks the pattern somewhat, but would fix this issue. Alternatively, is there something a rule-writer could do to check the object's members before trying to access it?

Case 2: This one is due to a...quirk in the NREP? I believe it's intentional, but any "condition" e.g. (on) that evaluates to false will explicitly fail. I failed to run into Case 2 when using if. This behavior (and error code) is caused by: https://github.com/irods/irods/blob/00a781fb098e0dd81f55dc56965c09573366d995/plugins/rule_engines/irods_rule_language/src/arithmetics.cpp#L1524 The preceding lines check the condition, and will cause this error code if the condition evaluates to false. Notably, it only logs a message if the body failed somewhere-- otherwise, it simply logs an empty error (as seen). This appears to bubble up. A fix would change the NREP behavior, which may be unwanted. An (admittedly awful and hacky) workaround would be to add a second condition block to the rule, which suppresses the error from bubbling up, e.g.

pep_api_data_obj_put_post(*INSTANCE_NAME, *COMM, *DATAOBJINP, *BUFFER, *PORTAL_OPR_OUT)
{
    on(*DATAOBJINP.destRescName == "ufs0") {
        *obj_path = *DATAOBJINP.obj_path;
        writeLine('serverLog', 'Entering pep_api_data_obj_put_post for *obj_path');
    }
        on(true) {}
}

which prevented Case 2 for me (an empty error message is still logged).

korydraughn commented 1 week ago

Case 1: This is due to the fact that dataObjInp doesn't actually have a destRescName member by itself: it's part of a KeyValuePair that may or may not include it. One potential fix would be to add the default resource to the map (in irods_re_serialization.cpp) as destRescName always-- it breaks the pattern somewhat, but would fix this issue. Alternatively, is there something a rule-writer could do to check the object's members before trying to access it?

Because certain members appear based on the state of the KeyValuePair, we can't change it. A workaround was provided in the google chat conversation.

pep_api_data_obj_put_post(*INST_NAME, *COMM, *DATAOBJINP, *BUFFER, *PORTAL_OPR_OUT)
{
    *resc_name = 'undefined';
    errorcode(msiGetValByKey(*DATAOBJINP, 'destRescName', *resc_name));
    writeLine('serverLog', 'resource name = [*resc_name]') # for debug purposes.

    if (*resc_name like 'eudatCache') {
        # Do work.
    }
}

So, we already provide the tool/MSI for this, it just requires the policy writer to know how things are derived (which isn't good at all).

I think this becomes a documentation exercise. We need to document what members are guaranteed to exist. Then we need to document all keywords which are known to potentially appear in the KeyValuePair across iRODS data types.

Case 2: This one is due to a...quirk in the NREP? I believe it's intentional, but any "condition" e.g. (on) that evaluates to false will explicitly fail. \<snip>

We need to document the behavior of on(...) {} better. Look through https://docs.irods.org/4.3.2/plugins/irods_rule_language/ and see if the documentation for on covers the things you learned.

Perhaps we should just deprecate the use of on?

trel commented 1 week ago

I would favor a deprecation of on if we don't have any known use cases / patterns that require it.

If the same outcome can always be achieved with an if, then we're safe to deprecate on.

korydraughn commented 1 week ago

If on somehow forced policy writers to cover all cases, then on would be more powerful/useful. But since it doesn't, deprecation sounds like the correct direction. The deprecation would be more about providing signal to prefer if and we'd simply leave support for on in the code to avoid breaking the NREP.

We'd also need to add documentation to docs.irods.org about the deprecation and preference for if.

trel commented 1 week ago

We previously updated documentation and some example rules to use if rather than on... but never left a deprecation message before.

korydraughn commented 1 week ago

I think this is a unique situation. While eventual removal of on would be preferred, I don't think we are ready to modify the NREP code in that way. The simplest thing to do IMO is to include recommendations/guidelines within the documentation.

trel commented 1 week ago

Agreed. That seems safe and good.

FifthPotato commented 1 week ago

We need to document the behavior of on(...) {} better. Look through https://docs.irods.org/4.3.2/plugins/irods_rule_language/ and see if the documentation for on covers the things you learned.

Yep, it actually is in there-- any rule condition that fails is expected to return a rule failure. Noted here...

...there are three failure conditions:
1. Rule condition evaluates to false.

So I think the documentation is there to some degree. Maybe it needs to be bolded some more?

trel commented 1 week ago

Well, there it is. So unless we have a bug in the implementation of on() itself... I think we're covered for now for case 2?

korydraughn commented 1 week ago

What happens if there are multiple on() conditions? Is it still the first on() condition that fails causes the rule to fail? (No way this is true)

I've opened issues for documenting guaranteed members vs members created due to KeyValuePairs.

Maybe we re-purpose this issue to expand the docs to cover what failure means in the presence of multiple on() conditions?

trel commented 1 week ago

I bet it cares about each one / all of them.

FifthPotato commented 1 week ago

Ran a few tests on my container. It seems like that as long as a single one of the on conditions evaluates to true, then it won't report the RULE_FAILED_ERR back. (I also couldn't get it to print the weird empty log message with multiple true/false on conditions mixed in, so maybe that was a red herring?)

korydraughn commented 1 week ago

Ran a few tests on my container. It seems like that as long as a single one of the on conditions evaluates to true, then it won't report the RULE_FAILED_ERR back.

Good. Please add a few words to the rule language section of irods/irods_docs so it's clear that behavior is supported. As they are today, the wording only hints at that. We want to be explicit about the behavior where possible.

Once the docs are updated, we can close this.

(I also couldn't get it to print the weird empty log message with multiple true/false on conditions mixed in, so maybe that was a red herring?)

Perhaps. We can ignore the empty log message since you haven't been able to reproduce that part.

alanking commented 5 hours ago

So is this just a documentation exercise? If so, it seems we are good to close.

korydraughn commented 4 hours ago

Yes. It's just documentation now.

@FifthPotato Please close if the work is complete.

irods / irods