irods / irods_rule_engine_plugin_logical_quotas

BSD 3-Clause "New" or "Revised" License
1 stars 9 forks source link

Logical quotas and replication- wrong total_number_of_data_objects on replication #48

Closed simstoica closed 2 years ago

simstoica commented 2 years ago

In the logs I have the following error messages:

Nov 11 15:24:53 pid:943 NOTICE: writeLine: inString = Submitting rule to checksum replica 1 of /rugZone/home/user/test.txt
Nov 11 15:24:53 pid:943 remote addresses: 10.3.17.11, 10.3.17.20 ERROR: [logical_quotas] error => [json.exception.parse_error.101] parse error at line 1, column 1: syntax error while parsing value - invalid literal; last read: 'r'
Nov 11 15:24:53 pid:943 remote addresses: 10.3.17.11, 10.3.17.20 ERROR: Rule Engine Plugin returned [-349000].
Nov 11 15:24:53 pid:943 remote addresses: 10.3.17.11, 10.3.17.20 ERROR: Rule Engine Plugin returned [-13000].
Nov 11 15:24:53 pid:943 NOTICE: writeLine: inString = Submitting rule to checksum replica 0 of /rugZone/home/user/test.txt
Nov 11 15:24:53 pid:943 remote addresses: 10.3.17.11, 10.3.17.20 ERROR: [logical_quotas] error => [json.exception.parse_error.101] parse error at line 1, column 1: syntax error while parsing value - invalid literal; last read: 'r'
Nov 11 15:24:53 pid:943 remote addresses: 10.3.17.11, 10.3.17.20 ERROR: Rule Engine Plugin returned [-349000].
Nov 11 15:24:53 pid:943 remote addresses: 10.3.17.11, 10.3.17.20 ERROR: Rule Engine Plugin returned [-13000].
Nov 11 15:24:53 pid:943 remote addresses: 10.3.17.11, 10.3.17.20 ERROR: Logical Quotas Policy: Insufficient privileges
Nov 11 15:24:53 pid:943 remote addresses: 10.3.17.11, 10.3.17.20 ERROR: iRODS Exception:
    file: /irods_rule_engine_plugin_logical_quotas/src/handler.cpp
    function: auto irods::handler::pep_api_data_obj_close::post(const std::string &, const irods::instance_configuration_map &, std::list<boost::any> &, MsParamArray *, irods::callback &)::(anonymous class)::operator()(type-parameter-0-0 &, const type-parameter-0-1 &) const
    line: 1333
    code: -1205000 (RE_RUNTIME_ERROR)
    message:
        [-] /irods_rule_engine_plugin_logical_quotas/src/handler.cpp:614:irods::error irods::handler::logical_quotas_count_total_number_of_data_objects(const std::string &, const irods::instance_configuration_map &, std::list<boost::any> &, MsParamArray *, irods::callback &) :  status [RE_RUNTIME_ERROR]  errno [] -- message [Logical Quotas Policy: Insufficient privileges]

The logical quota policy is and the max number of objects was not exceeded:

$ irule -r irods_rule_engine_plugin-logical_quotas-instance '{"operation": "logical_quotas_start_monitoring_collection", "collection": "/rugZone/home/user"}' null ruleExecOut
$ irule -r irods_rule_engine_plugin-logical_quotas-instance '{"operation": "logical_quotas_set_maximum_number_of_data_objects", "collection": "/rugZone/home/user", "value": "15"}' null ruleExecOut

The metadata guard plugin is installed and active. Metadata guard policy is :

$ imeta ls -C /rugZone
AVUs defined for collection /rugZone:
attribute: irods::metadata_guard
value: {"prefixes": ["irods::"], "admin_only": true}
units:
korydraughn commented 2 years ago

the irods::logical_quotas::total_number_of_data_objects would increase by 1 when actually 2 objects have been created (2 replicas).

Q. Are you saying two data objects were created or that one data object with two replicas was created? Q. Is the user who triggered this error a rodsadmin? Q. When you say irods user, are you referring to the iRODS service account (i.e. rods user)?

Keep in mind that the plugin tracks the logical space (data objects), not the physical (replicas).

Aside from that, I noticed the rule engine errors just before the logical quotas error. It looks like you're running this in a delay rule. To avoid lines such as ERROR: Rule Engine Plugin returned [-349000]., you need to target a specific rule engine plugin using the delay hints. This is done by adding <INST_NAME>XXX</INST_NAME> to the delay hints. XXX must be replaced with a value from irule -a. For example:

delay("<PLUSET>60s</PLUSET><INST_NAME>irods_rule_engine_plugin-irods_rule_language-instance</INST_NAME>") {
    # Policy / Code
}
simstoica commented 2 years ago

the irods::logical_quotas::total_number_of_data_objects would increase by 1 when actually 2 objects have been created (2 replicas).

Q. Are you saying two data objects were created or that one data object with two replicas was created?

As a rodsuesr have created 1 data object with 2 replicas. The object itself is being counted by the logical quota plugin, the replica of the object not.

Q. Is the user who triggered this error a rodsadmin?

No. The user is a regular rodsuser. The action he performed is iput

Q. When you say irods user, are you referring to the iRODS service account (i.e. rods user)?

In this case I am referring to the irodsadmin

Keep in mind that the plugin tracks the logical space (data objects), not the physical (replicas).

If that is the case, than there is an inconsistency wrt the recalculation. When I recalculate the logical quotas values, these objects do get counted and their allocated space as well.

Aside from that, I noticed the rule engine errors just before the logical quotas error. It looks like you're running this in a delay rule. To avoid lines such as ERROR: Rule Engine Plugin returned [-349000]., you need to target a specific rule engine plugin using the delay hints. This is done by adding <INST_NAME>XXX</INST_NAME> to the delay hints. XXX must be replaced with a value from irule -a. For example:

delay("<PLUSET>60s</PLUSET><INST_NAME>irods_rule_engine_plugin-irods_rule_language-instance</INST_NAME>") {
    # Policy / Code
}

Thanks for the hint. I will look up that part to see what triggers the delayed engine rules errors.

korydraughn commented 2 years ago

In this case I am referring to the irodsadmin

I'm assuming you're referring to the iRODS user that the service account uses (e.g. on a vanilla install, the iRODS user used by the service account is rods. This can be changed during setup.).

Keep in mind that the plugin tracks the logical space (data objects), not the physical (replicas).

If that is the case, than there is an inconsistency wrt the recalculation. When I recalculate the logical quotas values, these objects do get counted and their allocated space as well.

So you're saying the plugin is counting replicas instead of data objects? The recalculation should produce the correct totals at that moment, however, depending on how active your users are, the totals will drift. Is it possible that we are experiencing some drift?

Please share your OS, iRODS plugin version, and anything else about your environment that may help in reproducing this.

simstoica commented 2 years ago

I tried reproducing the issue, but was partially unsuccessful. However, I still see an inconsistency between the iput and the recalculate.

my OS : ubuntu 18 irods version 4.2.10 plugins list :

There is one icat and 3 replicas, the replication is done on 2 of the 3 replicas.

Dec 14 11:58:38 pid:701 NOTICE: rsGeneralAdmin add child "mnt_irods000{}" to resource "pt000"
Dec 14 11:58:39 pid:709 NOTICE: rsGeneralAdmin add child "mnt_irods001{}" to resource "pt001"
Dec 14 11:58:39 pid:717 NOTICE: rsGeneralAdmin add child "mnt_irods100{}" to resource "pt100"
Dec 14 11:58:39 pid:725 NOTICE: rsGeneralAdmin add child "mnt_irods101{}" to resource "pt101"
Dec 14 11:58:40 pid:737 NOTICE: rsGeneralAdmin add child "mnt_irods200{}" to resource "pt200"
Dec 14 11:58:40 pid:745 NOTICE: rsGeneralAdmin add child "mnt_irods201{}" to resource "pt201"
Dec 14 11:58:40 pid:753 NOTICE: rsGeneralAdmin add child "pt000{}" to resource "randy00"
Dec 14 11:58:40 pid:761 NOTICE: rsGeneralAdmin add child "pt010{}" to resource "randy00"
Dec 14 11:58:41 pid:769 NOTICE: rsGeneralAdmin add child "pt001{}" to resource "randy01"
Dec 14 11:58:41 pid:777 NOTICE: rsGeneralAdmin add child "pt011{}" to resource "randy01"
Dec 14 11:58:41 pid:785 NOTICE: rsGeneralAdmin add child "pt100{}" to resource "randy10"
Dec 14 11:58:42 pid:793 NOTICE: rsGeneralAdmin add child "pt110{}" to resource "randy10"
Dec 14 11:58:42 pid:801 NOTICE: rsGeneralAdmin add child "pt101{}" to resource "randy11"
Dec 14 11:58:42 pid:809 NOTICE: rsGeneralAdmin add child "pt111{}" to resource "randy11"
Dec 14 11:58:42 pid:817 NOTICE: rsGeneralAdmin add child "pt200{}" to resource "randy20"
Dec 14 11:58:43 pid:825 NOTICE: rsGeneralAdmin add child "pt210{}" to resource "randy20"
Dec 14 11:58:43 pid:833 NOTICE: rsGeneralAdmin add child "pt201{}" to resource "randy21"
Dec 14 11:58:43 pid:841 NOTICE: rsGeneralAdmin add child "pt211{}" to resource "randy21"
Dec 14 11:58:44 pid:849 NOTICE: rsGeneralAdmin add child "randy01{}" to resource "replA"
Dec 14 11:58:44 pid:857 NOTICE: rsGeneralAdmin add child "randy10{}" to resource "replA"
Dec 14 11:58:44 pid:865 NOTICE: rsGeneralAdmin add child "randy00{}" to resource "replB"
Dec 14 11:58:44 pid:873 NOTICE: rsGeneralAdmin add child "randy21{}" to resource "replB"
Dec 14 11:58:45 pid:881 NOTICE: rsGeneralAdmin add child "randy11{}" to resource "replC"
Dec 14 11:58:45 pid:889 NOTICE: rsGeneralAdmin add child "randy20{}" to resource "replC"
Dec 14 11:58:45 pid:897 NOTICE: rsGeneralAdmin add child "replA{}" to resource "ptA"
Dec 14 11:58:46 pid:905 NOTICE: rsGeneralAdmin add child "replB{}" to resource "ptB"
Dec 14 11:58:46 pid:913 NOTICE: rsGeneralAdmin add child "replC{}" to resource "ptC"
Dec 14 11:58:46 pid:921 NOTICE: rsGeneralAdmin add child "ptA{}" to resource "rootRandy"
Dec 14 11:58:47 pid:929 NOTICE: rsGeneralAdmin add child "ptB{}" to resource "rootRandy"
Dec 14 11:58:47 pid:937 NOTICE: rsGeneralAdmin add child "ptC{}" to resource "rootRandy"
Dec 14 11:58:47 pid:945 NOTICE: rsGeneralAdmin add child "rootRandy{}" to resource "rootResc"

On a newly setup system, I have 2 users: irods and user

$ iadmin lu irods

user_id: 10003 user_name: irods user_type_name: rodsadmin

and

$ iadmin lu user 

user_id: 10016 user_name: user user_type_name: rodsuser

Logged in as a rods admin, I activate the metadata guard policy:

$ imeta set -C /rugZone irods::metadata_guard '{"prefixes": ["irods::"], "admin_only": true}'
$ imeta ls -C /rugZone

AVUs defined for collection /rugZone:
attribute: irods::metadata_guard
value: {"prefixes": ["irods::"], "admin_only": true}
units: 

I activate the logical quotas on the user home folder

 $ irule -r irods_rule_engine_plugin-logical_quotas-instance '{"operation": "logical_quotas_start_monitoring_collection", "collection": "/rugZone/home/user"}' null ruleExecOut
 $ irule -r irods_rule_engine_plugin-logical_quotas-instance '{"operation": "logical_quotas_set_maximum_number_of_data_objects", "collection": "/rugZone/home/user", "value": "15"}' null ruleExecOut
 $ imeta ls -C /rugZone/home/user

AVUs defined for collection /rugZone/home/user:
attribute: irods::logical_quotas::maximum_number_of_data_objects
value: 15
units: 
----
attribute: irods::logical_quotas::total_number_of_data_objects
value: 0
units: 
----
attribute: irods::logical_quotas::total_size_in_bytes
value: 0
units: 

Logged in as user

$ iput packages.txt
$ ils -L
/rugZone/home/user:
  user              0 rootResc;rootRandy;ptA;replA;randy01;pt001;mnt_irods001          260 2021-12-15.09:49 & packages.txt
        generic    /mnt/irods001/home/user/packages.txt
  user              1 rootResc;rootRandy;ptA;replA;randy10;pt100;mnt_irods100          260 2021-12-15.09:49 & packages.txt
        generic    /mnt/irods100/home/user/packages.txt

When I check the logical quotas values:

$ imeta ls -C /rugZone/home/user
AVUs defined for collection /rugZone/home/user:
attribute: irods::logical_quotas::maximum_number_of_data_objects
value: 15
units: 
----
attribute: irods::logical_quotas::total_number_of_data_objects
value: 1
units: 
----
attribute: irods::logical_quotas::total_size_in_bytes
value: 260
units: 

Logged in as irods user:

$ imeta ls -C /rugZone/home/user
AVUs defined for collection /rugZone/home/user:
attribute: irods::logical_quotas::maximum_number_of_data_objects
value: 15
units: 
----
attribute: irods::logical_quotas::total_number_of_data_objects
value: 1
units: 
----
attribute: irods::logical_quotas::total_size_in_bytes
value: 260
units: 

Recalculate the quotas

$ irule -r irods_rule_engine_plugin-logical_quotas-instance '{"operation": "logical_quotas_recalculate_totals", "collection": "/rugZone/home/user"}' null ruleExecOut
$ imeta ls -C /rugZone/home/user
AVUs defined for collection /rugZone/home/user:
attribute: irods::logical_quotas::maximum_number_of_data_objects
value: 15
units: 
----
attribute: irods::logical_quotas::total_number_of_data_objects
value: 2
units: 
----
attribute: irods::logical_quotas::total_size_in_bytes
value: 520
units: 

Summary There is no issue of 'traffic' on the system. The containers are local and empty. At any given time there is just one user connected to the system.

I originally noticed that the iput action of irods user (rodsadmin) would increase the `irods::logical_quotas::total_number_of_data_objects' counter by 2, but that I could not reproduce anymore. What i still can reproduce It is just that recalculate counts every object twice ( including replicas), whereas the normal iput counts every object once.

On all irods containers, the rule engines are all configured in the same way:

       "rule_engines": [
            {
                "instance_name": "irods_rule_engine_plugin-logical_quotas-instance",
                "plugin_name": "irods_rule_engine_plugin-logical_quotas",
                "plugin_specific_configuration": {
                    "namespace": "irods::logical_quotas",
                    "metadata_attribute_names": {
                        "maximum_number_of_data_objects": "maximum_number_of_data_objects",
                        "maximum_size_in_bytes": "maximum_size_in_bytes",
                        "total_number_of_data_objects": "total_number_of_data_objects",
                        "total_size_in_bytes": "total_size_in_bytes"
                    }
                }
            },
            {
                "instance_name": "irods_rule_engine_plugin-metadata_guard-instance",
                "plugin_name": "irods_rule_engine_plugin-metadata_guard",
                "plugin_specific_configuration": {}
            },
            {
                "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance", 
                "plugin_name": "irods_rule_engine_plugin-irods_rule_language", 
                "plugin_specific_configuration": {
                    "re_data_variable_mapping_set": [
                        "core"
                    ], 
                    "re_function_name_mapping_set": [
                        "core"
                    ], 
                    "re_rulebase_set": [
                        "extractor",
                        "rugrdms",
                        "utils",
                        "core"
                    ], 
                    "regexes_for_supported_peps": [
                        "ac[^ ]*", 
                        "msi[^ ]*", 
                        "[^ ]*pep_[^ ]*_(pre|post|except|finally)"
                    ]
                }, 
                "shared_memory_instance": "irods_rule_language_rule_engine"
            }, 
            {
                "instance_name" : "irods_rule_engine_plugin-audit_amqp-instance",
                "plugin_name" : "irods_rule_engine_plugin-audit_amqp",
                "plugin_specific_configuration" : {
                ...
                }
            }, 
            {
                "instance_name" : "irods_rule_engine_plugin-python-instance",
                "plugin_name" : "irods_rule_engine_plugin-python",
                "plugin_specific_configuration" : {}
            }, 
            {
                "instance_name": "irods_rule_engine_plugin-cpp_default_policy-instance", 
                "plugin_name": "irods_rule_engine_plugin-cpp_default_policy", 
                "plugin_specific_configuration": {}
            }
        ]

Let me know if you need more information.

korydraughn commented 2 years ago

Thanks for this. I will look into this as soon as possible.

korydraughn commented 2 years ago

I was able to reproduce this issue against using 4.2.11 plugins.

We'll get a fix in asap!

simstoica commented 2 years ago

Cool! Thanks!