irods / irods_capability_indexing

`job_limit_per_collection_indexing_operation` not being honoured #118

Open kript opened 2 years ago

kript commented 2 years ago

Installed irods-rule-engine-plugin-indexing version 4.2.7.1

Set the limit to 500

$ grep job_limit_per_collection_indexing_operation /etc/irods/server_config.json
                "job_limit_per_collection_indexing_operation" : "500"

Add the indexing AVU to a collection that holds well over that many items of metadata:

irods@irods-seq-indexing:~$ imeta -z seq add -C /seq/home irods::indexing::index irods-seq::metadata elasticsearch

Observe the delay server queue jump to 4.7k.

irods@irods-seq-indexing:~$ date; iqstat -a | wc -l; ps fauxww | grep -c irodsServer ; netstat -t | grep dbsrv6 | wc -l
Wed 18 May 14:21:56 BST 2022
3
6
6
irods@irods-seq-indexing:~$ date; iqstat -a | wc -l; ps fauxww | grep -c irodsServer ; netstat -t | grep dbsrv6 | wc -l
Wed 18 May 14:22:09 BST 2022
3
5
6
irods@irods-seq-indexing:~$ date; iqstat -a | wc -l; ps fauxww | grep -c irodsServer ; netstat -t | grep dbsrv6 | wc -l
Wed 18 May 14:22:32 BST 2022
495
7
8
irods@irods-seq-indexing:~$ date; iqstat -a | wc -l; ps fauxww | grep -c irodsServer ; netstat -t | grep dbsrv6 | wc -l
Wed 18 May 14:23:06 BST 2022
4784
10
11
irods@irods-seq-indexing:~$ date; iqstat -a | wc -l; ps fauxww | grep -c irodsServer ; netstat -t | grep dbsrv6 | wc -l
Wed 18 May 14:25:41 BST 2022
2
5
16

As I understood it, there should be no more than 500 rules in the queue at a time. Or have I misunderstood?
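To make the comparison explicit, each sampled queue depth can be checked against the configured limit. This is only an illustrative sketch: the `samples` list is copied from the `iqstat -a | wc -l` readings above, `LIMIT` is the configured value, and `samples_over_limit` is a hypothetical helper name. Since `wc -l` counts every line `iqstat` prints, the figures may include a few non-job lines.

```python
# Configured job_limit_per_collection_indexing_operation from the report above.
LIMIT = 500

# (time, line count from `iqstat -a | wc -l`) pairs copied from the transcript.
samples = [
    ("14:21:56", 3),
    ("14:22:09", 3),
    ("14:22:32", 495),
    ("14:23:06", 4784),
    ("14:25:41", 2),
]


def samples_over_limit(samples, limit):
    """Return the samples whose count exceeds the limit (hypothetical helper)."""
    return [(when, count) for when, count in samples if count > limit]


over = samples_over_limit(samples, LIMIT)
for when, count in over:
    print(f"{when}: {count} queue lines, limit is {LIMIT}")
```

Only the 14:23:06 reading is above 500 in the data posted here.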

trel commented 2 years ago

Here's the current logic... @d-w-moore maybe we need a paragraph in the README.md explaining with an example...

https://github.com/irods/irods_capability_indexing/blob/c26b87b661e6554eab6649877290a112e606e771/indexing_utilities.cpp#L326-L347

d-w-moore commented 2 years ago

@kript is it possible there is more than one indexing plugin (as distinguished by the instance name) active in the zone in question?

kript commented 2 years ago

@d-w-moore that's a horrifying thought... but I don't think so!

    "rule_engines": [
      {
        "instance_name": "irods_rule_engine_plugin-indexing-instance",
        "plugin_name": "irods_rule_engine_plugin-indexing",
        "plugin_specific_configuration": {}
      },
      {
        "instance_name": "irods_rule_engine_plugin-elasticsearch-instance",
        "plugin_name": "irods_rule_engine_plugin-elasticsearch",
        "plugin_specific_configuration": {
          "hosts": [
            "http://user:pass@elasticsearch:19200/"
          ],
          "bulk_count": 100,
          "read_size": 4194304,
          "job_limit_per_collection_indexing_operation": "500"
        }
      },
      {
        "instance_name": "irods_rule_engine_plugin-document_type-instance",
        "plugin_name": "irods_rule_engine_plugin-document_type",
        "plugin_specific_configuration": {}
      },
      {
        "instance_name": "irods_rule_engine_plugin-storage_tiering-instance",
        "plugin_name": "irods_rule_engine_plugin-storage_tiering",
        "plugin_specific_configuration": {
          "access_time_attribute": "irods::access_time",
          "group_attribute": "irods::storage_tiering::group",
          "time_attribute": "irods::storage_tiering::time",
          "query_attribute": "irods::storage_tiering::query",
          "verification_attribute": "irods::storage_tiering::verification",
          "data_movement_parameters_attribute": "irods::storage_tiering::restage_delay",
          "minimum_restage_tier": "irods::storage_tiering::minimum_restage_tier",
          "preserve_replicas": "irods::storage_tiering::preserve_replicas",
          "object_limit": "irods::storage_tiering::object_limit",
          "default_data_movement_parameters": "<EF>60s DOUBLE UNTIL SUCCESS OR 5 TIMES</EF>",
          "minumum_delay_time": "irods::storage_tiering::minimum_delay_time_in_seconds",
          "maximum_delay_time": "irods::storage_tiering::maximum_delay_time_in_seconds",
          "time_check_string": "TIME_CHECK_STRING",
          "data_transfer_log_level": "LOG_NOTICE"
        }
      },
      {
        "instance_name": "irods_rule_engine_plugin-apply_access_time-instance",
        "plugin_name": "irods_rule_engine_plugin-apply_access_time",
        "plugin_specific_configuration": {}
      },
      {
        "instance_name": "irods_rule_engine_plugin-data_verification-instance",
        "plugin_name": "irods_rule_engine_plugin-data_verification",
        "plugin_specific_configuration": {}
      },
      {
        "instance_name": "irods_rule_engine_plugin-data_replication-instance",
        "plugin_name": "irods_rule_engine_plugin-data_replication",
        "plugin_specific_configuration": {}
      },
      {
        "instance_name": "irods_rule_engine_plugin-data_movement-instance",
        "plugin_name": "irods_rule_engine_plugin-data_movement",
        "plugin_specific_configuration": {}
      },
      {
        "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
        "plugin_name": "irods_rule_engine_plugin-irods_rule_language",
        "plugin_specific_configuration": {
          "re_data_variable_mapping_set": [
            "core"
          ],
          "re_function_name_mapping_set": [
            "core"
          ],
          "re_rulebase_set": [
            "seq",
            "core"
          ],
          "regexes_for_supported_peps": [
            "ac[^ ]*",
            "msi[^ ]*",
            "[^ ]*pep_[^ ]*_(pre|post)"
          ]
        },
        "shared_memory_instance": "upgraded_irods_rule_language_rule_engine"
      },
      {
        "instance_name": "irods_rule_engine_plugin-cpp_default_policy-instance",
        "plugin_name": "irods_rule_engine_plugin-cpp_default_policy",
        "plugin_specific_configuration": {}
      }
    ]
  },
  "rule_engine_namespaces": [
    "",
    "indexing_"
  ],
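A quick way to answer the "more than one indexing plugin instance" question from a config like the one above is to walk the `rule_engines` list programmatically. The following is a minimal sketch, not part of the plugin: the embedded JSON is a trimmed copy of the first two entries posted above (with the `hosts` entry left out), and `instances_with_setting` and `instances_of_plugin` are hypothetical helper names. It only reports where the key appears; it does not say which placement the plugin actually reads.

```python
import json

KEY = "job_limit_per_collection_indexing_operation"

# Trimmed copy of the "rule_engines" list posted above (first two entries only).
rule_engines = json.loads("""
[
  {
    "instance_name": "irods_rule_engine_plugin-indexing-instance",
    "plugin_name": "irods_rule_engine_plugin-indexing",
    "plugin_specific_configuration": {}
  },
  {
    "instance_name": "irods_rule_engine_plugin-elasticsearch-instance",
    "plugin_name": "irods_rule_engine_plugin-elasticsearch",
    "plugin_specific_configuration": {
      "bulk_count": 100,
      "read_size": 4194304,
      "job_limit_per_collection_indexing_operation": "500"
    }
  }
]
""")


def instances_with_setting(rule_engines, key):
    """Return (instance_name, value) for every plugin instance that sets `key`."""
    found = []
    for engine in rule_engines:
        config = engine.get("plugin_specific_configuration", {})
        if key in config:
            found.append((engine["instance_name"], config[key]))
    return found


def instances_of_plugin(rule_engines, plugin_name):
    """Return the instance names configured for one plugin name."""
    return [e["instance_name"] for e in rule_engines
            if e["plugin_name"] == plugin_name]


print(instances_with_setting(rule_engines, KEY))
print(instances_of_plugin(rule_engines, "irods_rule_engine_plugin-indexing"))
```

To run it against a real server, the embedded string could be replaced by the `rule_engines` list loaded from `/etc/irods/server_config.json`.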
kript commented 2 years ago

That is the server config on the provider we have designated as the delay server. I've just checked the other two, and they have:

    "rule_engines": [
      {
        "instance_name": "irods_rule_engine_plugin-storage_tiering-instance",
        "plugin_name": "irods_rule_engine_plugin-storage_tiering",
        "plugin_specific_configuration": {
          "access_time_attribute": "irods::access_time",
          "group_attribute": "irods::storage_tiering::group",
          "time_attribute": "irods::storage_tiering::time",
          "query_attribute": "irods::storage_tiering::query",
          "verification_attribute": "irods::storage_tiering::verification",
          "data_movement_parameters_attribute": "irods::storage_tiering::restage_delay",
          "minimum_restage_tier": "irods::storage_tiering::minimum_restage_tier",
          "preserve_replicas": "irods::storage_tiering::preserve_replicas",
          "object_limit": "irods::storage_tiering::object_limit",
          "default_data_movement_parameters": "<EF>60s DOUBLE UNTIL SUCCESS OR 5 TIMES</EF>",
          "minumum_delay_time": "irods::storage_tiering::minimum_delay_time_in_seconds",
          "maximum_delay_time": "irods::storage_tiering::maximum_delay_time_in_seconds",
          "time_check_string": "TIME_CHECK_STRING",
          "data_transfer_log_level": "LOG_NOTICE"
        }
      },
      {
        "instance_name": "irods_rule_engine_plugin-apply_access_time-instance",
        "plugin_name": "irods_rule_engine_plugin-apply_access_time",
        "plugin_specific_configuration": {}
      },
      {
        "instance_name": "irods_rule_engine_plugin-data_verification-instance",
        "plugin_name": "irods_rule_engine_plugin-data_verification",
        "plugin_specific_configuration": {}
      },
      {
        "instance_name": "irods_rule_engine_plugin-data_replication-instance",
        "plugin_name": "irods_rule_engine_plugin-data_replication",
        "plugin_specific_configuration": {}
      },
      {
        "instance_name": "irods_rule_engine_plugin-data_movement-instance",
        "plugin_name": "irods_rule_engine_plugin-data_movement",
        "plugin_specific_configuration": {}
      },
      {
        "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
        "plugin_name": "irods_rule_engine_plugin-irods_rule_language",
        "plugin_specific_configuration": {
          "re_data_variable_mapping_set": [
            "core"
          ],
          "re_function_name_mapping_set": [
            "core"
          ],
          "re_rulebase_set": [
            "seq",
            "core"
          ],
          "regexes_for_supported_peps": [
            "ac[^ ]*",
            "msi[^ ]*",
            "[^ ]*pep_[^ ]*_(pre|post)"
          ]
        },
        "shared_memory_instance": "upgraded_irods_rule_language_rule_engine"
      },
      {
        "instance_name": "irods_rule_engine_plugin-cpp_default_policy-instance",
        "plugin_name": "irods_rule_engine_plugin-cpp_default_policy",
        "plugin_specific_configuration": {}
      }
    ]
  },
  "rule_engine_namespaces": [
    ""
  ],
d-w-moore commented 2 years ago

@kript thanks for the forensic evidence : ) I'll check this out today on my end.

d-w-moore commented 2 years ago

@kript - I gave it a couple of runs at the throttle limit of 500 today, and used the iquest/grep commands you posted above, but never saw the number of DB connections go above 501, nor did the number of jobs exceed 501, as determined by the command:

iquest --no-page '%s' "select RULE_EXEC_NAME where RULE_EXEC_NAME like '%/TESTCOL/%' " | grep -E 'job-category-tag":"[0-9]+-[0-9]+' | wc -l 

(Where TESTCOL is the name of the AVU-annotated top-level collection. Btw, the like-clause and the job-category-tag grep in the pipeline are a good formula for making sure the jobs you're including in your count do, indeed, belong to the indexing plugin.)
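For anyone who would rather do the counting step in a script, the `grep -E ... | wc -l` part of the pipeline can be mirrored in Python. This is a rough sketch under stated assumptions: the regular expression is the one from the command above, `count_indexing_jobs` is a hypothetical helper, and the `sample` lines are invented for illustration (the thread does not show real `RULE_EXEC_NAME` values).

```python
import re

# Same pattern as the `grep -E` in the pipeline above.
JOB_TAG = re.compile(r'job-category-tag":"[0-9]+-[0-9]+')


def count_indexing_jobs(lines):
    """Count lines containing a job-category-tag, like `grep -E ... | wc -l`."""
    return sum(1 for line in lines if JOB_TAG.search(line))


# Invented sample lines, NOT real iquest output; only the tag shape matters here.
sample = [
    '{"collection":"/tempZone/home/rods/TESTCOL/a","job-category-tag":"12-3456"}',
    '{"collection":"/tempZone/home/rods/TESTCOL/b","job-category-tag":"12-3457"}',
    '{"collection":"/tempZone/home/rods/TESTCOL/c"}',
]

print(count_indexing_jobs(sample))
```

In practice the `lines` argument would be the output of the `iquest` query split into lines, for example captured with `subprocess.run`.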

@korydraughn and I also considered the possibility, after looking into the 4.2.7 irodsReServer source code, that under some conditions the connection pools to the DB might build up in memory, especially when more delayed task requests come in than the threads on the provider can deal with at a time. That seems a likely explanation if the issue you've recorded here is something you've dependably reproduced (i.e., more than once with similar results).

d-w-moore commented 2 years ago

@kript Let me know if you'd like to set up a call to look into it further. I'm pretty flexible this week.