irods / irods_capability_storage_tiering

BSD 3-Clause "New" or "Revised" License

Removal of object in second tier even if irods::storage_tiering::preserve_replicas is set to true #233

Closed: cookie33 closed this issue 9 months ago

cookie33 commented 10 months ago

BUG

VERSIONS

iRODS 4.3.1

Expected behaviour

A system with a tier 0 and a tier 1 resource, where irods::storage_tiering::preserve_replicas is set to true for all tiers. Getting a file that exists only on tier 1 restages it to tier 0, and no data is deleted from tier 1.
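Concretely, with the resources from the reproduction below, we would expect ils -l after a restage to show a replica on both tiers, something like (replica numbers and timestamps taken from the run below, for illustration only):

rods$ ils -l test_20231208_60.txt
  rods 3 eudatPnfs 1931831 2023-12-08.11:02 & test_20231208_60.txt
  rods 4 eudatCache 1931831 2023-12-11.11:29 & test_20231208_60.txt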

Observed behaviour

Getting a file from tier 1 when it is not present in tier 0 causes the following to happen: the file is restaged to tier 0, the tier 1 replica is removed (despite preserve_replicas being true), and the object is later migrated back to tier 1. Details follow below.

Steps to reproduce

We have two resources:

rods$ ilsresc -l eudatPnfs
resource name: eudatPnfs
id: 10003
zone: igor
type: unixfilesystem
location: irodstest2.storage.surfsara.nl
vault: /data/eudatPnfs
free space:
free space time: : Never
status:
info:
comment:
create time: 01541156260: 2018-11-02.11:57:40
modify time: 01688043710: 2023-06-29.15:01:50
context:
parent:
parent context:

The metadata of the resources is as follows:

rods$ imeta ls -R eudatCache
AVUs defined for resource eudatCache:
attribute: irods::storage_tiering::group
value: eudat
units: 0

attribute: irods::storage_tiering::preserve_replicas
value: true
units:

attribute: irods::storage_tiering::time
value: 120
units:

attribute: irods::storage_tiering::verification
value: filesystem
units:

rods$ imeta ls -R eudatPnfs
AVUs defined for resource eudatPnfs:
attribute: irods::storage_tiering::group
value: eudat
units: 1

attribute: irods::storage_tiering::preserve_replicas
value: true
units:

attribute: irods::storage_tiering::verification
value: filesystem
units:
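For reference, AVUs like the ones above are typically applied with imeta set -R; the following is a sketch of the equivalent commands, not a transcript from this system:

rods$ imeta set -R eudatCache irods::storage_tiering::group eudat 0
rods$ imeta set -R eudatCache irods::storage_tiering::preserve_replicas true
rods$ imeta set -R eudatCache irods::storage_tiering::time 120
rods$ imeta set -R eudatCache irods::storage_tiering::verification filesystem
rods$ imeta set -R eudatPnfs irods::storage_tiering::group eudat 1
rods$ imeta set -R eudatPnfs irods::storage_tiering::preserve_replicas true
rods$ imeta set -R eudatPnfs irods::storage_tiering::verification filesystem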


We put a file in. The file is present on the tier 1 resource:

rods$ ils -l
/igor/home/rods:
  rods 3 eudatPnfs 1931831 2023-12-08.11:02 & test_20231208_60.txt

rods$ imeta ls -d test_20231208_60.txt
AVUs defined for dataObj /igor/home/rods/test_20231208_60.txt:
attribute: irods::access_time
value: 1702029757
units:

attribute: irods::storage_tiering::group
value: eudat
units: 3


We now try to retrieve the file from the tiered storage:

rods$ date ; iget test_20231208_60.txt /tmp/test_retrieve_20231211.txt -f ; ls -l /tmp/test_retrieve_20231211.txt ; ils -l test_20231208_60.txt
Mon Dec 11 11:29:13 CET 2023
-rw-r----- 1 rods rods 1931831 Dec 11 11:29 /tmp/test_retrieve_20231211.txt
  rods 3 eudatPnfs 1931831 2023-12-08.11:02 & test_20231208_60.txt

rods$ ils -l test_20231208_60.txt
  rods 3 eudatPnfs 1931831 2023-12-08.11:02 & test_20231208_60.txt

rods$ ils -l test_20231208_60.txt
  rods 4 eudatCache 1931831 2023-12-11.11:29 & test_20231208_60.txt

We see that the object ends up on the tier 0 resource and that the replica on the tier 1 resource is removed.

Finally, it is copied back to the tier 1 resource:

rods$ ils -l test_20231208_60.txt
  rods 4 eudatCache 1931831 2023-12-11.11:29 & test_20231208_60.txt
  rods 5 eudatPnfs 1931831 2023-12-11.11:33 & test_20231208_60.txt


The iRODS log file shows:

{"log_category":"legacy","log_level":"info","log_message":"irods::storage_tiering :: delay params for [eudatPnfs] - [irods_rule_engine_plugin-unified_storage_tiering-instance1h DOUBLE UNTIL SUCCESS OR 6 TIMES13s]","request_api_name":"DATA_OBJ_GET_AN","request_api_number":608,"request_api_version":"d","request_client_user":"rods","request_host":"145.100.3.239","request_proxy_user":"rods","request_release_version":"rods4.3.1","server_host":"irodstest2.storage.surfsara.nl","server_pid":24138,"server_timestamp":"2023-12-11T10:29:14.398Z","server_type":"agent","server_zone":"igor"} {"log_category":"legacy","log_level":"info","log_message":"irods::storage_tiering migrating [/igor/home/rods/test_20231208_60.txt] from [eudatPnfs] to [eudatCache]","request_api_name":"DATA_OBJ_GET_AN","request_api_number":608,"request_api_version":"d","request_client_user":"rods","request_host":"145.100.3.239","request_proxy_user":"rods","request_release_version":"rods4.3.1","server_host":"irodstest2.storage.surfsara.nl","server_pid":24138,"server_timestamp":"2023-12-11T10:29:14.414Z","server_type":"agent","server_zone":"igor"} ... {"log_category":"legacy","log_level":"info","log_message":"verify_replica_for_destination_resource - [filesystem] [/igor/home/rods/test_20231208_60.txt] [eudatPnfs] [eudatCache]","request_api_name":"EXEC_RULE_EXPRESSION_AN","request_api_number":1206,"request_api_version":"d","request_client_user":"rods","request_host":"145.100.3.239","request_proxy_user":"rods","request_release_version":"rods4.3.1","server_host":"irodstest2.storage.surfsara.nl","server_pid":24296,"server_timestamp":"2023-12-11T10:29:57.894Z","server_type":"agent","server_zone":"igor"} {"log_category":"legacy","log_level":"info","log_message":"verify_replica_for_destination_resource - source attributes: [/data/eudatPnfs/home/rods/test_20231208_60.txt] [1931831] [eudatPnfs] []","request_api_name":"EXEC_RULE_EXPRESSION_AN","request_api_number":1206,"request_api_version":"d","request_client_user":"rods","request_host":"145.100.3.239","request_proxy_user":"rods","request_release_version":"rods4.3.1","server_host":"irodstest2.storage.surfsara.nl","server_pid":24296,"server_timestamp":"2023-12-11T10:29:57.897Z","server_type":"agent","server_zone":"igor"} {"log_category":"legacy","log_level":"info","log_message":"verify_replica_for_destination_resource - destination attributes: [/data/eudatCache/Vault/home/rods/test_20231208_60.txt] [1931831] [eudatCache] []","request_api_name":"EXEC_RULE_EXPRESSION_AN","request_api_number":1206,"request_api_version":"d","request_client_user":"rods","request_host":"145.100.3.239","request_proxy_user":"rods","request_release_version":"rods4.3.1","server_host":"irodstest2.storage.surfsara.nl","server_pid":24296,"server_timestamp":"2023-12-11T10:29:57.899Z","server_type":"agent","server_zone":"igor"} {"log_category":"legacy","log_level":"info","log_message":"verify_replica_for_destination_resource - verify filesystem: 1 - 1931831 vs 1931831","request_api_name":"EXEC_RULE_EXPRESSION_AN","request_api_number":1206,"request_api_version":"d","request_client_user":"rods","request_host":"145.100.3.239","request_proxy_user":"rods","request_release_version":"rods4.3.1","server_host":"irodstest2.storage.surfsara.nl","server_pid":24296,"server_timestamp":"2023-12-11T10:29:57.900Z","server_type":"agent","server_zone":"igor"} ... 
{"log_category":"legacy","log_level":"info","log_message":"found 5 objects for resc [eudatCache] with query [SELECT DATA_NAME, COLL_NAME, USER_NAME, USER_ZONE, DATA_REPL_NUM WHERE META_DATA_ATTR_NAME = 'irods::access_time' AND META_DATA_ATTR_VALUE < '1702290630' AND META_DATA_ATTR_UNITS <> 'irods::storage_tiering::migration_scheduled' AND DATA_RESC_ID IN ('10002',)] type [0]","request_api_name":"EXEC_RULE_EXPRESSION_AN","request_api_number":1206,"request_api_version":"d","request_client_user":"rods","request_host":"145.100.3.239","request_proxy_user":"rods","request_release_version":"rods4.3.1","server_host":"irodstest2.storage.surfsara.nl","server_pid":24753,"server_timestamp":"2023-12-11T10:32:30.614Z","server_type":"agent","server_zone":"igor"} {"log_category":"legacy","log_level":"info","log_message":"irods::storage_tiering :: delay params for [eudatCache] - [irods_rule_engine_plugin-unified_storage_tiering-instance1h DOUBLE UNTIL SUCCESS OR 6 TIMES14s]","request_api_name":"EXEC_RULE_EXPRESSION_AN","request_api_number":1206,"request_api_version":"d","request_client_user":"rods","request_host":"145.100.3.239","request_proxy_user":"rods","request_release_version":"rods4.3.1","server_host":"irodstest2.storage.surfsara.nl","server_pid":24753,"server_timestamp":"2023-12-11T10:32:30.876Z","server_type":"agent","server_zone":"igor"} {"log_category":"legacy","log_level":"info","log_message":"irods::storage_tiering migrating [/igor/home/rods/test_20231208_60.txt] from [eudatCache] to [eudatPnfs]","request_api_name":"EXEC_RULE_EXPRESSION_AN","request_api_number":1206,"request_api_version":"d","request_client_user":"rods","request_host":"145.100.3.239","request_proxy_user":"rods","request_release_version":"rods4.3.1","server_host":"irodstest2.storage.surfsara.nl","server_pid":24753,"server_timestamp":"2023-12-11T10:32:30.888Z","server_type":"agent","server_zone":"igor"} ... 
{"log_category":"legacy","log_level":"info","log_message":"verify_replica_for_destination_resource - [filesystem] [/igor/home/rods/test_20231208_60.txt] [eudatCache] [eudatPnfs]","request_api_name":"EXEC_RULE_EXPRESSION_AN","request_api_number":1206,"request_api_version":"d","request_client_user":"rods","request_host":"145.100.3.239","request_proxy_user":"rods","request_release_version":"rods4.3.1","server_host":"irodstest2.storage.surfsara.nl","server_pid":24853,"server_timestamp":"2023-12-11T10:33:01.722Z","server_type":"agent","server_zone":"igor"} {"log_category":"legacy","log_level":"info","log_message":"verify_replica_for_destination_resource - source attributes: [/data/eudatCache/Vault/home/rods/test_20231208_60.txt] [1931831] [eudatCache] []","request_api_name":"EXEC_RULE_EXPRESSION_AN","request_api_number":1206,"request_api_version":"d","request_client_user":"rods","request_host":"145.100.3.239","request_proxy_user":"rods","request_release_version":"rods4.3.1","server_host":"irodstest2.storage.surfsara.nl","server_pid":24853,"server_timestamp":"2023-12-11T10:33:01.725Z","server_type":"agent","server_zone":"igor"} {"log_category":"legacy","log_level":"info","log_message":"verify_replica_for_destination_resource - destination attributes: [/data/eudatPnfs/home/rods/test_20231208_60.txt] [1931831] [eudatPnfs] []","request_api_name":"EXEC_RULE_EXPRESSION_AN","request_api_number":1206,"request_api_version":"d","request_client_user":"rods","request_host":"145.100.3.239","request_proxy_user":"rods","request_release_version":"rods4.3.1","server_host":"irodstest2.storage.surfsara.nl","server_pid":24853,"server_timestamp":"2023-12-11T10:33:01.727Z","server_type":"agent","server_zone":"igor"} {"log_category":"legacy","log_level":"info","log_message":"verify_replica_for_destination_resource - verify filesystem: 1 - 1931831 vs 1931831","request_api_name":"EXEC_RULE_EXPRESSION_AN","request_api_number":1206,"request_api_version":"d","request_client_user":"rods","request_host":"145.100.3.239","request_proxy_user":"rods","request_release_version":"rods4.3.1","server_host":"irodstest2.storage.surfsara.nl","server_pid":24853,"server_timestamp":"2023-12-11T10:33:01.728Z","server_type":"agent","server_zone":"igor"} ... 
{"log_category":"legacy","log_level":"info","log_message":"found 5 objects for resc [eudatCache] with query [SELECT DATA_NAME, COLL_NAME, USER_NAME, USER_ZONE, DATA_REPL_NUM WHERE META_DATA_ATTR_NAME = 'irods::access_time' AND META_DATA_ATTR_VALUE < '1702290814' AND META_DATA_ATTR_UNITS <> 'irods::storage_tiering::migration_scheduled' AND DATA_RESC_ID IN ('10002',)] type [0]","request_api_name":"EXEC_RULE_EXPRESSION_AN","request_api_number":1206,"request_api_version":"d","request_client_user":"rods","request_host":"145.100.3.239","request_proxy_user":"rods","request_release_version":"rods4.3.1","server_host":"irodstest2.storage.surfsara.nl","server_pid":25326,"server_timestamp":"2023-12-11T10:35:34.105Z","server_type":"agent","server_zone":"igor"} {"log_category":"legacy","log_level":"info","log_message":"irods::storage_tiering - skipping migration for [/igor/home/rods/test_20231208_60.txt] in resource list ['10003',]","request_api_name":"EXEC_RULE_EXPRESSION_AN","request_api_number":1206,"request_api_version":"d","request_client_user":"rods","request_host":"145.100.3.239","request_proxy_user":"rods","request_release_version":"rods4.3.1","server_host":"irodstest2.storage.surfsara.nl","server_pid":25326,"server_timestamp":"2023-12-11T10:35:34.358Z","server_type":"agent","server_zone":"igor"}



What are we doing wrong? Or is it not supposed to work that way?
cookie33 commented 10 months ago

I think it has to do with the implementation. In migrate_object_to_minimum_restage_tier, the preserve-replicas parameter passed to queue_data_movement is always hard-coded to false:

    void storage_tiering::migrate_object_to_minimum_restage_tier(
        const std::string& _object_path,
        const std::string& _user_name,
        const std::string& _user_zone,
        const std::string& _source_resource) {

        try {
            const auto source_replica_number = get_replica_number_for_resource(
                                                   comm_,
                                                   _object_path,
                                                   _source_resource);
            const auto group_name = get_group_name_by_replica_number(
                                        comm_,
                                        config_.group_attribute,
                                        _object_path,
                                        source_replica_number);
            const auto low_tier_resource_name = get_restage_tier_resource_name(
                                                    comm_,
                                                    group_name);
            // do not queue movement if data is on minimum tier
            // TODO:: query for already queued movement?
            if(low_tier_resource_name == _source_resource) {
                return;
            }

            queue_data_movement(
                comm_,
                config_.instance_name,
                group_name,
                _object_path,
                _user_name,
                _user_zone,
                source_replica_number,
                _source_resource,
                low_tier_resource_name,
                get_verification_for_resc(comm_, low_tier_resource_name),
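                // NOTE: this is the preserve-replicas flag; it is hard-coded to false regardless of the resource metadata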
                false,
                get_data_movement_parameters_for_resource(comm_, _source_resource));
        }

If this parameter were looked up from the resource's preserve_replicas setting in this function instead of being hard-coded, it might work.

cookie33 commented 10 months ago

I made the following change in the code:

$ git diff
diff --git a/storage_tiering.cpp b/storage_tiering.cpp
index fa2872e..75fc81a 100644
--- a/storage_tiering.cpp
+++ b/storage_tiering.cpp
@@ -817,7 +817,7 @@ namespace irods {
                 _source_resource,
                 low_tier_resource_name,
                 get_verification_for_resc(comm_, low_tier_resource_name),
-                false,
+                get_preserve_replicas_for_resc(comm_, low_tier_resource_name),
                 get_data_movement_parameters_for_resource(comm_, _source_resource));
         }
         catch(const exception& _e) {
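After a change like this, the plugin has to be rebuilt and reinstalled before it takes effect. A rough sketch of a typical out-of-source CMake build for an iRODS plugin follows; prerequisites (such as the irods-dev and irods-externals packages) and the exact package file name may differ on your system:

$ mkdir -p build && cd build
$ cmake ..
$ make package
$ sudo dpkg -i ./irods-rule-engine-plugin-unified-storage-tiering*.deb   # or install the corresponding RPM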

Test: the queued data-movement rule now carries "preserve-replicas": true:

30920 {"rule-engine-operation":"irods_policy_storage_tiering","storage-tier-groups":["eudat"]} 31069 {"delay_conditions":"irods_rule_engine_plugin-unified_storage_tiering-instance1h DOUBLE UNTIL SUCCESS OR 6 TIMES8s","destination-resource":"eudatCache","group-name":"eudat","md5":"5346dcd120ef038ab74a24664dd115f7","object-path":"/igor/home/rods/test_20231208_60.txt","preserve-replicas":true,"rule-engine-instance-name":"irods_rule_engine_plugin-unified_storage_tiering-instance","rule-engine-operation":"irods_policy_data_movement","source-replica-number":"7","source-resource":"eudatPnfs","user-name":"rods","user-zone":"igor","verification-type":"filesystem"}

* There is still a copy on tier 1:

rods$ ils -l
/igor/home/rods:
  rods 7 eudatPnfs 1931831 2023-12-11.13:44 & test_20231208_60.txt

rods$ imeta ls -d test_20231208_60.txt
AVUs defined for dataObj /igor/home/rods/test_20231208_60.txt:
attribute: irods::access_time
value: 1702298856
units:

attribute: irods::storage_tiering::group
value: eudat
units: 8

* After the restage, we still have a copy on tier 1 and also a copy on tier 0. Note the replica numbers!

rods$ ils -l
/igor/home/rods:
  rods 7 eudatPnfs 1931831 2023-12-11.13:44 & test_20231208_60.txt
  rods 8 eudatCache 1931831 2023-12-11.13:47 & test_20231208_60.txt



This seems to do what I want: keep the object on the tier 1 resource when it is restaged to tier 0.

Are there other things to consider?
trel commented 10 months ago

awesome. initial review seems sane.

we'll need to prepare a PR with this and a test that fails prior to the edit, and passes after.

cookie33 commented 10 months ago

> awesome. initial review seems sane.
>
> we'll need to prepare a PR with this and a test that fails prior to the edit, and passes after.

It is not a failure as such, just unexpected behaviour (for me). With this change, restaging now adheres to preserve_replicas if it is set on a second or third tier.

trel commented 10 months ago

understood. i mean a test that asserts the replica is still there based on the setting; that assertion would fail with the current behaviour (which you defined as unexpected). thanks for finding this.

alanking commented 9 months ago

Looking at this again, this appears to be a regression. Restaging and preserve_replicas were brought up in #125. But... our tests have been passing all along: https://github.com/irods/irods_capability_storage_tiering/blob/b50254bad08d8f24b63e91dce01c5c61f455a0ca/packaging/test_plugin_unified_storage_tiering.py#L345-L376
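For what it is worth, a manual check along these lines (outside the repo's Python test suite) would assert exactly that. Resource and object names are the ones from this report, the temporary path is illustrative, and the waits depend on the configured tiering delays:

# starting point, as in the report: the only replica is on eudatPnfs (tier 1)
rods$ ils -l test_20231208_60.txt
  rods 3 eudatPnfs 1931831 2023-12-08.11:02 & test_20231208_60.txt

rods$ iget -f test_20231208_60.txt /tmp/restage_check.txt    # triggers the restage to tier 0
# ... wait for the queued data-movement rule to complete ...
rods$ ils -l test_20231208_60.txt
# With preserve_replicas set to true, the eudatPnfs replica should still be listed
# next to the new eudatCache replica. Before the patch that replica is trimmed, so
# an assertion on the eudatPnfs line fails; after the patch it passes.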