irods / irods_capability_storage_tiering

Multiple igets of same data object results in multiple restage jobs #103

Closed alanking closed 4 years ago

alanking commented 5 years ago
$ irule -F example_tiering_invocation.r
$ # storage tiering rule is now running
$ iqstat
id     name
10068 {"rule-engine-operation":"apply_storage_tiering_policy","storage-tier-groups":["example_group"]}

$ # data object already tiered out
$ ils -L foo
/tempZone/home/rods:
  rods             1 irodsArch          224 2019-06-17.16:25 & foo
        generic    /var/lib/irods/archVault/home/rods/foo

$ # iget data object twice
$ iget foo - && iget foo - && iqstat
{
    "catalog_schema_version": 6,
    "commit_id": "db085559893dabcb8e3acd30dbd7c4b65bc110dc",
    "configuration_schema_version": 3,
    "installation_time": "2019-06-17T14:21:12.621312",
    "irods_version": "4.2.5"
}{
    "catalog_schema_version": 6,
    "commit_id": "db085559893dabcb8e3acd30dbd7c4b65bc110dc",
    "configuration_schema_version": 3,
    "installation_time": "2019-06-17T14:21:12.621312",
    "irods_version": "4.2.5"
}id     name
10069 {"destination-resource":"irodsResc","object-path":"/tempZone/home/rods/foo","preserve-replicas":false,"rule-engine-operation":"migrate_object_to_resource","source-resource":"irodsArch","verification-type":"catalog"}
10068 {"rule-engine-operation":"apply_storage_tiering_policy","storage-tier-groups":["example_group"]}
10070 {"destination-resource":"irodsResc","object-path":"/tempZone/home/rods/foo","preserve-replicas":false,"rule-engine-operation":"migrate_object_to_resource","source-resource":"irodsArch","verification-type":"catalog"}

$ # replication fails when restaging due to race condition
$ ils -L foo
/tempZone/home/rods:
  rods             2 irodsResc          224 2019-06-17.16:26 & foo
        generic    /var/lib/irods/rescVault/replica/home/rods/foo.554165321
  rods             3 irodsResc          224 2019-06-17.16:26 & foo
        generic    /var/lib/irods/rescVault/home/rods/foo

$ # failed delay rule remains in the queue
$ iqstat -l
<SNIP>
id: 10070
name: {"destination-resource":"irodsResc","object-path":"/tempZone/home/rods/foo","preserve-replicas":false,"rule-engine-operation":"migrate_object_to_resource","source-resource":"irodsArch","verification-type":"catalog"}
rei_file_path: /var/lib/irods/config/packedRei/rei.rods.1257362949
user_name: rods
address:
time: 1560792553 : 2019-06-17.17:29:13
frequency: 2h DOUBLE UNTIL SUCCESS OR 5 TIMES. ORIGINAL TIMES=6
priority:
estimated_exe_time:
notification_addr:
last_exe_time: 1560788953
exec_status:

Possible solutions:

jasoncoposky commented 5 years ago

https://github.com/irods/irods_capability_storage_tiering/blob/master/storage_tiering.cpp#L764

We can generate a job id, apply it as metadata before scheduling the migration, and include it in the JSON payload of the queued rule. One migration may repave another, but in the end only one metadata application will win. When a scheduled migration runs, if the job id in the catalog does not match the job id in its payload, that migration simply exits.
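
A rough sketch of that idea, not the plugin's actual code: the catalog and the delay queue are replaced with in-memory stand-ins, and the `job-id` payload key plus the names `schedule_restage` and `run_queued_migration` are hypothetical, made up for illustration. It only shows the "last metadata write wins, stale jobs exit" logic.

```python
import json
import uuid

# Hypothetical in-memory stand-ins for catalog metadata and the delay queue.
catalog_job_id = {}  # object path -> job id most recently applied as metadata
delay_queue = []     # JSON payloads of queued migrate_object_to_resource rules

JOB_ID_KEY = "job-id"  # assumed payload key, not an existing plugin field


def schedule_restage(object_path, source_resource, destination_resource):
    """Apply a fresh job id as metadata, then queue a migration carrying it."""
    job_id = str(uuid.uuid4())
    # A later call for the same object overwrites ("repaves") the earlier id.
    catalog_job_id[object_path] = job_id
    payload = {
        "rule-engine-operation": "migrate_object_to_resource",
        "object-path": object_path,
        "source-resource": source_resource,
        "destination-resource": destination_resource,
        JOB_ID_KEY: job_id,
    }
    delay_queue.append(json.dumps(payload))
    return job_id


def run_queued_migration(raw_payload, migrate):
    """Run the migration only if its job id still matches the catalog."""
    payload = json.loads(raw_payload)
    if catalog_job_id.get(payload["object-path"]) != payload[JOB_ID_KEY]:
        return False  # stale job: another restage won, so simply exit
    migrate(payload)
    return True
```

With two `schedule_restage` calls for `/tempZone/home/rods/foo`, both payloads land in the queue as in the report above, but only the one whose id matches the catalog performs the migration; the other returns without replicating.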

trel commented 4 years ago

Will be mitigated by the fix to #108.