alanking closed this issue 7 months ago.
Another consideration is what happens to the replica on ufs1 if/when the pseudo-tier-out occurs. For the default time-based policy, the access time metadata would need to be updated regardless of whether a replication occurs, so that tiering out from ufs1 happens at the appropriate time. Otherwise, the data on ufs1 would immediately be in violation (since it hasn't been touched since it was last tiered out to ufs2), and the plugin could attempt to tier it out from ufs1 prematurely.
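The access-time concern can be illustrated with a toy model. This is a sketch under stated assumptions, not the plugin's implementation: each tier keeps a per-object access time, and a replica violates the time-based policy once it has gone untouched longer than a per-tier limit. `Tier`, `max_age`, `violates_time_policy`, `pseudo_tier_out`, and the path `/zone/obj` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Tier:
    """Hypothetical tier: a resource name and the age (seconds) after which
    an untouched replica violates the time-based policy."""
    name: str
    max_age: int
    # Last recorded access time (epoch seconds) per logical path on this tier.
    access_times: dict = field(default_factory=dict)


def violates_time_policy(tier: Tier, path: str, now: int) -> bool:
    """True when the replica has gone untouched on this tier longer than max_age."""
    last = tier.access_times.get(path)
    return last is not None and now - last > tier.max_age


def pseudo_tier_out(dst: Tier, path: str, now: int, update_access_time: bool) -> None:
    """Tier out onto a tier that already holds a good, preserved replica.

    No data is copied (the replica is already there); the only effect modeled
    here is whether the destination's access time is refreshed.
    """
    if update_access_time:
        dst.access_times[path] = now


if __name__ == "__main__":
    for update in (False, True):
        # Preserved replica on ufs1, last touched at t=0 when it was tiered out to ufs2.
        ufs1 = Tier("ufs1", max_age=60, access_times={"/zone/obj": 0})
        pseudo_tier_out(ufs1, "/zone/obj", now=1000, update_access_time=update)
        # Without the update, the ufs1 replica is already in violation.
        print(update, violates_time_policy(ufs1, "/zone/obj", now=1000))
```

Without the refresh, the preserved replica is flagged as soon as the pseudo-tier-out completes; with it, the ufs1 replica starts a fresh aging period like any newly tiered-out data.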
Upside to Option 1: no error handling. Just ask whether replication is necessary, then (as per your follow-up comment) update its access time, then trim the source. Downside: a small TOCTOU window whose consequences need to be weighed.
Upside to Option 2: no TOCTOU. Downside: using errors as 'normal flow'... and potentially masking real errors when they occur.
Upside to Option 3: perhaps the cleanest... Downside: seemingly significantly more work before we know it's worth it, and lots of other things could depend on any changes/additions we might make.
Voting for Option 1. Willing to be swayed.
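The trade-off between Option 1 and Option 2 can be made concrete with a small model. This is a hypothetical sketch, not the plugin's code or the iRODS API: replicas are a dict mapping resource name to `"good"` or `"stale"`, `replicate` raises a stand-in `SysNotAllowed` when the destination already holds a good replica, and `access_times` holds one object's last access time per resource.

```python
class SysNotAllowed(Exception):
    """Hypothetical stand-in for the SYS_NOT_ALLOWED error from replication."""


def replicate(replicas: dict, src: str, dst: str) -> None:
    """Model the rule that replication may not overwrite a good replica."""
    if replicas.get(dst) == "good":
        raise SysNotAllowed(f"good replica already exists on {dst}")
    replicas[dst] = "good"


def trim(replicas: dict, resource: str) -> None:
    """Remove the replica on the given resource, if any."""
    replicas.pop(resource, None)


def tier_out_option1(replicas: dict, src: str, dst: str, access_times: dict, now: int) -> None:
    """Option 1: ask whether replication is necessary, then update time and trim."""
    if replicas.get(dst) != "good":
        # TOCTOU window: the destination's state could change between the
        # check above and this call (e.g. a concurrent replication).
        replicate(replicas, src, dst)
    # Refresh the destination's access time whether or not we replicated,
    # so the time-based policy does not immediately tier it out again.
    access_times[dst] = now
    trim(replicas, src)


def tier_out_option2(replicas: dict, src: str, dst: str, access_times: dict, now: int) -> None:
    """Option 2: always attempt replication; treat SYS_NOT_ALLOWED as 'already good'."""
    try:
        replicate(replicas, src, dst)
    except SysNotAllowed:
        # Error used as normal flow. Re-check the destination so that a
        # SYS_NOT_ALLOWED raised for some other reason is not swallowed.
        if replicas.get(dst) != "good":
            raise
    access_times[dst] = now
    trim(replicas, src)
```

Starting from `{"ufs0": "good", "ufs1": "good", "ufs2": "good"}` and tiering ufs0 to ufs1, both functions end with `{"ufs1": "good", "ufs2": "good"}` and a refreshed `access_times["ufs1"]`. The difference lies in how they get there: Option 1 separates the check from the action, while Option 2 depends on the error path and needs the extra re-check to avoid hiding unrelated failures.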
If a write occurs and there are preserved replicas on ufs1 and ufs2, those replicas would become stale and the access time would be updated, correct?
I think that is true.
After some discussion, I think we are in agreement that Option 1 is the way to go.
Bug Report
iRODS Version, OS and Version
iRODS server: main
Storage tiering plugin: main
OS: Ubuntu 22.04
What did you try to do?
I discovered this while writing an automated test for #234. The test suite was test_plugin_unified_storage_tiering.TestStorageTieringPluginPreserveReplica. This test had the following setup:
It then does this:
Expected behavior
I expected good replicas on ufs1 and ufs2 and no replica on ufs0.
Observed behavior (including steps to reproduce, if applicable)
An error occurs and the replica remains on ufs0. Here are the messages from the log (filtered through jq and formatted):
This is a direct result of the new rules for replication established in 4.2.9: a good replica is not allowed to be overwritten through replication (see https://docs.irods.org/4.3.1/system_overview/data_objects/#replicate). The replication API returns SYS_NOT_ALLOWED in this case, and the trim does not occur as a result.
I see a few ways to address this; these are by no means the only ways, or even the best ones:
SYS_NOT_ALLOWED and take some action to determine whether tiering out or trimming should occur.
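The observed behavior can be reproduced in a minimal model of a replicate-then-trim tier-out. This sketch assumes the test scenario tiers an object from ufs0 to ufs1 while good preserved replicas already exist on ufs1 and ufs2 (the report's setup steps are not shown here); `SysNotAllowed`, `replicate`, `trim`, and `tier_out_current` are hypothetical names, not the plugin's API.

```python
class SysNotAllowed(Exception):
    """Hypothetical stand-in for the SYS_NOT_ALLOWED error from replication."""


def replicate(replicas: dict, src: str, dst: str) -> None:
    """Model of the 4.2.9+ rule: replication may not overwrite a good replica."""
    if replicas.get(dst) == "good":
        raise SysNotAllowed(f"good replica already exists on {dst}")
    replicas[dst] = "good"


def trim(replicas: dict, resource: str) -> None:
    """Remove the replica on the given resource, if any."""
    replicas.pop(resource, None)


def tier_out_current(replicas: dict, src: str, dst: str) -> None:
    """Replicate, then trim the source.

    If replication raises, the exception propagates and the trim never runs,
    so the source replica is left behind.
    """
    replicate(replicas, src, dst)
    trim(replicas, src)


if __name__ == "__main__":
    replicas = {"ufs0": "good", "ufs1": "good", "ufs2": "good"}
    try:
        tier_out_current(replicas, "ufs0", "ufs1")
    except SysNotAllowed as e:
        print("error:", e)
    # The ufs0 replica is still present, matching the observed behavior.
    print(sorted(replicas))
```

When the destination does not yet hold a good replica, the same flow succeeds and leaves only the destination replica, which is why the problem surfaces specifically with preserved replicas.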