irods / irods_resource_plugin_s3

S3-compatible storage resource plugin for iRODS

Can't trim an object from an S3 archive resource #1850

Open JustinKyleJames opened 5 years ago

JustinKyleJames commented 5 years ago

If I have a data object in an S3 Archive resource, this object can't be trimmed from this archive resource. This produces a HIERARCHY_ERROR.

This was tested from an S3 plugin build on the master branch using iRODS 4.2.5.

Note that this works fine using a unixfilesystem as the archive resource.

Setup

$ ilsresc
compResc:compound
├── archiveResc:s3
└── cacheResc:unixfilesystem
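
For reference, a hierarchy like the one above can be sketched with iadmin. The hostnames, vault paths, and S3 context string below are illustrative placeholders, not the reporter's actual configuration:

```shell
# Sketch: build a compound resource with a unixfilesystem cache and an s3 archive.
# Hostname, vault paths, and the S3 context string are assumed placeholder values.
iadmin mkresc compResc compound
iadmin mkresc cacheResc unixfilesystem `hostname`:/var/lib/irods/s3cache
iadmin mkresc archiveResc s3 `hostname`:/irods-bucket/irods/Vault \
    "S3_DEFAULT_HOSTNAME=s3.example.org;S3_AUTH_FILE=/var/lib/irods/s3.keypair;ARCHIVE_NAMING_POLICY=consistent"
iadmin addchildtoresc compResc cacheResc cache
iadmin addchildtoresc compResc archiveResc archive
```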

Test Steps

$ iput -R compResc f1
$ ils -L f1
  rods              0 compResc;cacheResc            7 2018-11-29.14:01 & f1
        generic    /var/lib/irods/s3cache/home/rods/f1
  rods              1 compResc;archiveResc            7 2018-11-29.14:01 & f1
        generic    /irods-bucket/irods/Vault/home/rods/f1
$ itrim -n 1 -N 1 f1
remote addresses: 127.0.0.1 ERROR: trimUtil: trim error for /tempZone/home/rods/f1.  status = -1803000 HIERARCHY_ERROR
Total size trimmed = 0.000 MB. Number of files trimmed = 0.
trel commented 2 years ago

Is this still true with 4.2.11?

Two cases to check: 1) as 'archive' under compound 2) 'cacheless'

scimerman commented 2 years ago

We upgraded from 4.2.11 to 4.3.0 a few weeks ago. I cannot be sure exactly when the issue first occurred on our system, but I detected it a few days ago, and it is the exact same problem. We have two resources, a local unixfilesystem and a remote s3. The problematic replica was on the remote s3 resource.

I could not execute any commands on it; the replica was simply marked as stale. I ran a checksum and the files were good, but any operation on the replica was greeted with: -1803000 HIERARCHY_ERROR.

Initially I tried to trim, but later I grew more desperate to resolve the issue, so I tried almost every command I could remember just to see if anything would change the remote replica; all of them errored. I even tried to forcefully delete the data object with all of its replicas (intending to re-upload it afterwards), but that errored too.

In the end I managed to solve it by manually marking the replica as good, after which I could trim the problematic replica:

$ iadmin modrepl logical_path /somezone/home/problematicuser/somefile replica_number 1 DATA_REPL_STATUS 1

Unfortunately I cannot provide more information, as the logs were cleaned while debugging :flushed: What I can confirm is that the s3 resource is indeed cacheless, as @trel mentioned.
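
Putting the workaround together, the recovery sequence would look something like this. The logical path and replica number are taken from the comment above; they are specific to that system and would need adjusting:

```shell
# Hypothetical recovery sequence based on the workaround described above.
# 1. Mark the stale s3 replica (replica 1 here) as good so iRODS will operate on it.
iadmin modrepl logical_path /somezone/home/problematicuser/somefile replica_number 1 DATA_REPL_STATUS 1
# 2. The replica can now be trimmed (-n selects the replica, -N the number of copies to keep).
itrim -n 1 -N 1 /somezone/home/problematicuser/somefile
```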

trel commented 2 years ago

So you had two replicas of the same data object, one on a local unixfilesystem, and the other in a cacheless s3 resource.

The unixfilesystem replica was marked good, and the s3 replica was marked stale.

And you were not able to trim the s3 replica?

Can you reproduce getting into that situation by using iadmin modrepl to set a good data object's s3 replica to stale? Then we might be able to duplicate what you're seeing.
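
A reproduction attempt along those lines might look like the following sketch. The resource names, logical path, and replica numbers are assumptions for illustration; DATA_REPL_STATUS 0 marks a replica stale, 1 marks it good:

```shell
# Hypothetical reproduction: two good replicas, then force the s3 one stale.
iput -R ufsResc f1     # replica 0 on a local unixfilesystem resource (assumed name)
irepl -R s3Resc f1     # replica 1 on the cacheless s3 resource (assumed name)
# Mark the s3 replica stale via the catalog, mimicking the reported state.
iadmin modrepl logical_path /tempZone/home/rods/f1 replica_number 1 DATA_REPL_STATUS 0
# Then check whether trimming the stale s3 replica fails with HIERARCHY_ERROR.
itrim -n 1 -N 1 f1
```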

JustinKyleJames commented 1 year ago

> So you had two replicas of the same data object, one on a local unixfilesystem, and the other in a cacheless s3 resource.
>
> The unixfilesystem replica was marked good, and the s3 replica was marked stale.
>
> And you were not able to trim the s3 replica?
>
> Can you reproduce getting into that situation by using iadmin modrepl to set a good data object's s3 replica to stale? Then we might be able to duplicate what you're seeing.

Just for clarification, if we are still talking about the compound/cache/archive setup, the S3 resource cannot be in a cacheless configuration there, because in that configuration we do not implement the stage-to-cache and sync-from-archive operations.

trel commented 1 year ago

> We have two resources, a local unixfilesystem and a remote s3

Seems it was just these two? No compound involved?