irods / irods_resource_plugin_s3

S3-compatible storage resource plugin for iRODS

Changing regions doesn't work? #2137

Open luijs opened 11 months ago

luijs commented 11 months ago

All,

I just had a confusing chat with our infra department, who told us that our iRODS instance was writing to the wrong region. It looks like when you update the region in the context string, iRODS still sends the data to the old region. Here is some more detail on what happened and what I tested:

We run iRODS 4.3.0 on Ubuntu 20.04.

When we started with the S3 plugin, the infra team provided us with a region that we could use, say 'regionX'. We created an S3 resource with the context string othercontextsettings;S3_REGIONNAME=regionX;othercontextsettings.

After a while they needed to do an update on the system, so they created regionY for us. We didn't really have any data on regionX, or we had just deleted everything; I can't remember. I ran iadmin modresc s3_resc context "othercontextsettings;S3_REGIONNAME=regionY;othercontextsettings" and continued working with the resource. However, a while later, infra told us: "You are still on regionX; you should move to regionY." They showed me their dashboard, and indeed, nothing was being sent to regionY; it was all going to regionX.
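
For reference, the context string is a semicolon-separated list of KEY=VALUE pairs. Below is a minimal Python sketch, assuming exactly that format (no escaped ';' or '=' inside values), that replaces only S3_REGIONNAME and leaves every other setting untouched, so the result can be passed to iadmin modresc ... context. The helper names parse_context, build_context, and with_region are hypothetical and not part of iRODS or the plugin.

```python
def parse_context(context: str) -> dict[str, str]:
    """Split 'KEY=VALUE;KEY=VALUE' into an ordered dict (assumes no escaped ';' or '=')."""
    settings = {}
    for pair in context.split(";"):
        if not pair:
            continue  # tolerate empty segments such as a trailing ';'
        key, _, value = pair.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def build_context(settings: dict[str, str]) -> str:
    """Join settings back into a context string, preserving insertion order."""
    return ";".join(f"{key}={value}" for key, value in settings.items())


def with_region(context: str, region: str) -> str:
    """Return a copy of the context string with only S3_REGIONNAME replaced (or added)."""
    settings = parse_context(context)
    settings["S3_REGIONNAME"] = region
    return build_context(settings)


if __name__ == "__main__":
    old = "S3_DEFAULT_HOSTNAME=some.localhosted.s3;S3_REGIONNAME=regionX;S3_PROTO=HTTPS"
    print(with_region(old, "regionY"))
```

Note that this only edits the string; whether the plugin actually honours the new region at runtime is exactly what this issue is about.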

After this I ran another test: I changed the region name to a nonexistent region, and I was still able to write and read my files.

Changing regions might not be a common use case, because in iRODS you would probably create a new resource and do some magic to move the data there. However, it did strike me as strange. Is the resource configuration stored anywhere other than the context string? (The change was applied correctly in r_resc_main.) How can iRODS still know about regionX if I changed the context string and restarted iRODS?

korydraughn commented 11 months ago

That sounds like a bug. Can you share the full context string?

@JustinKyleJames Any thoughts?

JustinKyleJames commented 11 months ago

This seemed to work for me. I created a bucket in AWS named justinkylejames-eu-central-1. I then created a resource pointing to this bucket/region.

-bash-4.2$ iadmin lr amazons3resc
resc_id: 10019
resc_name: amazons3resc
zone_name: tempZone
resc_type_name: s3
resc_net: 6ce0709ac8c6
resc_def_path: /justinkylejames-eu-central-1/amazons3resc
free_space:
free_space_ts:
resc_info:
r_comment:
resc_status:
create_ts: 2023-09-22.16:39:17
modify_ts: 2023-09-22.18:04:06
resc_children:
resc_context: S3_DEFAULT_HOSTNAME=s3.eu-central-1.amazonaws.com;S3_AUTH_FILE=/var/lib/irods/amazon.keypair;S3_REGIONNAME=eu-central-1;S3_RETRY_COUNT=2;S3_WAIT_TIME_SEC=3;S3_PROTO=HTTP;HOST_MODE=cacheless_attached;S3_ENABLE_MD5=1;S3_SIGNATURE_VERSION=2;S3_ENABLE_MPU=1;S3_MPU_THREADS=30;S3_MPU_CHUNK=256
resc_parent:
resc_parent_context:

I then did an iput and an iget to make sure I could write to that bucket.

-bash-4.2$ iput -R amazons3resc VERSION.json   
-bash-4.2$ iget VERSION.json - 
{
    "catalog_schema_version": 8, 
    "commit_id": "2ed549ca7fe455aaa7755becc6c14b233dcbc0b4", 
    "configuration_schema_version": 3, 
    "installation_time": "2023-07-13T15:46:45.162355", 
    "irods_version": "4.2.12"
}

I then changed the resource to us-east-1, updating S3_DEFAULT_HOSTNAME and S3_REGIONNAME but keeping the eu-central-1 bucket:

-bash-4.2$ iadmin lr amazons3resc
resc_id: 10019
resc_name: amazons3resc
zone_name: tempZone
resc_type_name: s3
resc_net: 6ce0709ac8c6
resc_def_path: /justinkylejames-eu-central-1/amazons3resc
free_space: 
free_space_ts: 
resc_info: 
r_comment: 
resc_status: 
create_ts: 2023-09-22.16:39:17
modify_ts: 2023-09-22.18:07:00
resc_children: 
resc_context: S3_DEFAULT_HOSTNAME=s3.amazonaws.com;S3_AUTH_FILE=/var/lib/irods/amazon.keypair;S3_REGIONNAME=us-east-1;S3_RETRY_COUNT=2;S3_WAIT_TIME_SEC=3;S3_PROTO=HTTP;HOST_MODE=cacheless_attached;S3_ENABLE_MD5=1;S3_SIGNATURE_VERSION=2;S3_ENABLE_MPU=1;S3_MPU_THREADS=30;S3_MPU_CHUNK=256
resc_parent: 
resc_parent_context: 
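
Comparing the two resc_context values key by key shows that only S3_DEFAULT_HOSTNAME and S3_REGIONNAME changed, while resc_def_path (and therefore the bucket) stayed on justinkylejames-eu-central-1. A small Python sketch of such a comparison, using the two strings above and assuming the same KEY=VALUE;... format (the function names are hypothetical):

```python
def parse_context(context: str) -> dict[str, str]:
    """Split 'KEY=VALUE;KEY=VALUE' into an ordered dict (assumes no escaped ';' or '=')."""
    settings = {}
    for pair in filter(None, context.split(";")):
        key, _, value = pair.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def diff_contexts(before: str, after: str) -> dict[str, tuple]:
    """Map each key whose value differs to (old, new); a missing key shows up as None."""
    old, new = parse_context(before), parse_context(after)
    keys = list(old) + [k for k in new if k not in old]
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


if __name__ == "__main__":
    common = ("S3_RETRY_COUNT=2;S3_WAIT_TIME_SEC=3;S3_PROTO=HTTP;HOST_MODE=cacheless_attached;"
              "S3_ENABLE_MD5=1;S3_SIGNATURE_VERSION=2;S3_ENABLE_MPU=1;S3_MPU_THREADS=30;S3_MPU_CHUNK=256")
    eu = ("S3_DEFAULT_HOSTNAME=s3.eu-central-1.amazonaws.com;S3_AUTH_FILE=/var/lib/irods/amazon.keypair;"
          "S3_REGIONNAME=eu-central-1;" + common)
    us = ("S3_DEFAULT_HOSTNAME=s3.amazonaws.com;S3_AUTH_FILE=/var/lib/irods/amazon.keypair;"
          "S3_REGIONNAME=us-east-1;" + common)
    for key, (before, after) in diff_contexts(eu, us).items():
        print(f"{key}: {before} -> {after}")
```

With these inputs it prints one line for S3_DEFAULT_HOSTNAME and one for S3_REGIONNAME.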

I tried an iget, and it failed:

-bash-4.2$ iget VERSION.json - 
remote addresses: 10.15.0.6 ERROR: getUtil: get error for - status = -718000 S3_FILE_OPEN_ERR
Level 0: [-]    /irods_resource_plugin_s3/s3/s3_transport/src/s3_transport.cpp:117:irods::error irods::experimental::io::s3_transport::handle_glacier_status(const std::string &, libs3_types::bucket_context &, const unsigned int, const std::string &, irods::experimental::io::s3_transport::object_s3_status, const std::string &) :  status [S3_FILE_OPEN_ERR]  errno [] -- message [Object does not exist and open mode requires it to exist.]

I then updated the resource to use a bucket in us-east-1 (justinkylejames1):

-bash-4.2$ iadmin lr amazons3resc
resc_id: 10019
resc_name: amazons3resc
zone_name: tempZone
resc_type_name: s3
resc_net: 6ce0709ac8c6
resc_def_path: /justinkylejames1/amazons3resc
free_space:
free_space_ts:
resc_info:
r_comment:
resc_status:
create_ts: 2023-09-22.16:39:17
modify_ts: 2023-09-22.18:12:53
resc_children:
resc_context: S3_DEFAULT_HOSTNAME=s3.amazonaws.com;S3_AUTH_FILE=/var/lib/irods/amazon.keypair;S3_REGIONNAME=us-east-1;S3_RETRY_COUNT=2;S3_WAIT_TIME_SEC=3;S3_PROTO=HTTP;HOST_MODE=cacheless_attached;S3_ENABLE_MD5=1;S3_SIGNATURE_VERSION=2;S3_ENABLE_MPU=1;S3_MPU_THREADS=30;S3_MPU_CHUNK=256
resc_parent:
resc_parent_context:

I then did an iput and an iget, and both worked, showing that the data was going to the new region.

-bash-4.2$ iput -R amazons3resc VERSION.json   
-bash-4.2$ iget VERSION.json - 
{
    "catalog_schema_version": 8, 
    "commit_id": "2ed549ca7fe455aaa7755becc6c14b233dcbc0b4", 
    "configuration_schema_version": 3, 
    "installation_time": "2023-07-13T15:46:45.162355", 
    "irods_version": "4.2.12"
}

Is it possible that your server requires virtual-hosted-style addressing? See the following in the README:

S3_URI_REQUEST_STYLE - The path request style used. This is either "path" or "virtualhost". The default is "path". See path vs virtual hosted requests.
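
The difference between the two request styles is where the bucket name goes. The Python sketch below builds an object URL both ways, following the general S3 convention: path style puts the bucket in the path (https://host/bucket/key), virtual-hosted style puts it in the hostname (https://bucket.host/key). The helper object_url is hypothetical; the plugin's real requests are built by libs3, so this only illustrates the addressing scheme.

```python
from urllib.parse import quote


def object_url(proto: str, hostname: str, bucket: str, key: str, style: str = "path") -> str:
    """Build an S3 object URL in 'path' or 'virtualhost' style (hypothetical helper)."""
    scheme = proto.lower()  # the context string uses HTTP / HTTPS
    encoded_key = quote(key.lstrip("/"))  # keep '/' separators, percent-encode other unsafe characters
    if style == "path":
        return f"{scheme}://{hostname}/{bucket}/{encoded_key}"
    if style == "virtualhost":
        return f"{scheme}://{bucket}.{hostname}/{encoded_key}"
    raise ValueError(f"unknown request style: {style!r}")


if __name__ == "__main__":
    for style in ("path", "virtualhost"):
        print(object_url("HTTPS", "s3.eu-central-1.amazonaws.com",
                         "justinkylejames-eu-central-1", "amazons3resc/VERSION.json", style))
```

A provider that only accepts one of the two forms may reject or misroute requests in the other, which is why the request style is worth checking alongside the region.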

luijs commented 11 months ago

Hey, the string I have been using: S3_DEFAULT_HOSTNAME=some.localhosted.s3;S3_AUTH_FILE=/some/dir/file.s3.keypair;S3_REGIONNAME=nl-region;S3_RETRY_COUNT=2;S3_WAIT_TIME_SECONDS=3;S3_PROTO=HTTPS;ARCHIVE_NAMING_POLICY=consistent;HOST_MODE=cacheless_detached;S3_CACHE_DIR=/some/dir;S3_MPU_CHUNK=500

As you can see, I didn't set S3_URI_REQUEST_STYLE explicitly. By 'your server', do you mean my iRODS server or the S3 host?

JustinKyleJames commented 11 months ago


I was referring to the server for the S3 provider itself.

Is it possible that your provider requires the region name in S3_DEFAULT_HOSTNAME? I know AWS requires that for regions other than us-east-1. Note how I set it to s3.eu-central-1.amazonaws.com above when setting the region to eu-central-1.
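
A quick way to spot this kind of mismatch is to check whether the region in S3_REGIONNAME also appears in S3_DEFAULT_HOSTNAME. The Python sketch below assumes AWS-style regional hostnames of the form s3.<region>.amazonaws.com, with us-east-1 also reachable at s3.amazonaws.com, as in the examples above. Other providers may name their endpoints differently, so for a self-hosted endpoint a False result only means the hostname does not follow the AWS pattern, not that it is wrong. The function names are hypothetical.

```python
def parse_context(context: str) -> dict[str, str]:
    """Split 'KEY=VALUE;KEY=VALUE' into a dict (assumes no escaped ';' or '=')."""
    settings = {}
    for pair in filter(None, context.split(";")):
        key, _, value = pair.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def region_matches_hostname(context: str) -> bool:
    """Heuristic: does S3_DEFAULT_HOSTNAME look like an AWS-style endpoint for S3_REGIONNAME?"""
    settings = parse_context(context)
    region = settings.get("S3_REGIONNAME", "")
    # Drop an optional ":port" suffix and compare only the host part.
    host = settings.get("S3_DEFAULT_HOSTNAME", "").split(":")[0]
    if not region:
        return False
    if region == "us-east-1" and host == "s3.amazonaws.com":
        return True  # the global AWS endpoint serves us-east-1
    # Otherwise expect the region as one dot-separated label, e.g. s3.eu-central-1.amazonaws.com
    return region in host.split(".")


if __name__ == "__main__":
    examples = {
        "eu-central-1 (AWS)": "S3_DEFAULT_HOSTNAME=s3.eu-central-1.amazonaws.com;S3_REGIONNAME=eu-central-1",
        "us-east-1 (AWS)": "S3_DEFAULT_HOSTNAME=s3.amazonaws.com;S3_REGIONNAME=us-east-1",
        "local provider": "S3_DEFAULT_HOSTNAME=some.localhosted.s3;S3_REGIONNAME=nl-region",
    }
    for label, ctx in examples.items():
        print(f"{label}: {region_matches_hostname(ctx)}")
```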

Also try setting S3_URI_REQUEST_STYLE=virtualhost in the context string.

If neither of those works, I would need to see some trace logging from the S3 server. If you can't get that, I could build you a libs3 with request/response logging enabled; that output would end up in /var/log/irods/irods.log.

JustinKyleJames commented 10 months ago

Is this still an issue?