Open sabinehelm opened 9 months ago
The check is intended to verify in-region execution but shouldn't be limited to EC2, I think your change would be a valid PR if it works the same way in EC2!
Thanks @betolink. This is great to hear! I did not use exactly the same code snippet as given in #231; instead, we used the following to get the current region, which should also work from EC2:
my_session = boto3.session.Session()
my_region = my_session.region_name
@sabinehelm Thanks for sharing your updated solution. We worked on this a bunch today and decided to try and use botocore directly:
botocore.session.get_session().get_config_variable("region")
Can you confirm whether or not this will work for your use case?
I don't think boto3/botocore is going to do what you want -- namely, determine which region you're actually running in. Boto is going to pull the session information from your AWS config or AWS_* environment variables, so it's more checking what region you're configured to access (what APIs to hit) than what region you're actually running in.
For example, on my laptop:
>>> import botocore.session
>>> botocore.session.get_session().get_config_variable("region")
'us-west-2'
because I have my default region in my ~/.aws/config set up like so:
[default]
region = us-west-2
Likewise, if I instead do:
>>> import os
>>> import botocore.session
>>> os.environ['AWS_DEFAULT_REGION'] = 'us-east-1'
>>> botocore.session.get_session().get_config_variable("region")
'us-east-1'
AFAIK, from an EC2 instance, the only way to determine what region the AWS instance is running in is to hit this special IP address:
http://169.254.169.254/latest/meta-data/placement/region
And you'll need to handle IMDSv1 or IMDSv2 metadata requests -- see this answer on SO: https://stackoverflow.com/a/77902397
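For illustration, a stdlib-only sketch of that IMDSv2 flow (hypothetical function name; it fetches a session token with a PUT request, then reads the placement region with that token, and returns None wherever the metadata endpoint is blocked or absent):

```python
import urllib.error
import urllib.request

IMDS_BASE = "http://169.254.169.254/latest"

def get_imds_region(timeout: float = 1.0):
    """Ask the EC2 instance metadata service (IMDSv2) for the region.

    Returns the region string on EC2, or None when the endpoint is
    unreachable (e.g. off EC2, or where metadata access is blocked).
    """
    try:
        # IMDSv2: first obtain a short-lived session token via a PUT request...
        token_req = urllib.request.Request(
            f"{IMDS_BASE}/api/token",
            method="PUT",
            headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
        )
        with urllib.request.urlopen(token_req, timeout=timeout) as resp:
            token = resp.read().decode()
        # ...then use the token to read the placement region
        region_req = urllib.request.Request(
            f"{IMDS_BASE}/meta-data/placement/region",
            headers={"X-aws-ec2-metadata-token": token},
        )
        with urllib.request.urlopen(region_req, timeout=timeout) as resp:
            return resp.read().decode()
    except (urllib.error.URLError, OSError):
        return None
```

Off EC2 this simply returns None after the connection attempt times out or is refused, so it can be safely used as the first step of a fallback chain.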
This should work on ECS configured for EC2, but I don't know if it works on ECS configured for Fargate (I suspect it will though).
It looks like on Fargate, the AWS_REGION environment variable is set, so the botocore method should work there. But it's not robust, in that it's just checking environment variables, which are mutable and primarily used to select which APIs to interact with, not to determine what region you're running in.
On ECS, here's a summary of available metadata: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-metadata-endpoint.html
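A sketch of what reading that task metadata endpoint could look like (hypothetical function name; assumes the ECS agent has set the ECS_CONTAINER_METADATA_URI_V4 environment variable, as it does on both the EC2 and Fargate launch types, and that the task metadata includes an AvailabilityZone field such as "us-west-2a"):

```python
import json
import os
import urllib.request

def get_ecs_task_region(timeout: float = 1.0):
    """Derive the region from the ECS task metadata endpoint (v4).

    Returns None outside ECS, since the metadata URI environment
    variable is only set by the ECS agent.
    """
    metadata_uri = os.environ.get("ECS_CONTAINER_METADATA_URI_V4")
    if not metadata_uri:
        return None
    try:
        with urllib.request.urlopen(f"{metadata_uri}/task", timeout=timeout) as resp:
            task = json.load(resp)
        az = task.get("AvailabilityZone", "")
        # Strip the trailing zone letter, e.g. "us-west-2a" -> "us-west-2"
        return az.rstrip("abcdefghijklmnopqrstuvwxyz") or None
    except OSError:
        return None
```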
So, determining this is non-trivial and requires knowing a bit about the service you're running in.
> @sabinehelm Thanks for sharing your updated solution. We worked on this a bunch today and decided to try and use botocore directly:
> botocore.session.get_session().get_config_variable("region")
> Can you confirm whether or not this will work for your use case?
@JessicaS11: Thanks for your response. I tested the code snippet and it would work for our use case, but I fear it is already outdated after the last comments.
Thanks @sabinehelm, and agreed!
@jhkennedy, can you comment on what you'd recommend for #231 and #424? If I understand what you're saying correctly, our current implementation in #424 would not properly keep the user from running a notebook out of region (or tell us if someone really is working out of region), depending on how their config parameters are set.
> If I understand what you're saying correctly, our current implementation in https://github.com/nsidc/earthaccess/pull/424 would not properly keep the user from running a notebook out of region (or tell us if someone really is working out of region), depending on how their config parameters are set.
Yes, correct. The current implementation will determine which AWS API you're configured to hit. That could be something the user manually configured or the one set by a service, but since it's all via config files or environment variables, there's no good way of knowing which.
> @jhkennedy, can you comment on what you'd recommend for https://github.com/nsidc/earthaccess/issues/231 and https://github.com/nsidc/earthaccess/pull/424?
@JessicaS11 I'm not sure -- this is hard.
So, I'll start by saying the answer to:
> Was the spirit to check specifically for us-west-2, or to enable the user to see what region they are running in?
It should probably be yes to both, or at least confirm what region they are in.
So, overall, I think I'd recommend an `earthaccess.aws` module with:

- `get_ec2_region`, which has most of the current implementation inside `_running_in_us_west_2`: https://github.com/nsidc/earthaccess/blob/main/earthaccess/store.py#L146-L155
- `get_fargate_region`, which would need to do the same, but for Fargate, as detailed here: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/fargate-metadata.html
- `get_ecs_container_region`, as detailed here: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/container-metadata.html
- `get_config_region`, which would use boto3/botocore as in #424
- a `get_region` (`get_running_region`?) method that would go through the above four in order and return the first one that didn't raise (or return None, depending on implementation). If it fell through to the fourth, I'd probably also throw a warning like "Region inferred from AWS config or environment variable and may not represent the region you're running in."

Then you could also have a convenience method like

def earthaccess.aws.ensure_in_region(region: str = 'us-west-2') -> bool:

on top of it and use it in the earthaccess Store.

Note: actual method/function names could be improved.
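A rough sketch of that fallback chain (all names hypothetical; the resolver callables stand in for the per-service getters proposed above, ordered most-specific first with the config-based one last):

```python
import warnings
from typing import Callable, List, Optional

def get_region(resolvers: List[Callable[[], Optional[str]]]) -> Optional[str]:
    """Try each region resolver in order and return the first result.

    Resolvers are callables returning a region string or None. If only
    the last (config-based) resolver succeeds, emit a warning, since a
    config-derived region may not reflect where the code actually runs.
    """
    for i, resolve in enumerate(resolvers):
        try:
            region = resolve()
        except Exception:
            continue  # treat a raising resolver like a miss
        if region is not None:
            if i == len(resolvers) - 1:
                warnings.warn(
                    "Region inferred from AWS config or environment variable "
                    "and may not represent the region you're running in."
                )
            return region
    return None

def ensure_in_region(resolvers, region: str = "us-west-2") -> bool:
    """Convenience check in the spirit of the proposed ensure_in_region."""
    return get_region(resolvers) == region
```

For example, `get_region([get_ec2_region, get_fargate_region, get_ecs_container_region, get_config_region])` would return the first region any of those hypothetical getters could determine.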
All that said, this seems like something that should already exist so it'd be worth spending some time searching GitHub/PyPI... I don't see anything though, oddly.
For ec2, it might be worth just using this package: https://github.com/adamchainz/ec2-metadata
It looks well maintained, is sponsored, and is a "critical" project on PyPI.
I still don't see anything for ECS and Fargate, however.
I like what you're proposing @jhkennedy, and one would think this would already exist in a package that works in all of the AWS execution environments: EC2, ECS, Lambda, etc.
ec2-metadata will not work on any z2jh instances by default, as access to the metadata server is explicitly blocked by default (https://z2jh.jupyter.org/en/stable/administrator/security.html#block-cloud-metadata-api-with-a-privileged-initcontainer-running-iptables).
If I test the current approach to verify in-region execution, it works from Openscapes (2i2c). Is this the same endpoint? @yuvipanda
http://169.254.169.254/latest/meta-data/placement/region
@betolink yes, because we've intentionally unblocked that access point in the openscapes hub :) But we coupled it with appropriate IRSA roles so it is secure. At least when I last looked, just unblocking access to the metadata server without setting up IRSA or similar was pretty insecure.
Ah, this reminded me of an issue the VEDA hub reported: when using earthaccess, the library didn't detect in-region execution and used the HTTP links. This is probably why that was happening. cc @abarciauskas-bgse
yeah, I'd suggest (similar to @jhkennedy elsewhere) looking for the incoming redirects to figure out whatever you need to do internally, as ultimately everything else is going to be only a heuristic. For example, we're going to do https://github.com/2i2c-org/infrastructure/issues/3273 soon for the openscapes hub; I'm not sure what effect that will have on ec2-metadata.
@betolink sorry for the delay here but I verified that an updated earthaccess (v0.8.2) does open via S3 direct access on VEDA, whereas the current version installed on VEDA's Hub (v0.5.2) does not properly register that the instance is in-region. Hopefully this will be resolved once https://github.com/NASA-IMPACT/veda-jh-environments/issues/41 is completed.
Overheard from maintainers of oss.smce.nasa.gov:
yes, we do indeed block the instance metadata (for security reasons), so the check that earthaccess is making is not ideal for what they’re trying to do.
I am using an AWS pcluster instance, and I need to add earthaccess.__store__.in_region = True to my script to enable the s3 earthdata.download() transfer while already located in the us-west-2 region.
In pull request #424, API access to store._running_in_us_west_2() is added in the form of a printed statement that the user is (or is not) running in AWS region us-west-2. Unfortunately, the check in store._running_in_us_west_2() only works for EC2 instances. It does not work, for example, for ECS (Elastic Container Service) instances running in region us-west-2. Now the big question is: is it intended that only EC2 instances in region us-west-2 can access the data, or can it be any computing instance running in region us-west-2? In the second case, store._running_in_us_west_2() could be adapted using boto3 to check the region, following the code snippet in issue #231.
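For reference, an adaptation along those lines might look like this (hypothetical helper name; note the caveat raised later in the thread that a boto3 session's region reflects what you are configured to access, not necessarily where the code is physically running):

```python
def running_in_region(target_region: str = "us-west-2") -> bool:
    """Best-effort check that the *configured* AWS region matches target_region.

    Hypothetical helper for illustration: this reads the region from AWS
    config files / environment variables via boto3, which reflects which
    APIs you're configured to hit, not necessarily where this code runs.
    """
    try:
        import boto3  # optional dependency; degrade gracefully if missing
    except ImportError:
        return False
    region = boto3.session.Session().region_name  # may be None if unconfigured
    return region == target_region
```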