valpesendorfer closed this issue 4 years ago
Val, I think this was fixed in #357, so try 1.8.1. Thanks.
thanks @SpacemanPaul - I've pulled the most recent master and re-built the image, but it still shows the same version. Where do I get 1.8.1?
Sounds like the autoversioning might be broken - I'll take a look. In the meantime, most recent master should address this issue though.
Autoversioning relies on setup.py, so the version will only be correct when installed via setup (or pip, once we start publishing to PyPI). See #395
After pulling the most recent master, I've tried running it with an image built from the Dockerfile in the repo root (which seems to use the local checkout and run `pip install` on it) and with the pre-built image from Docker Hub with the `latest` tag, as specified in the `docker-compose.yaml` file. In both cases, `rasterio` seems to be looking for credentials even with the `AWS_NO_SIGN_REQUEST` env variable set.
In my case, I have a role attached which can read from S3, so it uses credentials from there. If I detach the role, it still tries and fails to retrieve credentials, then reads the image anyway, since it's public.
Here's what's happening for every request with `AWS_NO_SIGN_REQUEST=TRUE`:
ows | [2020-08-27 09:12:38,325] botocore.session [ce7e5098-f929-4d5b-893f-b3ffcb2e22ec] [DEBUG] Setting config variable for region to 'eu-central-1'
ows | [2020-08-27 09:12:38,333] botocore.credentials [ce7e5098-f929-4d5b-893f-b3ffcb2e22ec] [DEBUG] Looking for credentials via: env
ows | [2020-08-27 09:12:38,334] botocore.credentials [ce7e5098-f929-4d5b-893f-b3ffcb2e22ec] [DEBUG] Looking for credentials via: assume-role
ows | [2020-08-27 09:12:38,334] botocore.credentials [ce7e5098-f929-4d5b-893f-b3ffcb2e22ec] [DEBUG] Looking for credentials via: assume-role-with-web-identity
ows | [2020-08-27 09:12:38,334] botocore.credentials [ce7e5098-f929-4d5b-893f-b3ffcb2e22ec] [DEBUG] Looking for credentials via: sso
ows | [2020-08-27 09:12:38,334] botocore.credentials [ce7e5098-f929-4d5b-893f-b3ffcb2e22ec] [DEBUG] Looking for credentials via: shared-credentials-file
ows | [2020-08-27 09:12:38,342] botocore.credentials [ce7e5098-f929-4d5b-893f-b3ffcb2e22ec] [DEBUG] Looking for credentials via: custom-process
ows | [2020-08-27 09:12:38,347] botocore.credentials [ce7e5098-f929-4d5b-893f-b3ffcb2e22ec] [DEBUG] Looking for credentials via: config-file
ows | [2020-08-27 09:12:38,360] botocore.credentials [ce7e5098-f929-4d5b-893f-b3ffcb2e22ec] [DEBUG] Looking for credentials via: ec2-credentials-file
ows | [2020-08-27 09:12:38,361] botocore.credentials [ce7e5098-f929-4d5b-893f-b3ffcb2e22ec] [DEBUG] Looking for credentials via: boto-config
ows | [2020-08-27 09:12:38,362] botocore.credentials [ce7e5098-f929-4d5b-893f-b3ffcb2e22ec] [DEBUG] Looking for credentials via: container-role
ows | [2020-08-27 09:12:38,362] botocore.credentials [ce7e5098-f929-4d5b-893f-b3ffcb2e22ec] [DEBUG] Looking for credentials via: iam-role
ows | [2020-08-27 09:12:38,364] urllib3.connectionpool [ce7e5098-f929-4d5b-893f-b3ffcb2e22ec] [DEBUG] Starting new HTTP connection (1): 169.254.169.254:80
I can't think of anything else I can change to make that work :thinking:
Do you have `AWS_DEFAULT_REGION` set?
Never mind, you said:
AWS_DEFAULT_REGION=eu-central-1
AWS_NO_SIGN_REQUEST=TRUE
On the one hand, things are behaving as expected: unsigned requests are supposed to be the default, so setting `AWS_NO_SIGN_REQUEST` to any non-blank value should be the same as not having it set.
I am using a deprecated core API to configure S3 access though. I'll cut over to the new one and see if that helps.
Looks like this issue got "unfixed" https://github.com/mapbox/rasterio/issues/1637
As @Kirill888 notes, this appears to be a rasterio bug.
@valpesendorfer the workaround is to set up fake credentials. `rasterio` won't pass them on to GDAL since unsigned access is used, but it will stop it from attempting to fetch STS credentials from IAM.
I'm going to try writing that workaround into OWS.
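For context, the workaround amounts to something like the sketch below. The flag parsing is an assumption, not the exact OWS logic; the point is just that dummy values in the environment stop botocore's credential chain from ever reaching the instance-metadata endpoint.

```python
import os

# Sketch of the fake-credentials workaround (not the actual OWS change).
# If unsigned access is requested, export dummy credentials so botocore
# finds them in the environment and never falls back to IAM / the
# instance-metadata service. GDAL ignores the values because the requests
# are unsigned anyway.
if os.environ.get("AWS_NO_SIGN_REQUEST", "").upper() not in ("", "NO", "FALSE"):
    os.environ.setdefault("AWS_ACCESS_KEY_ID", "fake")
    os.environ.setdefault("AWS_SECRET_ACCESS_KEY", "fake")
```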
@valpesendorfer Can you please try with the `unsigned_s3_access` branch? It sets fake credentials if `AWS_NO_SIGN_REQUEST` is set.
@SpacemanPaul `unsigned_s3_access` works!
Log is now:
ows | [2020-08-28 07:02:30,988] botocore.session [c7ae1a41-8e37-4448-84f0-f2cb1e9c17f3] [DEBUG] Setting config variable for region to 'eu-central-1'
ows | [2020-08-28 07:02:30,996] botocore.credentials [c7ae1a41-8e37-4448-84f0-f2cb1e9c17f3] [DEBUG] Looking for credentials via: env
ows | [2020-08-28 07:02:30,997] botocore.credentials [c7ae1a41-8e37-4448-84f0-f2cb1e9c17f3] [INFO] Found credentials in environment variables.
Cool - thanks for verifying. Like I said, we don't have a lot of users using private buckets, but we try to support them.
I understand, but don't you need signed requests to get requester pays working?
Using your branch `unsigned_s3_access` fixes the issue of requesting credentials even when the requests are supposed to be unsigned. But it also breaks the option of sending signed requests, by hard-coding `aws_unsigned=True` here in line 77:
credentials = configure_s3_access(aws_unsigned=True)
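For comparison, deriving the flag from the environment would keep signed requests possible (which matters for requester-pays buckets). A minimal sketch, assuming a simple parsing rule that is not the branch's actual logic:

```python
import os
from datacube.utils.aws import configure_s3_access

# Derive the flag from the environment instead of hard-coding it.
# Unsigned stays the default when the variable is absent; an explicit
# "NO"/"FALSE" switches to signed access.
unsigned = os.environ.get("AWS_NO_SIGN_REQUEST", "YES").upper() not in ("NO", "FALSE")
credentials = configure_s3_access(aws_unsigned=unsigned)
```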
Is there any specific reason you switched from `set_default_rio_config` to `configure_s3_access`?
I've had some issues with sending signed requests, as a result of how the flag is handled by `ogc.py` (where unsigned is the default) and how the actual env variable is handled by GDAL. In the current master, setting the variable to any non-empty string sets unsigned to `True` in `ogc.py`. That does not necessarily have the same effect in GDAL, where an empty string is the same as setting it to `True`. To actually send a signed request, GDAL needs the env variable to be missing or set to a `False` value, but it can't be missing or set to `FALSE`/`NO`, because that would set unsigned to `True` in `ogc.py` (at least in the current master). I hope you can see the vicious circle this creates, which I inevitably ended up trapped in.
To fix this, I modified your code in `unsigned_s3_access` a bit: it determines whether the user wants a signed or unsigned request (still keeping unsigned as the de-facto default); if signed, it forces the env variable to `NO` to comply with GDAL's requirements; and if unsigned, it uses the fake credentials to prevent the attempt to retrieve credentials from IAM. The setup is then again passed to `set_default_rio_config`, which works fine for me. This way both signed and unsigned requests can be made successfully and efficiently. I'll share a PR if you want to have a look at it.
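Roughly, the modified flow is shaped like the sketch below. This is an illustration of what I described, not the PR itself; in particular, the way `set_default_rio_config` takes its AWS options here is an assumption.

```python
import os
from datacube.utils.rio import set_default_rio_config

# Unsigned stays the de-facto default: anything other than an explicit
# "NO"/"FALSE" (including an unset variable) is treated as unsigned.
unsigned = os.environ.get("AWS_NO_SIGN_REQUEST", "YES").upper() not in ("NO", "FALSE")

if unsigned:
    # Dummy credentials keep botocore from calling out to IAM/STS;
    # GDAL never uses them because the requests are unsigned.
    os.environ.setdefault("AWS_ACCESS_KEY_ID", "fake")
    os.environ.setdefault("AWS_SECRET_ACCESS_KEY", "fake")
else:
    # GDAL only signs requests when AWS_NO_SIGN_REQUEST is absent or "NO",
    # so force the value GDAL expects.
    os.environ["AWS_NO_SIGN_REQUEST"] = "NO"

# Hand the result back to datacube's rio helper (assumed call shape:
# an "aws" options dict plus cloud_defaults).
set_default_rio_config(aws={"aws_unsigned": unsigned}, cloud_defaults=True)
```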
Thanks Val. I will aim to get this issue resolved this week, but I have a couple of higher priorities to knock over first.
Fixed in #404
datacube-ows version: `datacube-ows-update --version`
Description
I'm running a test deployment on OWS inside a docker container on a single EC2 instance. The data is stored as COGs on a co-located S3 bucket, which is open to the public for read.
Since I found out that using the IAM role for authentication when reading from S3 is a major slowdown (a new token is requested for each request), I wanted to see what the difference is compared to unsigned AWS access on a public dataset.
I've set the following env variables, `AWS_DEFAULT_REGION=eu-central-1` and `AWS_NO_SIGN_REQUEST=TRUE`, and confirmed that they are set in the running container.
I've started the docker container using the production settings (so with `gunicorn`) and behind an `nginx` container. Then, using Postman & Locust, I've set up a `GetMap` request for a tile of the dataset. Still, it looks like for each request `botocore.credentials` goes roaming around for credentials, either again using the attached IAM role or, if it's detached, just getting a 404 error; in both cases an API call for a token is performed. The request keeps going and eventually returns the requested data just fine. I'll attach a log below this message.

After playing around a bit with `rasterio` and the `datacube.utils.rio` functions, I think this happens each time `activate_from_config` is called, which in this case seems to be for each request. I've set up a test script that calls the config once and then reads a raster from S3 multiple times. The search for credentials only happens after `activate_from_config` is called; for the successive reads, everything works as expected.

Not sure if I missed something with the setup, or if there's maybe a more persistent way to set up the rio env for the workers / threads that perform the read.
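For reference, the test script was roughly of this shape (a reconstruction, not the exact script; the URL is a placeholder and the `set_default_rio_config` options are assumed):

```python
import rasterio
from rasterio.windows import Window
from datacube.utils.rio import set_default_rio_config, activate_from_config

# Configure once, then read several times: the credential search only
# shows up around activate_from_config(), not for the later reads.
set_default_rio_config(aws={"aws_unsigned": True}, cloud_defaults=True)
activate_from_config()

URL = "s3://my-public-bucket/path/to/cog.tif"  # placeholder, not the real dataset

for _ in range(5):
    with rasterio.open(URL) as src:
        _ = src.read(1, window=Window(0, 0, 256, 256))
```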
Thanks!
Here's the log for a request: