inbo / reporting-rshiny-grofwildjacht

Rshiny app for grofwildjacht
https://grofwildjacht.inbo.be/
MIT License

Possibility to use IMDSv2 to retrieve the AWS Credentials and Region instead of relying on Environment Variables? #379

Closed stenmigerode closed 1 year ago

stenmigerode commented 1 year ago

Possible use of IMDSv2 to retrieve AWS Credentials

Currently, the R package used to connect to AWS S3 relies on environment variables in which we'd have to configure the AWS key, secret, and region if these settings are not specified elsewhere.

The AWS server that is used to run ShinyProxy and Docker is configured to automatically get access to the services configured in its IAM role. In this case we have enabled access from the server to the S3 bucket. To achieve this, the AWS CLI is automatically configured on this server. For third-party applications and tools it is best to use IMDSv2.

Do you need to read/write in the bucket throughout the lifetime of the Docker container? Or is the data read once at startup? The difficulty with IMDSv2 access keys is that they are temporary (valid for 6 hours). If this is not possible, we'd have to look for a secure way to store permanent access keys and secrets on the EC2 host and pass them via the application.yml to the Docker containers.

Below you can find an example of the curl calls that you could use to get those credentials.

Get a token to do the request for credentials:

TOKEN=`curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 3600"`

Use this token in the next curl request to retrieve the security credentials:

curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/identity-credentials/ec2/security-credentials/ec2-instance
{
  "Code" : "Success",
  "LastUpdated" : "2023-02-08T16:32:16Z",
  "Type" : "AWS-HMAC",
  "AccessKeyId" : "xxxx",
  "SecretAccessKey" : "xxxx",
  "Token" : "xxxx",
  "Expiration" : "2023-02-08T22:40:50Z"
}

AccessKeyId and SecretAccessKey (together with the session Token) can be used to connect to the S3 bucket.
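For illustration, the same two-step flow can be sketched in Python using only the standard library; the helper names below are ours, not part of any package. Note that for IAM role credentials AWS documents the `/latest/meta-data/iam/security-credentials/<role-name>` path; the `identity-credentials` path shown above is described as reserved for AWS-internal use.

```python
import json
import urllib.request

IMDS = "http://169.254.169.254/latest"


def imds_token(ttl_seconds=3600):
    """Step 1: PUT request for a short-lived IMDSv2 session token."""
    req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()


def parse_credentials(body):
    """Step 2: pick the three values an S3 client needs out of the JSON reply."""
    doc = json.loads(body)
    return {
        "AWS_ACCESS_KEY_ID": doc["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": doc["SecretAccessKey"],
        "AWS_SESSION_TOKEN": doc["Token"],
    }
```

`imds_token()` only works on an EC2 instance; `parse_credentials()` is a pure function you can test anywhere.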

To retrieve the current region:

curl http://169.254.169.254/latest/dynamic/instance-identity/document
{
  "accountId" : "xxxx",
  "architecture" : "x86_64",
  "availabilityZone" : "eu-west-1c",
  "billingProducts" : null,
  "devpayProductCodes" : null,
  "marketplaceProductCodes" : null,
  "imageId" : "xxxx",
  "instanceId" : "xxxx",
  "instanceType" : "xxxx",
  "kernelId" : null,
  "pendingTime" : "2023-02-08T05:06:05Z",
  "privateIp" : "xxxx",
  "ramdiskId" : null,
  "region" : "eu-west-1",
  "version" : "2017-09-30"
}
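A similar sketch for the region lookup (again with hypothetical helper names). On instances where IMDSv2 is required, this call also needs the token header, unlike the bare curl above; alternatively, the `/latest/meta-data/placement/region` endpoint returns the region directly.

```python
import json
import urllib.request


def region_from_identity_document(body):
    """Extract the region field from the instance identity document JSON."""
    return json.loads(body)["region"]


def current_region(token):
    """Fetch the identity document from IMDS, sending the IMDSv2 token header."""
    req = urllib.request.Request(
        "http://169.254.169.254/latest/dynamic/instance-identity/document",
        headers={"X-aws-ec2-metadata-token": token},
    )
    with urllib.request.urlopen(req) as resp:
        return region_from_identity_document(resp.read().decode())
```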

Important

Do not request these credentials every time. IMDSv2 has a rate limit on the number of requests. Ideally this call is made only when the old credentials are about to expire.
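That refresh rule can be expressed as a small pure function, assuming the `Expiration` timestamp format shown in the response above; the function name and the 10-minute safety margin are our own choices, not part of any API.

```python
from datetime import datetime, timedelta, timezone


def needs_refresh(expiration_iso, now=None, margin_minutes=10):
    """True when the cached credentials expire within the safety margin."""
    expiration = datetime.strptime(
        expiration_iso, "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expiration - timedelta(minutes=margin_minutes)
```

Cache the credentials and call the metadata service again only when this returns True.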

More info

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-service.html
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html

mvarewyck commented 1 year ago

Currently, the R package used to connect to AWS S3 relies on environment variables in which we'd have to configure the AWS key, secret, and region if these settings are not specified elsewhere.

We only use these environment variables for the default approach. If needed, we can set up a UAT profile which doesn't use the environment variables, as we currently do for the production approach.

Do you need to read/write in the bucket throughout the lifetime of the Docker container? Or is the data read once at startup?

Exactly, we load the data once before the app starts.

stenmigerode commented 1 year ago

We only use these environment variables for the default approach. If needed, we can set up a UAT profile which doesn't use the environment variables, as we currently do for the production approach.

But how do you get the production credentials? Is the aws.s3 package doing the IMDSv2 calls if no environment variables are found?

mvarewyck commented 1 year ago

We only use these environment variables for the default approach. If needed, we can set up a UAT profile which doesn't use the environment variables, as we currently do for the production approach.

But how do you get the production credentials? Is the aws.s3 package doing the IMDSv2 calls if no environment variables are found?

See https://github.com/inbo/reporting-rshiny-grofwildjacht/issues/326#issuecomment-1315483084. This is the approach we follow:

3. If R is running on an EC2 instance, the role profile credentials provided by aws.ec2metadata, if the aws.ec2metadata package is installed.

I have installed the necessary packages for this here

stenmigerode commented 1 year ago

So if I understand correctly, by setting "R_CONFIG_ACTIVE: production" in UAT as well, it should stop failing on the missing environment variables for key, secret, and region, and retrieve them from the metadata automatically?

I will try this in UAT.

mvarewyck commented 1 year ago

So if I understand correctly, by setting "R_CONFIG_ACTIVE: production" in UAT as well, it should stop failing on the missing environment variables for key, secret, and region, and retrieve them from the metadata automatically?

I will try this in UAT.

I didn't know it was failing on UAT. Given the approach described here, I would expect that if the environment variables are not available, it would switch to approach 3. So, if it currently fails on UAT, I don't expect R_CONFIG_ACTIVE: production to work, but it's worth a try.

stenmigerode commented 1 year ago

It is not working in UAT; that is why I opened this ticket asking not to use ENV variables.

[screenshot of the error]

This seems to be the piece of code causing this error:

[screenshot of the code]
mvarewyck commented 1 year ago

@stenmigerode Is there any way we can check if approach 3 works? Did you try R_CONFIG_ACTIVE: production? I'm available for a quick call if needed.

stenmigerode commented 1 year ago

I have not tested that, because I checked your config.yml file: setting that parameter to 'production' will also make it use the PRD bucket instead of the UAT one. That will again fail, because the UAT server does not have access to the PRD bucket.

I think you will need to adapt the code/config to remove the datacheck, so it stops trying to get credentials via the environment variables.

mvarewyck commented 1 year ago

@stenmigerode Thanks for clarifying; I assumed the ENV variables would be available on the EC2. I've updated the check to act differently on an EC2 instance: it first just tries to read the instance ID before checking the connection with all the data. I've also added a specific configuration for the UAT in yaml, which no longer defines the ENV variables but still performs the required data check. So, can you try R_CONFIG_ACTIVE: uat?

stenmigerode commented 1 year ago

I will do this now

stenmigerode commented 1 year ago

Hi @mvarewyck,

It seems some packages are still missing for this to work:

[screenshot of the missing-package error]
mvarewyck commented 1 year ago

Can you please build the docker image again? This should be fixed with commit a9c3597

stenmigerode commented 1 year ago

@mvarewyck The container is not responding in time now.

Via the docker logs I can see the S3 connection seems to be working now. I don't see any other errors:

R version 4.0.5 (2021-03-31) -- "Shake and Throw"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

Checking rgeos availability: TRUE
During startup - Warning messages:
1: replacing previous import ‘shiny::dataTableOutput’ by ‘DT::dataTableOutput’ when loading ‘reportingGrofwild’
2: replacing previous import ‘shiny::renderDataTable’ by ‘DT::renderDataTable’ when loading ‘reportingGrofwild’
> reportingGrofwild::runWildApp(public = FALSE)
Test Data in S3 bucket
..SS.................S
Finished successfully
Loading required package: shiny
Loading required package: ggplot2

Attaching package: ‘plotly’

The following object is masked from ‘package:ggplot2’:

    last_plot

The following object is masked from ‘package:stats’:

    filter

The following object is masked from ‘package:graphics’:

    layout

Attaching package: ‘shinyjs’

The following object is masked from ‘package:shiny’:

    runExample

The following object is masked from ‘package:sp’:

    show

The following objects are masked from ‘package:methods’:

    removeClass, show
mvarewyck commented 1 year ago

@mvarewyck The container is not responding in time now.

This could indeed be the case. We are running multiple data checks before starting the application on UAT. So only on UAT do we need a longer waiting time. If you want to check the app, we can (temporarily) set datacheck: false here

stenmigerode commented 1 year ago

Which setting do I need to change to increase the waiting time?

mvarewyck commented 1 year ago

I think it should work by increasing container-wait-time: 180000; see https://www.shinyproxy.io/documentation/configuration/. Locally it takes 2.23555 mins before the app starts.
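As a sketch (exact placement should be checked against the linked ShinyProxy configuration page), the setting lives under the `proxy` key in ShinyProxy's application.yml; the value here is only an example:

```yaml
# application.yml -- illustrative value only
proxy:
  # Maximum time ShinyProxy waits for a container to come up, in milliseconds.
  container-wait-time: 180000
```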

stenmigerode commented 1 year ago

I had found the configuration parameter. I tried 50 seconds, but that is not sufficient either.

I will update it to 180 seconds so we can test, but I cannot imagine that anyone will ever wait for 3 minutes before they can see the application.

mvarewyck commented 1 year ago

I will update it to 180 seconds so we can test, but I cannot imagine that anyone will ever wait for 3 minutes before they can see the application.

This is only for development; we want to check all the data used in the application. For production we don't run these tests.

SanderDevisscher commented 1 year ago

Like @mvarewyck says, it's only for development, and it was implemented at my request.

stenmigerode commented 1 year ago

@mvarewyck @SanderDevisscher I have deployed the public and private app, and they seem to work now.

SanderDevisscher commented 1 year ago

Superb, we'll start testing ASAP.

SanderDevisscher commented 1 year ago

I consider this issue fixed by #352