Closed: stenmigerode closed this issue 1 year ago
We only use these environment variables for the default approach. If needed, we can set up a UAT profile that doesn't use the environment variables, as we currently do for the production approach.
Do you need to read/write in the bucket throughout the lifetime of the Docker container? Or is the data read once at startup?
Exactly, we load the data once before the app starts.
We only use these environment variables for the default approach. If needed, we can set up a UAT profile that doesn't use the environment variables, as we currently do for the production approach.
But how do you get the production credentials? Is the aws.s3 package doing the IMDSv2 calls if no environment variables are found?
See https://github.com/inbo/reporting-rshiny-grofwildjacht/issues/326#issuecomment-1315483084 This is the approach we follow:
3. If R is running on an EC2 instance, the role profile credentials provided by aws.ec2metadata, if the aws.ec2metadata package is installed.
I have installed the necessary packages for this here
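That fallback order can be sketched as follows. This is an illustrative Python sketch of the lookup logic only, not the actual aws.s3 internals; the function name and return values are hypothetical.

```python
import os

def resolve_credentials(env=None, on_ec2=False):
    """Hypothetical sketch of the documented credential lookup order:
    1. explicit environment variables, 2. a configured profile (omitted
    here), 3. EC2 instance-role credentials via aws.ec2metadata."""
    env = os.environ if env is None else env
    key = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if key and secret:
        return {"source": "environment", "key": key, "secret": secret}
    if on_ec2:
        # aws.ec2metadata would query the instance metadata service here
        return {"source": "instance-role"}
    raise RuntimeError("no AWS credentials found")
```

With the environment variables unset and the aws.ec2metadata package installed, approach 3 should kick in automatically.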
So if I understand correctly, by also setting R_CONFIG_ACTIVE: production in UAT, it should stop failing on the missing environment variables for key, secret and region, and retrieve them from the metadata automatically?
I will try this in UAT.
I didn't know it was failing on UAT.
Given the approach described here, I would expect that, if the environment variables are not available, it would switch to approach 3. So, if it currently fails on UAT, I don't expect R_CONFIG_ACTIVE: production to work, but it's worth a try.
It is not working in UAT; that is why I opened this ticket asking not to use ENV variables.
This seems to be the piece of code causing this error:
@stenmigerode Is there any way we can check if approach 3 works? Did you try R_CONFIG_ACTIVE: production?
I'm available for a quick call if needed.
I have not tested that, because I checked your config.yml file: setting that parameter to 'production' will also make the app use the PRD bucket instead of the UAT one. That will again fail, because the UAT server does not have access to the PRD bucket.
I think you will need to adapt the code/config to remove the datacheck, so that it stops trying to get credentials via the environment variables.
@stenmigerode Thanks for clarifying, I assumed the ENV variables would be available on the EC2. I've updated the check to behave differently on an EC2 instance: it first just tries to read the instance ID before checking the connection with all data.
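A minimal version of such a check could look like this. It is a sketch using IMDSv2 with a short timeout; outside EC2 the call fails fast and the function returns False.

```python
import urllib.request

def on_ec2(base="http://169.254.169.254", timeout=0.5):
    """Return True if the instance metadata service answers with an
    instance id, i.e. we are running on an EC2 instance."""
    try:
        # IMDSv2 first needs a session token (PUT request)
        req = urllib.request.Request(
            base + "/latest/api/token", method="PUT",
            headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"})
        token = urllib.request.urlopen(req, timeout=timeout).read().decode()
        # Then read the instance id using that token
        req = urllib.request.Request(
            base + "/latest/meta-data/instance-id",
            headers={"X-aws-ec2-metadata-token": token})
        return bool(urllib.request.urlopen(req, timeout=timeout).read())
    except OSError:
        return False
```

The data check itself can then be skipped or adapted depending on the result.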
I've also added a specific configuration for UAT in the yaml, which no longer defines the ENV variables but still performs the required data check.
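For illustration, such a config.yml for the R config package could be structured like this. This is a hypothetical sketch; the bucket names and field names are assumptions, not the actual file.

```yaml
default:
  datacheck: true
  bucket: uat-bucket        # hypothetical name
  # AWS key/secret/region read from environment variables
uat:
  datacheck: true
  bucket: uat-bucket
  # no environment variables: credentials come from the instance role
production:
  datacheck: false
  bucket: prd-bucket
```

The active profile is selected at startup via the R_CONFIG_ACTIVE environment variable.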
So, can you try R_CONFIG_ACTIVE: uat?
I will do this now
Hi @mvarewyck,
It seems some packages are still missing for this to work:
Can you please build the docker image again? This should be fixed with commit a9c3597
@mvarewyck The container is not responding in time now.
Via the docker logs I can see the S3 connection seems to be working now. I don't see any other errors:
R version 4.0.5 (2021-03-31) -- "Shake and Throw"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
Checking rgeos availability: TRUE
During startup - Warning messages:
1: replacing previous import ‘shiny::dataTableOutput’ by ‘DT::dataTableOutput’ when loading ‘reportingGrofwild’
2: replacing previous import ‘shiny::renderDataTable’ by ‘DT::renderDataTable’ when loading ‘reportingGrofwild’
> reportingGrofwild::runWildApp(public = FALSE)
Test Data in S3 bucket
..SS.................S
Finished successfully
Loading required package: shiny
Loading required package: ggplot2
Attaching package: ‘plotly’
The following object is masked from ‘package:ggplot2’:
last_plot
The following object is masked from ‘package:stats’:
filter
The following object is masked from ‘package:graphics’:
layout
Attaching package: ‘shinyjs’
The following object is masked from ‘package:shiny’:
runExample
The following object is masked from ‘package:sp’:
show
The following objects are masked from ‘package:methods’:
removeClass, show
@mvarewyck The container is not responding in time now.
This could indeed be the case. We run multiple data checks before starting the application on UAT, so only on UAT will we need a longer waiting time. If you want to check the app, we can temporarily set datacheck: false here.
Which setting do I need to change to increase the waiting time?
I think it should work by increasing container-wait-time: 180000, see https://www.shinyproxy.io/documentation/configuration/
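For reference, in ShinyProxy's application.yml this is a proxy-level setting, expressed in milliseconds. The app spec shown below is a hypothetical sketch.

```yaml
proxy:
  container-wait-time: 180000   # 3 minutes, in milliseconds
  specs:
    - id: grofwildjacht         # hypothetical app id
      container-image: inbo/reporting-rshiny-grofwildjacht   # hypothetical
```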
Locally it takes 2.23555 mins before the app starts.
I had found the configuration parameter. I tried 50 seconds, but that is not sufficient either.
I will update it to 180 seconds so we can test, but I cannot imagine that anyone will ever wait for 3 minutes before they can see the application.
This is only for development: we want to check all the data used in the application. For production we don't run these tests.
Like @mvarewyck says, it's only for development, and it was implemented at my request.
@mvarewyck @SanderDevisscher I have deployed the public and private apps and they seem to work now.
Superb, we'll start testing ASAP.
I consider this issue fixed by #352
Possible use of IMDSv2 to retrieve AWS Credentials
Currently the R package to connect to and use AWS S3 is using Environment Variables in which we'd have to configure the AWS Key, Secret and Region, if you are not specifying these settings.
The AWS server that is used to run ShinyProxy and Docker is configured to automatically get access to the services enabled in its IAM role; in this case we have enabled access from the server to the S3 bucket. To do this, the AWS CLI is automatically configured on this server. For third-party applications and tools it is best to use IMDSv2.
Do you need to read/write in the bucket throughout the lifetime of the Docker container, or is the data read once at startup? The difficulty with IMDSv2 access keys is that they are temporary (valid for 6 hours). If that is not workable, we'd have to look for a secure way to store permanent access keys and secrets on the EC2 host and pass them via the application.yml to the Docker containers.
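That last option would look roughly like this in ShinyProxy's application.yml. This is a sketch of the approach this issue wants to avoid; the app id and all values are hypothetical.

```yaml
proxy:
  specs:
    - id: grofwildjacht                 # hypothetical app id
      container-env:
        AWS_ACCESS_KEY_ID: "AKIA..."    # permanent key: avoid if possible
        AWS_SECRET_ACCESS_KEY: "..."
        AWS_DEFAULT_REGION: eu-west-1   # hypothetical region
```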
Below is an outline of the curl calls you could use to get those credentials:
1. Get a token to authorize the request for credentials.
2. Use this token in the next curl request to retrieve the security credentials; the returned AccessKeyId and SecretAccessKey can be used to connect to the S3 bucket.
3. Make a third call to retrieve the current region.
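The three calls can be sketched together in Python, as the urllib equivalent of the curl commands. The role name my-s3-role is an assumption; it must match the IAM role attached to the instance.

```python
import json
import urllib.request

def fetch_imds_credentials(base="http://169.254.169.254", role="my-s3-role"):
    """Fetch temporary credentials and the region via IMDSv2."""
    # 1. PUT request for a session token, here valid for 6 hours (21600 s)
    req = urllib.request.Request(
        base + "/latest/api/token", method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"})
    token = urllib.request.urlopen(req, timeout=2).read().decode()
    auth = {"X-aws-ec2-metadata-token": token}

    # 2. Use the token to retrieve the security credentials (JSON with
    #    AccessKeyId, SecretAccessKey, Token and Expiration)
    req = urllib.request.Request(
        base + "/latest/meta-data/iam/security-credentials/" + role,
        headers=auth)
    creds = json.loads(urllib.request.urlopen(req, timeout=2).read().decode())

    # 3. Retrieve the current region
    req = urllib.request.Request(
        base + "/latest/meta-data/placement/region", headers=auth)
    region = urllib.request.urlopen(req, timeout=2).read().decode()
    return creds, region
```

This only works when run on the EC2 instance itself, since the metadata endpoint is link-local.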
Important
Do not request these credentials every time: there is a limit on the number of requests in IMDSv2. Ideally this call is done only when the old access is about to expire.
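One way to honour that constraint is to cache the credentials and refresh only shortly before the Expiration timestamp in the response. This is a sketch; the refresh margin of 10 minutes is an arbitrary choice.

```python
from datetime import datetime, timedelta, timezone

class CredentialCache:
    """Cache IMDS credentials, refreshing only shortly before expiry,
    to stay well under the IMDSv2 request limits."""
    def __init__(self, fetch, margin=timedelta(minutes=10)):
        self.fetch = fetch      # callable returning the credentials dict
        self.margin = margin
        self.creds = None
        self.expires = None

    def get(self):
        now = datetime.now(timezone.utc)
        if self.creds is None or now >= self.expires - self.margin:
            self.creds = self.fetch()
            # Expiration format in the IMDS response: 2024-01-01T00:00:00Z
            self.expires = datetime.strptime(
                self.creds["Expiration"], "%Y-%m-%dT%H:%M:%SZ"
            ).replace(tzinfo=timezone.utc)
        return self.creds
```

Repeated calls to get() then hit the metadata service only around once per credential lifetime.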
More info
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-service.html https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html