snowplow / emr-etl-runner

Run Snowplow's enrichments on Amazon Elastic MapReduce with minimum fuss

Support IAM roles #58

Open smugryan opened 9 years ago

smugryan commented 9 years ago

It would be great not to have to put IAM user keys/secrets into configuration files, and instead have the EMR/ETL and storage loader tools pull credentials from the IAM role of the EC2 instance they are running on.

This way we don't have to create a new user under our AWS account; we can instead assign the role to the machine that runs the EMR/ETL/storage loader jobs. Better for security and ease of setup.
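
To make the request concrete, here is a rough sketch of the lookup being asked for, assuming the tool falls back to the instance's role whenever the config carries no static keys. It is written in Python purely for illustration and is not how EmrEtlRunner or the storage loader work today. The function names, the config keys and the `fetch` parameter are hypothetical; only the metadata URL and the `AccessKeyId` / `SecretAccessKey` / `Token` fields come from the EC2 instance metadata service. The sketch uses plain GET requests, so an instance that enforces IMDSv2 would additionally need a session token header.

```python
import json
from typing import Callable, Dict, Optional
from urllib.request import urlopen

# EC2 instance metadata path that exposes the credentials of the attached role.
METADATA_URL = "http://169.254.169.254/latest/meta-data/iam/security-credentials/"


def http_get(url: str, timeout: float = 2.0) -> str:
    """Fetch a URL and return the response body as text."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def resolve_credentials(
    config: Dict[str, Optional[str]],
    fetch: Callable[[str], str] = http_get,  # injectable so it can be tested off EC2
) -> Dict[str, Optional[str]]:
    """Prefer static keys from the config; otherwise use the instance role (hypothetical helper)."""
    key = config.get("access_key_id")
    secret = config.get("secret_access_key")
    if key and secret:
        # Static IAM user keys were configured, so no session token is involved.
        return {"access_key_id": key, "secret_access_key": secret, "session_token": None}

    # The first request lists the role name attached to the instance profile.
    role = fetch(METADATA_URL).strip().splitlines()[0]
    # The second request returns a JSON document holding temporary credentials.
    doc = json.loads(fetch(METADATA_URL + role))
    return {
        "access_key_id": doc["AccessKeyId"],
        "secret_access_key": doc["SecretAccessKey"],
        "session_token": doc["Token"],
    }
```

With something like this in place, a config that simply omits the keys would pick up the role credentials on the box, and they would rotate without anyone editing a file.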

alexanderdean commented 9 years ago

This is a cool idea - pull request most welcome!

smugryan commented 9 years ago

:+1:

alexanderdean commented 9 years ago

Sister ticket: https://github.com/snowplow/sluice/issues/31

smugryan commented 9 years ago

Why a separate and blank ticket? (Just curious if I could be filing tickets in a better manner)

alexanderdean commented 9 years ago

I'm pretty sure we will need to update Sluice to support IAM roles too...

smugryan commented 9 years ago

:+1:

ludwigm commented 6 years ago

This ticket is quite old, and AWS security best practices say not to hard-code secret keys but to use IAM roles with instance profiles or some other kind of temporary token. I tried to track this problem down, and one thing I found is the lack of support for this in Sluice, as already mentioned. Any update on this? Is it on a roadmap? It is quite an important feature for us to get security right. Combined with the documented IAM permissions, which do not conform to the principle of least privilege, it is even more of a problem: https://discourse.snowplowanalytics.com/t/what-is-the-minimum-viable-iam-policy-for-snowplow-operation/192

BenFradet commented 6 years ago

We're planning on removing the Sluice dependency and using fog-aws directly, which supports IAM. I think there was a ticket dedicated to the move, but I can't seem to find it.

However, AFAIK Elasticity, the Ruby wrapper around the EMR API that we use inside EmrEtlRunner, doesn't provide a way to use IAM yet. So there might still be a bit of work there in the medium term.

In the longer term, though, we're planning on moving to dataflow-runner, which supports IAM directly.

This is for the batch pipeline; the real-time pipeline already supports IAM roles.

ludwigm commented 6 years ago

Awesome. Many thanks for the fast reply. That already helps us to plan ahead :)

alexanderdean commented 6 years ago

At this point, any dev cycles we could put into improving EmrEtlRunner, we could instead put into the migration to Dataflow Runner, so most likely this won't happen. But added: https://github.com/snowplow/dataflow-runner/issues/34