pomegranited closed this pull request 8 years ago
Thanks for the pull request, @pomegranited! It looks like you're a member of a company that does contract work for edX. If you're doing this work as part of a paid contract with edX, you should talk to edX about who will review this pull request. If this work is not part of a paid contract with edX, then you should ensure that there is an OSPR issue to track this work in JIRA, so that we don't lose track of your pull request.
To automatically create an OSPR issue for this pull request, just visit this link: https://openedx-webhooks.herokuapp.com/github/process_pr?repo=edx%2Fedx-analytics-configuration&number=31
@pomegranited I'm actively working on deploying to private subnets. This change will not work in that configuration.
In general, we try to limit access to the EMR clusters as much as possible, given their broad access. Could you deploy your Jenkins instance into the same VPC as your clusters so that they can address one another?
@mulby It's ok if this change (and edx-analytics-pipeline PR 229) can't be merged, but I'd like to keep the branches around for a little while to facilitate testing.
I needed a lower-cost way to do my dev work, so I thought the cheapest option would be to run on my local VM instead of on AWS. Also, we're looking at using OpenStack more, so I thought this would help.
Is there a way to add local Vagrant instances and/or servers outside AWS to the VPC?
@pomegranited There seems to be some logic in Ansible that can detect whether we are currently running in AWS and, if so, whether we are in a VPC. This would require more research, and I don't know if it's worth it, but we could possibly make this behavior vary depending on whether we are in a VPC or not.
A hint that this logic exists is here; here, depending on whether we are in a VPC, Ansible uses a different variable to obtain the routable server address. I'd guess that you would just have to query Ansible facts for something related to the VPC.
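To make the idea concrete, here is a minimal Python sketch of that kind of selection: prefer the private address when the caller sits inside the VPC, otherwise fall back to the public one. The function name `routable_address` and the fact keys `private_ip` and `public_ip` are hypothetical placeholders, not real Ansible fact names; the addresses are examples only.

```python
def routable_address(facts, caller_in_vpc):
    """Pick the address a client should use to reach a server.

    `facts` is a plain dict with hypothetical keys "private_ip" and
    "public_ip"; real Ansible fact names may differ.
    """
    # Inside the VPC the private address is routable, so prefer it.
    if caller_in_vpc and facts.get("private_ip"):
        return facts["private_ip"]
    # Outside the VPC only a public address can be reached directly.
    if facts.get("public_ip"):
        return facts["public_ip"]
    raise LookupError("no address reachable from the caller's network")


if __name__ == "__main__":
    emr_master = {"private_ip": "10.10.3.7", "public_ip": "203.0.113.10"}
    print(routable_address(emr_master, caller_in_vpc=True))
    print(routable_address(emr_master, caller_in_vpc=False))
```

A server that has only a private address raises `LookupError` for callers outside the VPC, which is the situation the bastion or VPN suggestions are meant to address.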
Also note that Gabe's remark possibly also applies to https://github.com/edx/edx-analytics-pipeline/pull/229 .
@pomegranited you can use a jump box that has a public IP in your VPC to proxy connections into the private address space.
The basic idea is that you deploy an EC2 instance with a public IP address into a public subnet of the VPC. We call this type of thing a "bastion". Put your public key on that bastion machine so that you can SSH into it.
Modify your .ssh/config to include something like this:
Host 10.10.*
    ProxyCommand ssh <public IP of the bastion> -W %h:%p
The above assumes that all of your EMR clusters will be deployed into the CIDR block 10.10.0.0/16.
It also assumes that your security groups, network ACLs, etc. are all configured to allow SSH connections from the bastion into the EMR master nodes. I expect @omarkhan or @mtyaka could probably help out with this if you wanted.
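As a rough illustration of the configuration above, the following Python sketch renders an equivalent stanza for a given bastion address and checks whether a target address falls inside the assumed CIDR block. The helper names `render_proxy_stanza` and `in_block` are hypothetical and not part of any edX tooling, and 203.0.113.10 is a documentation-only example address.

```python
import ipaddress


def render_proxy_stanza(bastion_ip, host_pattern="10.10.*"):
    """Return an ssh config stanza that proxies matching hosts via the bastion."""
    # Validate the bastion address early; raises ValueError on a malformed IP.
    ipaddress.ip_address(bastion_ip)
    return (
        f"Host {host_pattern}\n"
        f"    ProxyCommand ssh {bastion_ip} -W %h:%p\n"
    )


def in_block(address, cidr="10.10.0.0/16"):
    """Return True if the address lies inside the CIDR block the clusters use."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(cidr)


if __name__ == "__main__":
    print(render_proxy_stanza("203.0.113.10"))
    print(in_block("10.10.3.7"))   # inside 10.10.0.0/16
    print(in_block("10.11.0.1"))   # outside 10.10.0.0/16
```

The `in_block` check mirrors the assumption behind the `Host 10.10.*` pattern: only addresses in 10.10.0.0/16 are routed through the bastion, so a cluster deployed in another block would need its own pattern.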
Another option is to use the AWS VPN gateway, but I don't know too much about it.
@mulby and @jbzdak Thank you for the information, I'll try the bastion approach next time.
Allows access to the EMR master node by servers (e.g. devstacks) that sit outside the AWS VPC.
Without this change, EMR services provisioned by provision.yml are referenced by their AWS internal IP addresses, which are not accessible outside of the VPC.
JIRA tickets: Cleanup for OLIVE-22, OC-1512
Dependencies: None
Sandbox URL:
http://52.20.136.133:8080/ - ping me for login credentials.
Testing instructions:
See edx/configuration PR #2939 for full test instructions.
The expected result is that the EMR instance provisions and terminates successfully (if your configuration is correct).
Without this patch, the EMR provisioning fails with SSH errors.
Reviewers