stanford-mast / pocket

Elastic ephemeral storage

Configuration Issues #1

Closed: anuragkh closed this issue 5 years ago

anuragkh commented 5 years ago

Hi! I'm trying to deploy Pocket following the instructions outlined here, but I'm running into a couple of issues:

unexpected error during validation: error listing nodes: Get https://internal-api-pocketcluster-k8s-loc-404v42-1700611634.us-east-1.elb.amazonaws.com/api/v1/nodes: EOF

I've tried waiting a long time (1 hr), and the output remains the same. I've used the default settings from pocketcluster.template.yaml, except that I use the same AWS subnet for both the public and private subnets and removed the NAT configuration from the egress of the private subnet (since I don't set up a NAT). Could this be causing the issue? I'm not sure the deploy README provides enough detail about how the subnets and NAT need to be configured on AWS.

anakli commented 5 years ago

Regarding the AMI, can you please check if you are in the US-West (Oregon) region? The pocket AMI should be available in that region.

The reason for using a NAT in the setup is to enable lambdas to talk to both Pocket (which is running in a VPC) and public internet services (such as S3). For example, lambdas in a job may need to fetch original input data from S3 and then use Pocket for intermediate/ephemeral data. According to the Amazon documentation, enabling lambdas to access both VPC and public internet services requires using a NAT. So we create a VPC setup for Pocket like the example scenario described in the Amazon docs. This requires a public subnet, a private subnet, a NAT, and an Internet Gateway: the lambdas run in the private subnet, whose outbound internet traffic is routed through the NAT in the public subnet, which in turn reaches the internet through the Internet Gateway. I can update the Pocket README to provide more detailed instructions for this setup.
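To make the routing argument concrete, here is a minimal Python sketch of why the NAT matters. It is a simplified, hypothetical model, not the AWS API: each subnet has a route table whose default route (`0.0.0.0/0`) points at an Internet Gateway (`"igw"`), at a NAT living in another subnet (`"nat:<subnet>"`), or nowhere. It assumes that VPC-attached lambdas get no public IP and that in-VPC traffic to Pocket always works through the VPC's implicit local route, so only internet reachability is modeled.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Subnet:
    """Hypothetical subnet: maps a destination CIDR to a route target."""
    name: str
    routes: Dict[str, str] = field(default_factory=dict)


def can_reach_internet(subnets: Dict[str, Subnet], start: str, has_public_ip: bool) -> bool:
    """Follow the default route (0.0.0.0/0) from `start` and report whether
    outbound internet traffic (e.g. to S3) can leave the VPC."""
    target = subnets[start].routes.get("0.0.0.0/0")
    if target is None:
        # No default route: only in-VPC traffic (e.g. to Pocket) works.
        return False
    if target == "igw":
        # An Internet Gateway only carries traffic from sources with a public IP.
        return has_public_ip
    if target.startswith("nat:"):
        # The NAT has its own public IP, but it must sit in a subnet whose
        # default route points at the Internet Gateway.
        nat_subnet = target.split(":", 1)[1]
        return subnets[nat_subnet].routes.get("0.0.0.0/0") == "igw"
    return False


# Recommended layout: lambdas in the private subnet, NAT in the public subnet.
recommended = {
    "public": Subnet("public", {"0.0.0.0/0": "igw"}),
    "private": Subnet("private", {"0.0.0.0/0": "nat:public"}),
}

# Same layout with the NAT route removed from the private subnet.
no_nat = {
    "public": Subnet("public", {"0.0.0.0/0": "igw"}),
    "private": Subnet("private", {}),
}

# VPC-attached lambdas get no public IP, hence has_public_ip=False.
print(can_reach_internet(recommended, "private", has_public_ip=False))  # True
print(can_reach_internet(no_nat, "private", has_public_ip=False))       # False
print(can_reach_internet(no_nat, "public", has_public_ip=False))        # False
```

The last call shows why moving lambdas into the public subnet does not get them to S3 on its own: without a public IP, a route to the Internet Gateway is not enough, which is the gap the NAT fills.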

If you are planning to use Pocket for jobs that run on EC2 machines or for lambdas that only need to access the Pocket VPC and no public internet services, it should be fine to use a simpler setup with just a public subnet and no NAT. However, we have not tested this. Here is a cluster config YAML file example from the kops docs for a cluster that does not have a NAT.
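The rule of thumb in this reply can be summarized as a small decision helper. This is a hypothetical sketch (the function name and return strings are illustrative and not part of Pocket or kops); it assumes EC2 instances in the public subnet are given public IPs while VPC-attached lambdas never are, and, as the reply says, the no-NAT layout has not been tested.

```python
def suggested_topology(runs_on_lambda: bool, needs_public_services: bool) -> str:
    """Hypothetical helper: pick a VPC layout for a Pocket deployment.

    Encodes the rule of thumb from this thread; not an official Pocket API.
    """
    if runs_on_lambda and needs_public_services:
        # VPC-attached lambdas have no public IP, so reaching S3 and other
        # public services requires a private subnet routed through a NAT.
        return "public + private subnets with NAT and Internet Gateway"
    # EC2 jobs (which can have public IPs in a public subnet) and lambdas that
    # only talk to Pocket inside the VPC: the simpler layout should suffice.
    return "single public subnet, no NAT (untested)"


print(suggested_topology(runs_on_lambda=True, needs_public_services=True))
print(suggested_topology(runs_on_lambda=True, needs_public_services=False))
print(suggested_topology(runs_on_lambda=False, needs_public_services=True))
```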

anuragkh commented 5 years ago

Thanks for the quick response! I was in US-East (N. Virginia); the AMI does exist in Oregon.

I'll try setting up the subnets/NAT based on the Amazon example and close the issue if that solves it.

anuragkh commented 5 years ago

Works with the new configuration, thanks!