nicolaka closed this issue 8 years ago.
Thoughts:
It might be better to do this with Route53 using a TXT record. When instances in the cluster come up, they can query the internal hosted zone we create for the cluster for the TXT record, which will contain the SHA hash. This should be cheaper than S3 (even though both are nearly free in this case), and it is easier for cluster members to interact with.
Thinking about the workflow:
In CFN, a user-supplied parameter is used to create the R53 internal hosted zone for the cluster.
The master instance's userdata script checks whether the user-created.r53.zone TXT record (holding the SHA value) exists. If it does, a cluster already exists, and the script uses that SHA value.
If it doesn't exist, the script generates a SHA key, writes it to a local tmp file, and creates a new R53 resource record of type TXT containing the SHA key.
In the cluster member ASG launch configuration's userdata script, each instance checks on boot for the user-created.r53.zone TXT record. If it does not exist, the script exits with an error; if it does, the instance reads the SHA key from it for its configuration.
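A rough sketch of that master/member logic, in Python, with an in-memory dict standing in for the private hosted zone. The class and function names are hypothetical, and `secrets.token_hex` is only a placeholder for however the real SHA is produced; a real script would query and upsert the R53 record instead.

```python
import secrets
import tempfile
from pathlib import Path
from typing import Dict, Optional


class InMemoryTxtZone:
    """Hypothetical stand-in for the private hosted zone: record name -> TXT value."""

    def __init__(self) -> None:
        self._records: Dict[str, str] = {}

    def get_txt(self, name: str) -> Optional[str]:
        return self._records.get(name)

    def put_txt(self, name: str, value: str) -> None:
        self._records[name] = value


def master_bootstrap(zone: InMemoryTxtZone, record_name: str, tmp_path: Path) -> str:
    """Reuse the SHA if the TXT record exists; otherwise generate, save locally, and publish it."""
    existing = zone.get_txt(record_name)
    if existing is not None:
        # A cluster already exists, so reuse its SHA.
        return existing
    sha = secrets.token_hex(32)  # placeholder for the real SHA
    tmp_path.write_text(sha)  # the local tmp file from the workflow
    zone.put_txt(record_name, sha)  # stand-in for creating the TXT resource record
    return sha


def member_bootstrap(zone: InMemoryTxtZone, record_name: str) -> str:
    """Read the SHA from the TXT record, or exit with an error if it is missing."""
    sha = zone.get_txt(record_name)
    if sha is None:
        raise SystemExit(f"no TXT record at {record_name}; refusing to join")
    return sha


if __name__ == "__main__":
    zone = InMemoryTxtZone()
    name = "user-created.r53.zone"
    with tempfile.TemporaryDirectory() as d:
        first = master_bootstrap(zone, name, Path(d) / "sha")
        again = master_bootstrap(zone, name, Path(d) / "sha")
        print(first == again)  # second run reuses the existing record
        print(member_bootstrap(zone, name) == first)
```

Note that the "check, then create" in `master_bootstrap` is only safe if reads see writes immediately, which is the assumption the dict quietly makes.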
Reasons this may not work:
1) The SHA key is a secret, and we do not want others in the VPC to be able to query this value. A private hosted zone only limits resolution to inside the VPC, so anything in the VPC could still read the record.
2) The SHA key may be longer than 255 characters, the limit for a single TXT string.
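On point 2, a quick check of the sizes involved, in Python. A DNS TXT character-string holds at most 255 octets (RFC 1035), and as far as I know Route 53 accepts a longer value as several quoted strings in one record (`"part1" "part2"`); treat that formatting as an assumption to confirm against the Route 53 docs. The helper names are hypothetical, and the sketch skips escaping by rejecting quotes and backslashes.

```python
import hashlib
from typing import List

TXT_STRING_MAX = 255  # RFC 1035: each <character-string> holds at most 255 octets


def split_txt(value: str, limit: int = TXT_STRING_MAX) -> List[str]:
    """Split value into chunks that each fit in one TXT character-string."""
    return [value[i:i + limit] for i in range(0, len(value), limit)] or [""]


def route53_txt_value(value: str) -> str:
    """Render value as one quoted string per chunk (assumed Route 53 TXT format)."""
    if not value.isascii() or '"' in value or "\\" in value:
        # Escaping rules are out of scope for this sketch.
        raise ValueError("sketch only handles ASCII values without quotes or backslashes")
    return " ".join(f'"{chunk}"' for chunk in split_txt(value))


if __name__ == "__main__":
    digest = hashlib.sha256(b"example").hexdigest()
    colon_form = ":".join(digest[i:i + 2] for i in range(0, len(digest), 2)).upper()
    print(len(digest), len(colon_form))  # 64 95
    print(len(split_txt("a" * 600)))  # 3
```

So if the value is a SHA-256 hash, even the colon-separated form fits in a single TXT string; only a much longer key would need splitting.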
@nicolaka, if you can verify some of this thinking, that would be great. This approach should be relatively straightforward to implement.
So, it turns out that Route53 isn't as consistent as we need it to be for this type of operation. There can be situations where you create the TXT record in your hosted zone and it's not available for a few minutes.
This can lead to race conditions where the master thinks it hasn't created the resource record, but it actually has and the record just isn't available yet. This might be an edge case, but we need something to provide consistency.
At the risk of over-engineering this, I think the way to go is to add Dynamo and use a conditional write for the SHA, i.e. "if this value doesn't exist, write it".
This becomes a two-part operation, and the conditional write in the first part is what guarantees consistency.
The master generates the SHA and, if no record exists in our Dynamo table, writes it. It then creates the TXT record in the hosted zone for simple consumption (using dig) by other members of the cluster.
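A sketch of the two-part operation in Python, with an in-memory table whose lock makes "check and write" atomic, the way a conditional write does. In boto3 the real counterpart would be roughly `put_item(..., ConditionExpression="attribute_not_exists(...)")`, which fails with `ConditionalCheckFailedException` when the item exists; that call is not used here, and the table, dict zone, and function names are hypothetical stand-ins.

```python
import secrets
import threading
from typing import Dict, Optional, Tuple


class ConditionalTable:
    """In-memory stand-in for a Dynamo table supporting an 'only if absent' put."""

    def __init__(self) -> None:
        self._items: Dict[str, str] = {}
        self._lock = threading.Lock()  # makes check-and-write atomic

    def put_if_absent(self, key: str, value: str) -> bool:
        with self._lock:
            if key in self._items:
                return False  # analogous to the conditional check failing
            self._items[key] = value
            return True

    def get(self, key: str) -> Optional[str]:
        with self._lock:
            return self._items.get(key)


def master_claim_sha(table: ConditionalTable, zone: Dict[str, str], cluster_id: str) -> Tuple[str, bool]:
    """Part 1: conditionally write a fresh SHA. Part 2: publish the winning SHA as TXT."""
    candidate = secrets.token_hex(32)  # placeholder SHA
    won = table.put_if_absent(cluster_id, candidate)
    sha = candidate if won else table.get(cluster_id)
    if sha is None:
        raise RuntimeError("conditional write failed but no item was found")
    zone[cluster_id] = sha  # stand-in for the (eventually consistent) R53 upsert
    return sha, won


if __name__ == "__main__":
    table: ConditionalTable = ConditionalTable()
    zone: Dict[str, str] = {}
    results = []

    def run() -> None:
        results.append(master_claim_sha(table, zone, "my-cluster"))

    threads = [threading.Thread(target=run) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(sum(won for _, won in results))  # 1
    print(len({sha for sha, _ in results}))  # 1
```

However many masters race, exactly one write wins, and every caller ends up publishing that same SHA; the slow R53 propagation no longer decides who "created" the cluster.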
This can be pretty cheap because we only need to provision a single read and a single write capacity unit and store a single value.
@bchav I'm thinking this can be done more easily. We can curl the PEM cert and extract the fingerprint with an openssl command. I'll test that and let you know. It's much simpler that way.
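For reference, the fingerprint step can also be done in Python's standard library, assuming the SHA the cluster needs is the SHA-256 fingerprint of the fetched certificate (the thread doesn't say which digest). A fingerprint is the hash of the certificate's DER bytes, so the sketch converts PEM to DER and hashes it, formatting the result as colon-separated uppercase hex like `openssl x509 -fingerprint` prints. The function name is hypothetical, and fetching the PEM (curl, or `ssl.get_server_certificate`) is left out.

```python
import hashlib
import ssl


def sha256_fingerprint(pem_cert: str) -> str:
    """Return the SHA-256 fingerprint of a PEM certificate as colon-separated uppercase hex."""
    der = ssl.PEM_cert_to_DER_cert(pem_cert.strip())  # the fingerprint is over the DER bytes
    digest = hashlib.sha256(der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


if __name__ == "__main__":
    # Round-trip arbitrary bytes through PEM just to exercise the function offline.
    pem = ssl.DER_cert_to_PEM_cert(b"not a real certificate")
    print(sha256_fingerprint(pem))
```

Against a live controller, the PEM string would come from the curl output instead of the round-trip above.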
I agree it's simpler, but without something providing consistency, can the master survive a reboot or autoscaling event?
It should survive a reboot; I'm not sure about controller autoscaling.
Can you check out the changes I made here? https://github.com/nicolaka/ddc-aws/blob/master/ddc_on_aws.json#L424
It should work, right? It passes the controller's private address and creates an env var called UCP_URL from it.
@bchav This is where we need to capture the SHA and upload it to the S3 bucket. Assuming I can get the parameter, please add the S3 logic here: https://github.com/nicolaka/ddc-aws/blob/master/ddc_on_aws.json#L369