widdix / mastodon-on-aws

Host your own Mastodon instance on AWS
https://cloudonaut.io/mastodon-on-aws/

Deployment failure in us-east-1: You can't create a db.t4g.micro Multi-AZ instance #3

Status: Closed (closed by scrappydog 1 year ago)

scrappydog commented 1 year ago

This project seems super cool, and I'm excited about it... unfortunately your CloudFormation stack is failing for me (running under an AWS root account).

Here is the failure event:

Database | CREATE_FAILED | Embedded stack arn:aws:cloudformation:us-east-1:264727885608:stack/mastodon-on-aws-Database-1C0KPH3GB9VPK/5adefb80-6690-11ed-9ebb-0e9eac0d0f09 was not successfully created: The following resource(s) failed to create: [Instance].

mastodon-on-aws | ROLLBACK_IN_PROGRESS | The following resource(s) failed to create: [Database, Alb, Certificate, Cache]. Rollback requested by user.

Good news: It looks like the rollback process mostly works! :-)

One more error:

HostedZone | DELETE_FAILED | Embedded stack arn:aws:cloudformation:us-east-1:264727885608:stack/mastodon-on-aws-HostedZone-16CBXD4EV74B0/c32742c0-668f-11ed-a315-1260d77cfdf9 was not successfully deleted: The following resource(s) failed to delete: [HostedZone].

Thanks!

scrappydog commented 1 year ago

Just realized there are 11 nested stacks... 5 have errors... let me know how to get you logs or other troubleshooting info?

jbold commented 1 year ago

@scrappydog I found these docs for troubleshooting stack failures: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/troubleshooting.html

michaelwittig commented 1 year ago

@scrappydog can you send us the events with a red status from the CloudFormation stack mastodon-on-aws-Database-1C0KPH3GB9VPK? By default, CloudFormation shows Active stacks only, so switch the filter from Active to Deleted to see the stack. For some reason, your database instance failed.

scrappydog commented 1 year ago

Doesn't look very helpful... just says it failed...

2022-11-17 08:56:39 UTC-0700 | Instance | CREATE_FAILED | Embedded stack arn:aws:cloudformation:us-east-1:264727885608:stack/mastodon-on-aws-Database-1C0KPH3GB9VPK-Instance-I1F615UZ9OTH/5e73bb00-6690-11ed-acdf-0e69374a54e3 was not successfully created: The following resource(s) failed to create: [DBInstance].

michaelwittig commented 1 year ago

@scrappydog OK, one level deeper please :) Could you check the same for the stack mastodon-on-aws-Database-1C0KPH3GB9VPK-Instance-I1F615UZ9OTH?
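The "one level deeper" drill-down through nested stacks can also be scripted. Below is a minimal sketch, not part of this project: `stack_events` and `root_failures` are hypothetical helper names, and `cfn` is assumed to be a CloudFormation client such as `boto3.client("cloudformation")`. It relies only on `describe_stack_events`, and it follows the physical ID of every failed nested stack until it reaches the resource that actually failed.

```python
from typing import Any, Dict, Iterator, List, Optional, Set

FAILED_STATUSES = {"CREATE_FAILED", "UPDATE_FAILED", "DELETE_FAILED"}
NESTED_STACK_TYPE = "AWS::CloudFormation::Stack"


def stack_events(cfn: Any, stack: str) -> Iterator[Dict[str, Any]]:
    """Yield every event of one stack, following NextToken pagination."""
    token: Optional[str] = None
    while True:
        kwargs = {"StackName": stack}
        if token:
            kwargs["NextToken"] = token
        page = cfn.describe_stack_events(**kwargs)
        yield from page.get("StackEvents", [])
        token = page.get("NextToken")
        if not token:
            return


def root_failures(cfn: Any, stack: str,
                  seen: Optional[Set[str]] = None) -> List[Dict[str, str]]:
    """Collect failed events, descending into failed nested stacks."""
    seen = set() if seen is None else seen
    if stack in seen:
        return []
    seen.add(stack)
    failures: List[Dict[str, str]] = []
    for event in stack_events(cfn, stack):
        if event.get("ResourceStatus") not in FAILED_STATUSES:
            continue
        physical_id = event.get("PhysicalResourceId", "")
        if physical_id and physical_id == event.get("StackId"):
            continue  # the stack's own status event, not a child resource
        if event.get("ResourceType") == NESTED_STACK_TYPE and physical_id:
            if physical_id in seen:
                continue  # this nested stack was already inspected
            nested = root_failures(cfn, physical_id, seen)
            if nested:
                failures.extend(nested)
                continue
        failures.append({
            "Stack": stack,
            "LogicalResourceId": event.get("LogicalResourceId", ""),
            "ResourceStatus": event.get("ResourceStatus", ""),
            "Reason": event.get("ResourceStatusReason", ""),
        })
    return failures
```

Passing the stack ID (ARN) rather than the name matters here, because a rolled-back nested stack is already deleted and can no longer be looked up by name. Calling `root_failures(boto3.client("cloudformation"), "mastodon-on-aws")` would be one way to try it, assuming boto3 and credentials are set up.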

scrappydog commented 1 year ago

This one looks helpful

2022-11-17 08:56:33 UTC-0700 | DBInstance | CREATE_FAILED | Resource handler returned message: "You can't create a db.t4g.micro Multi-AZ instance because at least 2 subnets must exist in availability zones with sufficient capacity for VPC and storage type : gp2 for db.t4g.micro, so 1 more must be created in other availability zones; choose from these availability zones: us-east-1c, us-east-1d, us-east-1e, us-east-1f. (Service: Rds, Status Code: 400, Request ID: 9f267ca1-ea32-4c95-b70a-f18c5caf9b75)" (RequestToken: 1db9185b-eb3a-6d09-ec70-f0fd1b99ee66, HandlerErrorCode: InvalidRequest)
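The useful part of that message is the list of availability zones RDS suggests. As a rough sketch, the zones can be pulled out with a regular expression. `suggested_zones` is a hypothetical helper, and the pattern is matched against the wording of this one message only; the message text is not a documented format, so treat this as an assumption that may break.

```python
import re
from typing import List

# Matches the zone list after "choose from these availability zones:"
# up to the first period. Assumes the wording seen in this thread.
AZ_LIST = re.compile(
    r"choose from these availability zones:\s*([a-z0-9,\s-]+)\."
)


def suggested_zones(message: str) -> List[str]:
    """Return the availability zones named in an RDS capacity error."""
    match = AZ_LIST.search(message)
    if not match:
        return []
    return [zone.strip() for zone in match.group(1).split(",") if zone.strip()]


message = (
    "so 1 more must be created in other availability zones; "
    "choose from these availability zones: "
    "us-east-1c, us-east-1d, us-east-1e, us-east-1f. (Service: Rds, Status Code: 400)"
)
print(suggested_zones(message))
# → ['us-east-1c', 'us-east-1d', 'us-east-1e', 'us-east-1f']
```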

michaelwittig commented 1 year ago

Can you set the value of Resources.Vpc.Properties.Parameters.NumberOfAvailabilityZones in your template to 4? (see https://github.com/widdix/mastodon-on-aws/blob/main/mastodon.yaml#L73)

It looks like the us-east-1 region includes availability zones that do not support t4g instance types.
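For anyone who prefers to script that change instead of editing the file by hand, here is a small sketch. It assumes the template has already been parsed into a plain dict; a generic YAML parser may reject CloudFormation short-form tags such as `!Ref`, so loading the file is left out. `with_availability_zones` is a hypothetical helper name, and storing the value as a string is an assumption based on nested stack parameters being string-valued.

```python
import copy
from typing import Any, Dict


def with_availability_zones(template: Dict[str, Any], count: int) -> Dict[str, Any]:
    """Return a copy of the template with NumberOfAvailabilityZones changed."""
    if count < 1:
        raise ValueError("count must be at least 1")
    updated = copy.deepcopy(template)  # leave the caller's template untouched
    parameters = updated["Resources"]["Vpc"]["Properties"]["Parameters"]
    parameters["NumberOfAvailabilityZones"] = str(count)  # assumption: string value
    return updated


# Illustrative stand-in for the relevant part of the parsed template.
template = {
    "Resources": {
        "Vpc": {"Properties": {"Parameters": {"NumberOfAvailabilityZones": "2"}}}
    }
}
updated = with_availability_zones(template, 3)
print(updated["Resources"]["Vpc"]["Properties"]["Parameters"])
# → {'NumberOfAvailabilityZones': '3'}
```

Which count works seems to depend on the region, so the number is a parameter rather than a fixed value.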

pegli commented 1 year ago

When I set the number of AZs to 4 using CloudFormation Designer, one of the VPC templates failed. I changed it to 3 and redeployed, and that seemed to do the trick.

Edit: this is in us-west-1, btw.

scrappydog commented 1 year ago

I tried running with the number of AZs set to 3 and got farther: no errors!

But the Certificate create has been hanging for 1.5 hours...

2022-11-22 07:23:10 UTC-0700 | Certificate | CREATE_IN_PROGRESS | Content of DNS Record is: {Name: _de5b080381a3c15b93c1e0e49feef253.mastodon.greyduck.social.,Type: CNAME,Value: _b84c38ff80c44892a524226a1e37cca2.zrvsvrxrgs.acm-validations.aws.}

One other change: I renamed my edited stack from 'mastodon-on-aws' to my own custom name to clearly differentiate it... I wonder if that broke something?

Also wondering if maybe I'm hitting conflicts with something left behind by my previous failed roll-back?

pegli commented 1 year ago

I've renamed the stack with no problems.

For the certificate, make sure you're updating the DNS entries in your hostname registration as described in the README. I'm embarrassed to say that it took me a couple of tries before I realized that I needed to copy the DNS server names FROM the HostedZone TO the domain registration. Also, in my experience, certificate validation can take 30+ minutes.
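A quick way to spot that mistake is to compare the name servers listed in the HostedZone with the ones entered at the registrar. This is only a sketch: `missing_name_servers` is a hypothetical helper, both lists are assumed to be copied in by hand, and the host names in the example are made up.

```python
from typing import Iterable, List


def normalize(name: str) -> str:
    """Lower-case a DNS name and drop surrounding spaces and the trailing dot."""
    return name.strip().rstrip(".").lower()


def missing_name_servers(zone_servers: Iterable[str],
                         registrar_servers: Iterable[str]) -> List[str]:
    """Return HostedZone name servers not yet configured at the registrar."""
    configured = {normalize(name) for name in registrar_servers}
    wanted = {normalize(name) for name in zone_servers}
    return sorted(wanted - configured)


zone = ["ns-1.example.net.", "NS-2.example.org"]  # from the HostedZone (made up)
registrar = ["ns-1.example.net"]                  # from the registration (made up)
print(missing_name_servers(zone, registrar))
# → ['ns-2.example.org']
```

An empty result means every HostedZone name server is present at the registrar; DNS changes can still take a while to propagate before validation succeeds.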

Also, check to make sure you don't have duplicate HostedZones for your hostname. HostedZones won't be deleted if there are any records other than NS and SOA, and you may have a stray CNAME record in there from the certificate validation step in a previous run. You can delete that CNAME record by hand, then delete the whole zone by hand.
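Route 53 only deletes a hosted zone once nothing but its default NS and SOA records remain. The check below sketches how to find the records that block deletion. `blocking_records` is a hypothetical helper; the records are assumed to be dicts with `Name` and `Type` keys, the shape of the `ResourceRecordSets` entries returned by `list_resource_record_sets`, and the zone and record names are placeholders.

```python
from typing import Dict, Iterable, List


def normalize(name: str) -> str:
    """Lower-case a DNS name and drop the trailing dot."""
    return name.strip().rstrip(".").lower()


def blocking_records(zone_name: str,
                     records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return every record except the NS and SOA records at the zone apex."""
    apex = normalize(zone_name)
    return [
        record for record in records
        if not (record["Type"] in {"NS", "SOA"}
                and normalize(record["Name"]) == apex)
    ]


records = [  # placeholder data
    {"Name": "example.org.", "Type": "NS"},
    {"Name": "example.org.", "Type": "SOA"},
    {"Name": "_abc123.example.org.", "Type": "CNAME"},
]
print(blocking_records("example.org", records))
# → [{'Name': '_abc123.example.org.', 'Type': 'CNAME'}]
```

Anything this returns has to be removed before the zone itself can be deleted.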

scrappydog commented 1 year ago

Looks like I have a duplicate zone in Route 53 left over from the previous failed install. I think I'll roll back, do some manual cleanup, retry, and report back... Thanks Paul!

andreaswittig commented 1 year ago

@scrappydog You mentioned the following when creating the issue:

HostedZone | DELETE_FAILED | Embedded stack arn:aws:cloudformation:us-east-1:264727885608:stack/mastodon-on-aws-HostedZone-16CBXD4EV74B0/c32742c0-668f-11ed-a315-1260d77cfdf9 was not successfully deleted: The following resource(s) failed to delete: [HostedZone].

So when CloudFormation rolled back and tried to delete the stack it could not delete the hosted zone because the hosted zone contained a record. Therefore, CloudFormation kept the hosted zone.

Let me check if there is an issue with the dependencies between the CloudFormation resources.

scrappydog commented 1 year ago

I now have a happy functioning Mastodon server after a lot of experimentation and these learnings:

  1. Bumping the NumberOfAvailabilityZones value to 3 solved my initial problem.
  2. The fact that you need to edit the DNS config AFTER the zone is created in order for certificate creation to complete was non-obvious.
  3. There are major differences in what you need to do in DNS depending on whether you are installing to the root of the domain or to a subdomain.
  4. The fact that the install always creates a new DNS zone instead of merging into an existing zone (if one exists) is painful.
  5. If things aren't working... try starting over in a different AWS region.

Thanks for the cool tool and all the support!