Closed clayt0nk closed 5 years ago
Try deleting the stack and starting from scratch. You probably also need to find out why the stack ended up in ROLLBACK_COMPLETE status; it may be something specific to this region.
I have never actually seen the algo stack show up in the AWS console, something is making sure it gets deleted automatically and very quickly. (And unfortunately I do not have AWS support in this account.) This is the ap-northeast-1 (Japan) region.
Today's bit of testing:
Could I be the only one left who has not yet consumed his AWS' free-server-for-a-year and thus cares about AWS?
@clayt0nk The stack can't just disappear from the console. If it's deleted, you'll see it on the deleted page.
Moreover, I have just successfully deployed Algo to the ap-northeast-1 region
> @clayt0nk The stack can't just disappear from the console. If it's deleted, you'll see it on the deleted page.
Then I suspect the stack never got to the point of being created in the first place, which is not hard to believe since "ValidationError" suggests something akin to a syntax error. After deploying successfully now in us-east there is now an "algo" stack in my CF console, and there is absolutely nothing in the "deleted" list after multiple deployment failures in Asian regions.
Did you modify any config options? Could you put the full log here and the options chosen?
The only config modification was to add a list of users to config.cfg. At this stage I accept defaults everywhere, except this one (since I have Linux clients):
"Do you want the VPN to support Windows 10 or Linux Desktop clients? (enables compatible ciphers and key exchange, less secure)"
Is what I appended to the original bug report not the log you are looking for? I am not seeing any logs in the script directory.
I have just found:
If a stack fails to create by accident or user interaction, CloudFormation executes a rollback, meaning it deletes all previously created resources. The stack itself stays in place in the ROLLBACK_COMPLETE state to enable users to inspect and debug the problems. It is not possible to create this stack again.
So the stack should be there. You need to find out what the problem was, because I can't reproduce the same error on my end.
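To make the rollback behaviour described above concrete, here is a minimal sketch that maps a stack status to a suggested next step. It is an illustration, not part of Algo: the function name `next_step` and the action strings are hypothetical, and only ROLLBACK_COMPLETE actually appears in this thread. The other status names are standard CloudFormation stack statuses.

```python
# Statuses from which CloudFormation accepts an update.
# Only ROLLBACK_COMPLETE is reported in this thread; the rest are
# standard CloudFormation status names included for context.
UPDATABLE = {"CREATE_COMPLETE", "UPDATE_COMPLETE", "UPDATE_ROLLBACK_COMPLETE"}


def next_step(stack_status):
    """Map a stack status (or None if no stack exists) to a suggested action.

    The function name and the returned strings are hypothetical.
    """
    if stack_status is None:
        return "create"
    if stack_status == "ROLLBACK_COMPLETE":
        # Creation failed and was rolled back; the leftover stack
        # cannot be updated, so it has to be deleted first.
        return "delete, then create"
    if stack_status.endswith("_IN_PROGRESS"):
        return "wait"
    if stack_status in UPDATABLE:
        return "update"
    return "inspect the Events tab"


for status in (None, "ROLLBACK_COMPLETE", "CREATE_COMPLETE"):
    print(status, "->", next_step(status))
```

With boto3, the status string would typically come from the `StackStatus` field returned by the CloudFormation client's `describe_stacks` call.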
@clayt0nk any updates on this?
I plan on trying to reproduce it again some time soon, just not today. The network is very bad here right now, and no one else seems to be complaining so I am not making it a high priority for the moment.
One bit of good news that somewhat mitigates the bigger chunk of bad news that follows: I just successfully deployed to Seoul. So at least, not all of Asia is broken.
However, today's attempts to deploy to Tokyo and Singapore both failed again with the same CF ValidationError, exactly like before, even though the Algo CF stack now exists thanks to the previous successful us-east deployment. Again, the AWS console is showing no errors. The last event in the Events tab of the Algo stack dates from 14 July, when I deployed to us-east. Not even today's successful deploy to Seoul changed that, so it reflects neither the successful nor the failed deploys from today.
I have worked with AWS CF a fair bit in the past on behalf of a client. This would not be the first time I had to resort to AWS tech support to get to the bottom of an obscure CF error. However, this is my personal AWS account and I do not have support, so I am not sure if there is much else I can do here.
I don't have any troubleshooting tips but I can confirm that Algo used to work fine on AWS EC2 in the Tokyo region (deployed July 2018, used it that way for a year).
Ok, I believe I know what is going on here, and it looks like user error that should be avoidable with a bit of software help.
1st problem: It is a long time since I have used CF, and somehow I thought that CF was pan-regional and not region specific. With that mis-perception corrected, I was able to find the failed Tokyo stack.
2nd problem: I don't know if this is me-specific, but my first Tokyo deploy failed because there is a "verification" step of some kind from the AWS end, before my use of the Tokyo region was enabled. This left behind a failed algo stack from my first Tokyo attempt while AWS was getting their act together.
3rd problem: this being my first re-deploy to the same region, I did not realize that I needed to manually delete the previous algo stack. Apparently the mere existence of an algo stack causes the next deployment to fail with the rather uninformative CF ValidationError.
With the previous algo stack deleted, I am now running in the Tokyo region.
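Since the first problem came down to CloudFormation stacks being per-region, a small script that looks for a stack by name in several regions can save some clicking through the console. The sketch below is an assumption-laden illustration: `stacks_in_region`, `find_stack`, and `FakeClient` are hypothetical names, and the region data is made up for the demo. It relies only on the `describe_stacks` call of a CloudFormation client and its `NextToken` pagination.

```python
def stacks_in_region(client):
    """Yield every stack description in one region, following NextToken."""
    kwargs = {}
    while True:
        page = client.describe_stacks(**kwargs)
        yield from page.get("Stacks", [])
        token = page.get("NextToken")
        if not token:
            return
        kwargs = {"NextToken": token}


def find_stack(make_client, regions, stack_name="algo"):
    """Return {region: status} for each region holding a stack with this name.

    make_client(region) is a hypothetical factory that returns a
    CloudFormation client for that region.
    """
    found = {}
    for region in regions:
        for stack in stacks_in_region(make_client(region)):
            if stack["StackName"] == stack_name:
                found[region] = stack["StackStatus"]
    return found


class FakeClient:
    """Stand-in for a real CloudFormation client, used only for this demo."""

    def __init__(self, stacks):
        self._stacks = stacks

    def describe_stacks(self, **kwargs):
        return {"Stacks": self._stacks}


# Illustrative data only; it does not come from a real account.
FAKE = {
    "us-east-1": [{"StackName": "algo", "StackStatus": "CREATE_COMPLETE"}],
    "ap-northeast-1": [{"StackName": "algo", "StackStatus": "ROLLBACK_COMPLETE"}],
    "ap-northeast-2": [],
}
print(find_stack(lambda region: FakeClient(FAKE[region]), FAKE))
```

Against a real account, the factory could be `lambda region: boto3.client("cloudformation", region_name=region)`, assuming boto3 is installed and credentials are configured.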
Given the above, it would be very helpful if the algo script checked for an existing algo CF stack in the region, and if it finds one, bails with instructions to go delete it and try again. Or better yet, offer to delete it on my behalf.
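A rough sketch of what such a check could look like is below. It is not how Algo works and is only an illustration: `preflight`, `ExistingStackError`, and the `get_status` callable are hypothetical names. The suggested command uses the AWS CLI's `aws cloudformation delete-stack` subcommand.

```python
class ExistingStackError(Exception):
    """Raised when a leftover stack blocks a new deployment (hypothetical)."""


def preflight(get_status, stack_name, region):
    """Abort with instructions if a rolled-back stack already exists.

    get_status(stack_name, region) is a hypothetical callable that returns
    the stack's status string, or None when no such stack exists there.
    """
    status = get_status(stack_name, region)
    if status == "ROLLBACK_COMPLETE":
        raise ExistingStackError(
            f"Stack '{stack_name}' in {region} is in {status} and cannot be updated.\n"
            "Delete it and run the deployment again, for example:\n"
            f"  aws cloudformation delete-stack --stack-name {stack_name} --region {region}"
        )
    return status


# Demo with a stubbed status lookup instead of a real AWS call.
try:
    preflight(lambda name, region: "ROLLBACK_COMPLETE", "algo", "ap-northeast-1")
except ExistingStackError as err:
    print(err)
```

Passing the status lookup in as a callable keeps the check independent of any particular AWS library, so it can be exercised without credentials.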
> Given the above, it would be very helpful if the algo script checked for an existing algo CF stack in the region, and if it finds one, bails with instructions to go delete it and try again. Or better yet, offer to delete it on my behalf.
We don't really want to add more complexity to the installer. The entire issue seems to have happened only because of a lack of experience with AWS and CF. For cases like this we always suggest that users use DigitalOcean, the simplest cloud provider we support, or AWS Lightsail, which is also simple and has a free tier plan. I'm closing this issue because we can't do much here, sorry.
Note that I have this working just fine on DigitalOcean. I also tried this on AWS with my root API keys, just to make sure there are no permission issues. Nonetheless, on AWS, I get this every time:
TASK [cloud-ec2 : Deploy the template] *****
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ClientError: An error occurred (ValidationError) when calling the UpdateStack operation: Stack:arn:aws:cloudformation:ap-northeast-1:821326127294:stack/algo/33567100-a3b2-11e9-a2bf-063d35edfb9e is in ROLLBACK_COMPLETE state and can not be updated.
fatal: [localhost]: FAILED! => {"changed": false, "msg": "Failed to update stack algo: An error occurred (ValidationError) when calling the UpdateStack operation: Stack:arn:aws:cloudformation:ap-northeast-1:821326127294:stack/algo/33567100-a3b2-11e9-a2bf-063d35edfb9e is in ROLLBACK_COMPLETE state and can not be updated. An error occurred (ValidationError) when calling the UpdateStack operation: Stack:arn:aws:cloudformation:ap-northeast-1:821326127294:stack/algo/33567100-a3b2-11e9-a2bf-063d35edfb9e is in ROLLBACK_COMPLETE state and can not be updated. - <class 'botocore.exceptions.ClientError'>"}
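The error text above already names the region, the stack, and its state, which is easy to miss in the wall of output. As a small illustration, the sketch below pulls those fields out with a regular expression. The pattern is inferred from this one message rather than from any documented format, `parse_stack_error` is a hypothetical helper, and the account ID in the sample is a placeholder.

```python
import re

# Pattern inferred from the message in this thread, not from a documented format.
PATTERN = re.compile(
    r"Stack:arn:aws:cloudformation:(?P<region>[^:]+):(?P<account>\d+):"
    r"stack/(?P<name>[^/]+)/(?P<stack_id>\S+) is in (?P<state>[A-Z_]+) state"
)


def parse_stack_error(message):
    """Return the region, account, stack name, id and state, or None if no match."""
    match = PATTERN.search(message)
    return match.groupdict() if match else None


# Sample message with a placeholder account ID.
sample = (
    "An error occurred (ValidationError) when calling the UpdateStack operation: "
    "Stack:arn:aws:cloudformation:ap-northeast-1:123456789012:stack/algo/"
    "33567100-a3b2-11e9-a2bf-063d35edfb9e is in ROLLBACK_COMPLETE state "
    "and can not be updated."
)
info = parse_stack_error(sample)
print(info["region"], info["name"], info["state"])
```

Read this way, the message says that a stack named algo exists in ap-northeast-1 and is stuck in ROLLBACK_COMPLETE, so that region's CloudFormation console is the place to look.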
Full log
$ ./algo
PLAY [localhost] ***
TASK [Gathering Facts] ***** ok: [localhost]
TASK [Ensure the requirements installed] *** ok: [localhost]
TASK [Verify Ansible meets Algo VPN requirements.] ***** ok: [localhost] => { "changed": false, "msg": "All assertions passed" }
PLAY [Ask user for the input] **
TASK [Gathering Facts] ***** ok: [localhost] [Cloud prompt] What provider would you like to use?
Enter the number of your desired provider :
TASK [Cloud prompt] **** ok: [localhost]
TASK [Set facts based on the input] **** ok: [localhost] [VPN server name prompt] Name the vpn server [algo] :
TASK [VPN server name prompt] ** ok: [localhost] [Cellular On Demand prompt] Do you want macOS/iOS IPsec clients to enable "Connect On Demand" when connected to cellular networks? [y/N] :
TASK [Cellular On Demand prompt] *** ok: [localhost] [Wi-Fi On Demand prompt] Do you want macOS/iOS IPsec clients to enable "Connect On Demand" when connected to Wi-Fi? [y/N] :
TASK [Wi-Fi On Demand prompt] ** ok: [localhost] [Compatible ciphers prompt] Do you want the VPN to support Windows 10 or Linux Desktop clients? (enables compatible ciphers and key exchange, less secure) [y/N] :
TASK [Compatible ciphers prompt] *** ok: [localhost] [Retain the PKI prompt] Do you want to retain the keys (PKI)? (required to add users in the future, but less secure) [y/N] :
TASK [Retain the PKI prompt] *** ok: [localhost] [DNS adblocking prompt] Do you want to enable DNS ad blocking on this VPN server? [y/N] :
TASK [DNS adblocking prompt] *** ok: [localhost] [SSH tunneling prompt] Do you want each user to have their own account for SSH tunneling? [y/N] :
TASK [SSH tunneling prompt] **** ok: [localhost]
TASK [Set facts based on the input] **** ok: [localhost]
PLAY [Provision the server] ****
TASK [Gathering Facts] ***** ok: [localhost]
--> Please include the following block of text when reporting issues:
Algo running on: Debian GNU/Linux 10 (buster) (Virtualized: xen)
Created from git fork. Last commit: 090a60d PKI to tmpfs (#1496)
Python 2.7.16
Runtime variables:
algo_provider "ec2"
algo_ondemand_cellular "False"
algo_ondemand_wifi "False"
algo_ondemand_wifi_exclude "X251bGw="
algo_windows "True"
algo_dns_adblocking "False"
algo_ssh_tunneling "False"
wireguard_enabled "True"
dns_encryption "True"
TASK [Display the invocation environment] ** changed: [localhost -> localhost]
TASK [Install the requirements] **** ok: [localhost -> localhost]
TASK [Generate the SSH private key] **** ok: [localhost]
TASK [Generate the SSH public key] ***** ok: [localhost]
TASK [cloud-ec2 : Install requirements] **** ok: [localhost] [cloud-ec2 : pause] Enter your aws_access_key (http://docs.aws.amazon.com/general/latest/gr/managing-aws-access-keys.html) Note: Make sure to use an IAM user with an acceptable policy attached (see https://github.com/trailofbits/algo/blob/master/docs/deploy-from-ansible.md) (output is hidden):
TASK [cloud-ec2 : pause] *** ok: [localhost] [cloud-ec2 : pause] Enter your aws_secret_key (http://docs.aws.amazon.com/general/latest/gr/managing-aws-access-keys.html) (output is hidden):
TASK [cloud-ec2 : pause] *** ok: [localhost]
TASK [cloud-ec2 : set_fact] **** ok: [localhost]
TASK [cloud-ec2 : Get regions] ***** ok: [localhost]
TASK [cloud-ec2 : Set facts about the regions] ***** ok: [localhost]
TASK [cloud-ec2 : Set the default region] ** ok: [localhost] [cloud-ec2 : pause] What region should the server be located in? (https://docs.aws.amazon.com/general/latest/gr/rande.html#ec2_region)
Enter the number of your desired region [13] :
TASK [cloud-ec2 : pause] *** ok: [localhost]
TASK [cloud-ec2 : Set algo_region and stack_name facts] **** ok: [localhost]
TASK [cloud-ec2 : Locate official AMI for region] ** ok: [localhost]
TASK [cloud-ec2 : Set the ami id as a fact] **** ok: [localhost]
TASK [cloud-ec2 : Deploy the template] *****
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ClientError: An error occurred (ValidationError) when calling the UpdateStack operation: Stack:arn:aws:cloudformation:ap-northeast-1:821326127294:stack/algo/33567100-a3b2-11e9-a2bf-063d35edfb9e is in ROLLBACK_COMPLETE state and can not be updated.
fatal: [localhost]: FAILED! => {"changed": false, "msg": "Failed to update stack algo: An error occurred (ValidationError) when calling the UpdateStack operation: Stack:arn:aws:cloudformation:ap-northeast-1:821326127294:stack/algo/33567100-a3b2-11e9-a2bf-063d35edfb9e is in ROLLBACK_COMPLETE state and can not be updated. An error occurred (ValidationError) when calling the UpdateStack operation: Stack:arn:aws:cloudformation:ap-northeast-1:821326127294:stack/algo/33567100-a3b2-11e9-a2bf-063d35edfb9e is in ROLLBACK_COMPLETE state and can not be updated. - <class 'botocore.exceptions.ClientError'>"}
included: /usr/local/src/algo/playbooks/rescue.yml for localhost
TASK [debug] *** ok: [localhost] => { "fail_hint": [ "Sorry, but something went wrong!", "Please check the troubleshooting guide.", "https://trailofbits.github.io/algo/troubleshooting.html" ] }
TASK [Fail the installation] *** fatal: [localhost]: FAILED! => {"changed": false, "msg": "Failed as requested from task"}
PLAY RECAP ***** localhost : ok=32 changed=1 unreachable=0 failed=2