mattermost / mattermost-load-test

[DEPRECATED] replaced by https://github.com/mattermost/mattermost-load-test-ng

SSH timeout during ltops deploy #136

Open bigbitbus opened 5 years ago

bigbitbus commented 5 years ago

Hi

I am trying to set up the AWS Terraform load testing rig.

I have successfully deployed the VMs and database (Terraform created them), but I cannot get past the stage where the app server is rebooted during the deploy. ltops complains after 60s that the VM is not available. I can ssh into the app server from the CLI, so it's not a network/firewall issue.

Is there any way to make ltops wait longer before giving up on ssh while the app server is booting?

I found this by grepping the code:

terraform/cluster_deploy.go:    case <-time.After(60 * time.Second):
terraform/cluster_deploy.go:            return errors.New("failed to reestablish ssh session after 60 seconds")

It would be nice to have a CLI parameter for this, for slow AWS reboot days perhaps?

Relevant logs:

ubuntu@ip-172-26-2-212:~/go/bin$ ./ltops deploy     --cluster cluster-name     --mattermost master     --license /home/ubuntu/mm/license.txt
INFO[0000] deploying to proxy0                           instance=ec2-54-208-15-187.compute-1.amazonaws.com
INFO[0000] resolved master to https://releases.mattermost.com/mattermost-platform/master/mattermost-enterprise-linux-amd64.tar.gz
INFO[0001] deploying to app0                             instance=ec2-100-26-193-107.compute-1.amazonaws.com
INFO[0042] successfully deployed to proxy0               instance=ec2-54-208-15-187.compute-1.amazonaws.com
ERRO[0099] unable to deploy to app0: failed to reboot app instance: failed to reestablish ssh session after 60 seconds  instance=ec2-100-26-193-107.compute-1.amazonaws.com
FATA[0099] Couldn't deploy load test cluster: failed to deploy 1 resources
ubuntu@ip-172-26-2-212:~/go/bin$ ./ltops ssh app --cluster cluster-name
INFO[0000] Connecting to app instance at ec2-100-26-193-107.compute-1.amazonaws.com
Welcome to Ubuntu 16.04.4 LTS (GNU/Linux 4.4.0-1052-aws x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  Get cloud support with Ubuntu Advantage Cloud Guest:
    http://www.ubuntu.com/business/services/cloud

161 packages can be updated.
79 updates are security updates.

New release '18.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it.

Last login: Fri Dec 14 22:38:13 2018 from 34.219.52.42
ubuntu@ip-172-31-85-240:~$

Detailed logs

ubuntu@ip-172-26-2-212:~/go/bin$ ./ltops deploy   --verbose  --cluster cluster-name     --mattermost master     --license /home/ubuntu/mm/license.txt
INFO[0000] deploying to proxy0                           instance=ec2-54-208-15-187.compute-1.amazonaws.com
INFO[0000] resolved master to https://releases.mattermost.com/mattermost-platform/master/mattermost-enterprise-linux-amd64.tar.gz
DEBU[0000] loading https://releases.mattermost.com/mattermost-platform/master/mattermost-enterprise-linux-amd64.tar.gz
DEBU[0000] + sudo apt-get update                         instance=ec2-54-208-15-187.compute-1.amazonaws.com
INFO[0001] deploying to app0                             instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0002] + sudo apt-get install -y nginx               instance=ec2-54-208-15-187.compute-1.amazonaws.com
DEBU[0002] uploading distribution                        instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0003] + sudo ln -fs /etc/nginx/sites-available/mattermost /etc/nginx/sites-enabled/mattermost  instance=ec2-54-208-15-187.compute-1.amazonaws.com
DEBU[0003] + sudo rm -f /etc/nginx/sites-enabled/default  instance=ec2-54-208-15-187.compute-1.amazonaws.com
DEBU[0003] + sudo grep -q -F 'worker_rlimit_nofile' /etc/nginx/nginx.conf || echo 'worker_rlimit_nofile 65536;' | sudo tee -a /etc/nginx/nginx.conf  instance=ec2-54-208-15-187.compute-1.amazonaws.com
DEBU[0003] + sudo sed -i 's/worker_connections.*/worker_connections 200000;/g' /etc/nginx/nginx.conf  instance=ec2-54-208-15-187.compute-1.amazonaws.com
DEBU[0004] + sudo systemctl daemon-reload                instance=ec2-54-208-15-187.compute-1.amazonaws.com
DEBU[0004] + sudo systemctl restart nginx                instance=ec2-54-208-15-187.compute-1.amazonaws.com
DEBU[0004] + sudo systemctl enable nginx                 instance=ec2-54-208-15-187.compute-1.amazonaws.com
DEBU[0004] + sudo shutdown -r now &                      instance=ec2-54-208-15-187.compute-1.amazonaws.com
DEBU[0010] attempting to establish ssh session           instance=ec2-54-208-15-187.compute-1.amazonaws.com
DEBU[0010] + sudo rm -rf mattermost /opt/mattermost      instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0011] + tar -xvzf /tmp/mattermost.tar.gz            instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0012] + sudo mv mattermost /opt                     instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0012] + mkdir -p /opt/mattermost/data               instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0013] + sudo apt-get update                         instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0014] + sudo apt-get install -y jq                  instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0015] uploading license file...                     instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0015] uploading limits config...                    instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0016] updating config: .ClusterSettings.ReadOnlyConfig  instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0016] updating config: .MetricsSettings.BlockProfileRate  instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0016] updating config: .ServiceSettings.EnableIncomingWehbooks  instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0016] updating config: .PluginSettings.Enable       instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0016] updating config: .MetricsSettings.Enable      instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0016] updating config: .FileSettings.DriverName     instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0017] updating config: .FileSettings.AmazonS3SecretAccessKey  instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0017] updating config: .TeamSettings.MaxUsersPerTeam  instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0017] updating config: .TeamSettings.EnableOpenServer  instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0017] updating config: .ServiceSettings.EnableLinkPreviews  instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0017] updating config: .ClusterSettings.ClusterName  instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0017] updating config: .LogSettings.EnableDiagnostics  instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0018] updating config: .ServiceSettings.LicenseFileLocation  instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0018] updating config: .SqlSettings.DataSource      instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0018] updating config: .SqlSettings.MaxIdleConns    instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0018] updating config: .ClusterSettings.Enable      instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0018] updating config: .FileSettings.AmazonS3Bucket  instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0018] updating config: .FileSettings.AmazonS3Region  instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0019] updating config: .PluginSettings.EnableUploads  instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0019] updating config: .SqlSettings.DriverName      instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0019] updating config: .ServiceSettings.EnableOnlyAdminIntegrations  instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0019] updating config: .ServiceSettings.ListenAddress  instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0019] updating config: .ServiceSettings.SiteURL     instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0019] updating config: .ServiceSettings.EnableAPIv3  instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0020] updating config: .ServiceSettings.EnableSecurityFixAlert  instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0020] updating config: .SqlSettings.DataSourceReplicas  instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0020] updating config: .SqlSettings.MaxOpenConns    instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0020] updating config: .FileSettings.AmazonS3AccessKeyId  instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0020] updating config: .TeamSettings.MaxChannelsPerTeam  instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0020] + sudo setcap cap_net_bind_service=+ep /opt/mattermost/bin/platform  instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0021] + [[ -f /opt/mattermost/bin/mattermost ]] && sudo setcap cap_net_bind_service=+ep /opt/mattermost/bin/mattermost  instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0021] + sudo systemctl daemon-reload                instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0021] + sudo systemctl restart mattermost.service   instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0023] + sudo systemctl enable mattermost.service    instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0023] + sudo shutdown -r now &                      instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0029] attempting to establish ssh session           instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0029] dial tcp 100.26.193.107:22: connect: connection refused  instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0034] attempting to establish ssh session           instance=ec2-100-26-193-107.compute-1.amazonaws.com
DEBU[0042] reestablished ssh session                     instance=ec2-54-208-15-187.compute-1.amazonaws.com
INFO[0042] successfully deployed to proxy0               instance=ec2-54-208-15-187.compute-1.amazonaws.com
ERRO[0084] unable to deploy to app0: failed to reboot app instance: failed to reestablish ssh session after 60 seconds  instance=ec2-100-26-193-107.compute-1.amazonaws.com
FATA[0084] Couldn't deploy load test cluster: failed to deploy 1 resources
ubuntu@ip-172-26-2-212:~/go/bin$

bigbitbus commented 5 years ago

Update: new day, new behavior! Today the exact same commands worked (no reboot/ssh timeout issue). It may still be worth parameterizing that timeout, given the unpredictable reboot times of public cloud providers.

ubuntu@ip-172-26-2-212:~/go/bin$ ltops deploy     --cluster mm-1     --mattermost master     --license /home/ubuntu/mm/license.txt
INFO[0000] resolved master to https://releases.mattermost.com/mattermost-platform/master/mattermost-enterprise-linux-amd64.tar.gz
INFO[0000] deploying to proxy0                           instance=ec2-3-83-65-247.compute-1.amazonaws.com
INFO[0027] deploying to app0                             instance=ec2-34-203-227-251.compute-1.amazonaws.com
INFO[0053] successfully deployed to proxy0               instance=ec2-3-83-65-247.compute-1.amazonaws.com
INFO[0096] successfully deployed to app0                 instance=ec2-34-203-227-251.compute-1.amazonaws.com