imperva / dsfkit

Imperva eDSF Kit is designed to automate the deployment of DSF
MIT License

Unexpected behavior during deployment #391

Closed 06212 closed 6 months ago

06212 commented 8 months ago

Hello @lindanasredin, Imperva,

Hope you are doing well.

I am returning to ask for help with another issue.

During terraform apply, the execution runs for several hours without finishing. After I manually interrupted the deployment, Terraform reported that the "readiness" script had not finished.

default.tfvars:

    enable_dam = false
    agentless_gw_count = 0
    agent_gw_count = 0
    hub_hadr = false
    agentless_gw_hadr = false
    dra_version = "4.14"

    aws_profile = ""
    aws_region_1 = "eu-west-1"
    aws_region_2 = "eu-west-1"
    subnet_ids = {
      hub_main_subnet_id = "subnet-xxxxxxxxxxxxxxx"
      hub_dr_subnet_id = "subnet-xxxxxxxxxxxxxxx"
      agentless_gw_main_subnet_id = "subnet-xxxxxxxxxxxxxxx"
      agentless_gw_dr_subnet_id = "subnet-xxxxxxxxxxxxxxx"
      mx_subnet_id = "subnet-xxxxxxxxxxxxxxx"
      agent_gw_subnet_id = "subnet-xxxxxxxxxxxxxxx"
      dra_admin_subnet_id = "subnet-xxxxxxxxxxxxxxx"
      dra_analytics_subnet_id = "subnet-xxxxxxxxxxxxxxx"
    }

Output:

    module.dra_analytics[0].aws_instance.dsf_base_instance: Still creating... [10s elapsed]
    module.hub_main[0].module.hub_instance.null_resource.readiness[0] (remote-exec): Connecting to remote host via SSH...
    module.hub_main[0].module.hub_instance.null_resource.readiness[0] (remote-exec):  Host: 10...136
    module.hub_main[0].module.hub_instance.null_resource.readiness[0] (remote-exec):  User: ec2-user
    module.hub_main[0].module.hub_instance.null_resource.readiness[0] (remote-exec):  Password: false
    module.hub_main[0].module.hub_instance.null_resource.readiness[0] (remote-exec):  Private key: true
    module.hub_main[0].module.hub_instance.null_resource.readiness[0] (remote-exec):  Certificate: false
    module.hub_main[0].module.hub_instance.null_resource.readiness[0] (remote-exec):  SSH Agent: false
    module.hub_main[0].module.hub_instance.null_resource.readiness[0] (remote-exec):  Checking Host Key: false
    module.hub_main[0].module.hub_instance.null_resource.readiness[0] (remote-exec):  Target Platform: unix
    module.dra_analytics[0].aws_instance.dsf_base_instance: Creation complete after 12s [id=i-09f15ea899ca22975]
    module.dra_analytics[0].null_resource.readiness: Creating...
    module.dra_analytics[0].null_resource.readiness: Provisioning with 'local-exec'...
    module.dra_analytics[0].null_resource.readiness (local-exec): Executing: ["/bin/bash" "-c" " while true; do\n response=$(curl -k -s -o /dev/null -w \"%{http_code}\" --request GET 'https://34.*.*.182:8443/mvc/login')\n if [ $response -eq 200 ]; then\n exit 0\n else\n sleep 60\n fi\n done"]
    module.dra_admin[0].aws_instance.dsf_base_instance: Creation complete after 13s [id=i-0992274acca3bffb3]
    module.dra_admin[0].null_resource.readiness: Creating...
    module.dra_admin[0].aws_eip_association.eip_assoc[0]: Creating...
    module.dra_admin[0].null_resource.readiness: Provisioning with 'local-exec'...
    module.dra_admin[0].null_resource.readiness (local-exec): Executing: ["/bin/bash" "-c" " while true; do\n response=$(curl -k -s -o /dev/null -w \"%{http_code}\" --request GET 'https://34.*.*.182:8443/mvc/login')\n if [ $response -eq 200 ]; then\n exit 0\n else\n sleep 60\n fi\n done"]
    module.dra_admin[0].aws_eip_association.eip_assoc[0]: Creation complete after 1s [id=eipassoc-010f2d21b951d0b5f]
    module.hub_main[0].module.hub_instance.aws_volume_attachment.ebs_att: Still creating... [20s elapsed]
    module.hub_main[0].module.hub_instance.null_resource.readiness[0]: Still creating... [20s elapsed]
    module.hub_main[0].module.hub_instance.aws_volume_attachment.ebs_att: Creation complete after 21s [id=vai-947258387]

FYI, after the terraform deployment was manually interrupted, it showed the following error:

    Error: local-exec provisioner error
    │
    │ with module.dra_admin[0].null_resource.readiness,
    │ on .terraform/modules/dra_admin/main.tf line 75, in resource "null_resource" "readiness":
    │ 75: provisioner "local-exec" {
    │
    │ Error running command ' while true; do
    │ response=$(curl -k -s -o /dev/null -w "%{http_code}" --request GET 'https://34.*.*.182:8443/mvc/login')
    │ if [ $response -eq 200 ]; then
    │ exit 0
    │ else
    │ sleep 60
    │ fi
    │ done': signal: interrupt. Output:
    ╵
    ╷
    │ Error: local-exec provisioner error
    │
    │ with module.dra_analytics[0].null_resource.readiness,
    │ on .terraform/modules/dra_analytics/main.tf line 69, in resource "null_resource" "readiness":
    │ 69: provisioner "local-exec" {
    │
    │ Error running command ' while true; do
    │ response=$(curl -k -s -o /dev/null -w "%{http_code}" --request GET 'https://34.*.*.182:8443/mvc/login')
    │ if [ $response -eq 200 ]; then
    │ exit 0
    │ else
    │ sleep 60
    │ fi
    │ done': signal: interrupt. Output:
    ╵
    ╷
    │ Error: remote-exec provisioner error
    │
    │ with module.hub_main[0].module.hub_instance.null_resource.readiness[0],
    │ on .terraform/modules/hub_main/_modules/aws/sonar-base-instance/userdata.tf line 58, in resource "null_resource" "readiness":
    │ 58: provisioner "remote-exec" {
    │
    │ interrupted - last error: dial tcp 10...136:22: i/o timeout

Could you please take a look and advise what could cause the while loop to run indefinitely?

P.S. A few side questions.

  1. Does dsfkit have an option to deploy DRA Analytics without an Elastic IP address?
  2. I would like to access DRA Admin and DRA Analytics and run the readiness script locally, purely as a test. Which SSH user should be used in combination with the SSH passwords stored in AWS Secrets Manager?

For reference #381 #386

Thank you! Iliya

lindanasredin commented 8 months ago

Hi, it looks like you don't have SSH and HTTPS connectivity from the installer machine to the deployed environment, specifically to DRA Admin and DRA Analytics (HTTPS) and to the DSF Hub (SSH). Are you providing your own security groups or letting Terraform create them? They don't appear in the list of variables you provided, so I am just making sure. If you are providing your own security groups, please make sure they have these rules: https://github.com/imperva/dsfkit/tree/master/security_groups_samples If you are not providing your own security groups, please check for other network/security configurations that may be blocking the connectivity.
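The readiness loop shown in the log above has no attempt limit, so when port 8443 is unreachable from the installer machine it never exits. To test connectivity by hand, a bounded variant can be used. The following is only a sketch, not the dsfkit implementation: the function names `wait_for_http_200` and `http_status`, the attempt limit, and the timeout values are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: a bounded version of the readiness check, for manual testing.
# Function names, attempt limit and timeouts are hypothetical, not part of dsfkit.

# Print the HTTP status code for a URL. curl prints 000 when it cannot connect.
http_status() {
  curl -k -s -o /dev/null -w '%{http_code}' \
    --connect-timeout 5 --max-time 10 --request GET "$1" || true
}

# Poll a URL until it returns HTTP 200 or the attempt limit is reached.
# Arguments: URL, max attempts (default 30), delay in seconds (default 60).
wait_for_http_200() {
  local url="$1" max_attempts="${2:-30}" delay="${3:-60}"
  local attempt=1 code
  while [ "$attempt" -le "$max_attempts" ]; do
    code=$(http_status "$url")
    if [ "$code" = "200" ]; then
      echo "ready after ${attempt} attempt(s)"
      return 0
    fi
    echo "attempt ${attempt}/${max_attempts}: got HTTP ${code}" >&2
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  echo "gave up after ${max_attempts} attempt(s)" >&2
  return 1
}
```

For example, `wait_for_http_200 'https://<dra-admin-address>:8443/mvc/login' 10 30` would give up after 10 attempts. A repeated status of 000 means curl could not connect at all, which points to a network or security group problem rather than a DRA that is still starting.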

Regarding your questions:

  1. DRA Analytics doesn't have an Elastic IP. If you meant DRA Admin, the dra-admin module has this variable: attach_persistent_public_ip.
  2. The DRA readiness script tries to log in to the DRA Admin as a way of checking that it is up. If you have access to the DRA Admin UI, you can try to log in.

06212 commented 8 months ago

Hi @lindanasredin ,

You are right. I don't have HTTPS (8443) access to DRA Admin and DRA Analytics from the machine where the Terraform code was executed.

That is why I am now trying to run the code from a deployment machine in the same VPC. However, I am still experiencing some issues related to the network and security groups.

I would really appreciate more information about how to log in manually to the DRA Analytics and DRA Admin machines. I can find the SSH passwords in AWS Secrets Manager, but which users do those passwords belong to?

Thank you! Iliya

lindanasredin commented 8 months ago

There is an output variable called 'ssh_user' in the DRA admin and analytics modules. If you don't have it in your custom example, please add it.

06212 commented 8 months ago

Hi @lindanasredin ,

I am using the example "dsf_single_account_deployment" from your repo.

I am able to log in with the username (from "ssh_user" in the DRA Admin module) and the password (from Secrets Manager). Thanks!!! But for some reason I am not able to log in to DRA Analytics via SSH; the username/password is rejected as wrong.

Do you have any idea why this happens?

P.S. Is it possible to use the SSH key generated on the deployment machine under the "ssh_keys" directory?

Thank you! Iliya

lindanasredin commented 8 months ago

It looks like we have an issue with SSH to the DRA Analytics. We are checking it and will update you. Indeed, you need to use the ssh_key to SSH to any deployed machine. For example, in the case of DRA Analytics, you can copy and paste the ssh command as is from this portion of the example output:

"analytics" = [
    {
      "archiver_user" = "archiver-user"
      "private_dns" = "......."
      "private_ip" = "......"
      "ssh_command" = "ssh -o UserKnownHostsFile=/dev/null -o ProxyCommand='ssh -o UserKnownHostsFile=/dev/null -i ssh_keys/dsf_ssh_key-default -W %h:%p cbadmin@.......' -i ssh_keys/dsf_ssh_key-default cbadmin@......."
    },
  ]

And then you will be prompted for the password.

hadar-timan commented 8 months ago

Hi Iliya, there is an issue with DRA version 4.14 that causes SSH access to the Analytics server using the SSH key to fail. This issue is resolved in DRA version 4.15. Unfortunately, the current dsfkit version contains a recalled DRA 4.15 version. We will release a new dsfkit version with the correct DRA 4.15 version next Sunday, so for now you can use DRA version 4.13 (you can change the version by overriding the 'dra_version' variable).
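One way to apply that override is to change `dra_version` in the tfvars file passed to Terraform. The sketch below assumes the variable is set in a file such as the `default.tfvars` shown earlier in the thread; the helper name `set_dra_version` is hypothetical, and editing the file by hand works equally well.

```shell
#!/usr/bin/env bash
# Sketch only: set dra_version in a tfvars file.
# The helper name is hypothetical and not part of dsfkit.
set_dra_version() {
  local file="$1" version="$2"
  if grep -q '^[[:space:]]*dra_version[[:space:]]*=' "$file"; then
    # Replace the existing value in place; a .bak copy of the file is kept.
    sed -i.bak -E \
      "s/^([[:space:]]*dra_version[[:space:]]*=[[:space:]]*).*/\1\"${version}\"/" \
      "$file"
  else
    # The variable is not set yet, so append it.
    printf 'dra_version = "%s"\n' "$version" >> "$file"
  fi
}
```

Calling `set_dra_version default.tfvars 4.13` and then re-running terraform apply with the same tfvars file would pick up the older DRA version.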

06212 commented 7 months ago

Hi @hadar-timan ,

Thank you for the information regarding the issue. I will try the workaround of using the previous DRA version.