hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io

Cannot get winrm connection with a Win2016 machine member of an AD domain #23495

Closed fckbo closed 3 years ago

fckbo commented 5 years ago

Terraform V0.11.11

When using Terraform I cannot connect to a Windows Server 2016 machine (a VM on vSphere 6.5) that is a member of a domain managed by an Active Directory server, in order to execute PowerShell scripts on it remotely.

My current assessment is that the connection used by the provisioner does not work properly when using NTLM to connect to a remote Win2016 server.

Here is the part of the template related to the connection settings:

    connection {
      type     = "winrm"
      user     = "MYDOMAIN\\Administrator"
      password = "DOMAIN_ADMIN_PWD"
      host     = "${var.ADM_IP}"  # IPv4 address of the virtual machine (= MYSERVER_IP)
      insecure = true
      https    = false
      use_ntlm = true
      timeout  = "1m"
    }
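
For context, this connection block lives inside the resource whose provisioners run the scripts. A minimal sketch of how it is wired up (the resource type/name and the script path are placeholders, not my real configuration):

    resource "vsphere_virtual_machine" "winsrv" {
      # ... VM settings omitted ...

      connection {
        type     = "winrm"
        user     = "MYDOMAIN\\Administrator"
        password = "DOMAIN_ADMIN_PWD"
        host     = "${var.ADM_IP}"
        insecure = true
        https    = false
        use_ntlm = true
        timeout  = "1m"
      }

      # Runs the PowerShell script on the new VM over the WinRM connection above
      provisioner "remote-exec" {
        inline = [
          "powershell.exe -ExecutionPolicy Bypass -File C:\\scripts\\setup.ps1",
        ]
      }
    }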

I also have the same problem when using user = "Administrator" and password = "LOCAL_ADMIN_PWD".

I'm getting a 401, even though if I test WinRM commands from a Win2016 machine that is not a member of the domain it seems to work fine, for example executing a remote 'dir' command using:

    winrs -r:MYSERVER_IP -u:MYDOMAIN\Administrator -p:DOMAIN_ADMIN_PWD dir

or

    winrs -r:MYSERVER_IP -u:Administrator -p:LOCAL_ADMIN_PWD dir

P.S.1: I should mention that prior to adding my server as a member of the domain, I could get Terraform to execute scripts remotely on the server after provisioning without any problem, using the following connection settings:

    connection {
      type     = "winrm"
      user     = "Administrator"
      password = "LOCAL_ADMIN_PWD"
      host     = "${var.ADM_IP}"  # IPv4 address of the virtual machine
      use_ntlm = true
      timeout  = "1m"
    }

P.S.2: I can also get it to work when not using NTLM (use_ntlm = false) and using HTTP, but this is insecure...

    connection {
      type     = "winrm"
      user     = "Administrator"
      password = "LOCAL_ADMIN_PWD"
      host     = "${var.ADM_IP}"  # IPv4 address of the virtual machine
      port     = "5985"
      https    = false
      insecure = true
      use_ntlm = false
      timeout  = "1m"
    }
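
For reference, a WinRM-over-HTTPS variant would look roughly like the sketch below (a sketch only; port 5986 is the default WinRM HTTPS listener, and the cacert file path is a placeholder):

    connection {
      type     = "winrm"
      user     = "Administrator"
      password = "LOCAL_ADMIN_PWD"
      host     = "${var.ADM_IP}"
      port     = "5986"
      https    = true
      use_ntlm = true
      # Validate the listener's certificate, or use insecure = true for a self-signed cert
      cacert   = "${file("winrm_ca.pem")}"
      timeout  = "1m"
    }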

Thanks a lot for any help and hints on confirming whether this is an issue or a change request, or whether there is a workaround.

danieldreier commented 5 years ago

@fckbo thanks for reporting this. We don't have a ton of in-depth Windows knowledge on the team, and so it's not immediately clear to me whether this is a defect or a configuration issue. You may be able to get better help on the discussion forum.

Also, have you reproduced this using a terraform 0.12 version? We're not shipping 0.11 versions anymore and so I'm hesitant to go back and reproduce this without knowing that it's happening in current versions.

mcascone commented 4 years ago

> Also, have you reproduced this using a terraform 0.12 version? We're not shipping 0.11 versions anymore and so I'm hesitant to go back and reproduce this without knowing that it's happening in current versions.

I'm having this exact issue, using the Chef provisioner with a winrm connection.

A lot of folks are still blocked on upgrading to 0.12 due to the separate parallel-provisioning-not-working bug.

danieldreier commented 4 years ago

@mcascone thanks - that's very helpful

mcascone commented 4 years ago

I should clarify: I'm not on 0.12; I'm stuck on 0.11 due to the parallel provisioning issue.

I can say that on 0.11, and on my org's Win2012 vRA VMs, it all works well. It's on 0.11 and Win2016 that I am having this issue.

I'm using null_resource to do my Chef provisioning so I can iterate on the code without waiting for a new VM every time. It also allows the three machines to be spun up in parallel and then provisioned in parallel: my environment needs to know the name/IP of the new SQL server to configure the other two apps, so instead of waiting for the whole SQL installation, I create all the machines at once and can then provision them independently (since I'll know the SQL machine name after the VM comes online).
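
Roughly, each machine's provisioning block looks like the sketch below (trimmed for illustration; the Chef server URL, run_list, key path, and the resource the triggers/host values point at are placeholders for my real values):

    resource "null_resource" "ICE-SQL" {
      # Re-run provisioning when the underlying VM is replaced
      triggers = {
        vm_id = "${vsphere_virtual_machine.ice_sql.id}"
      }

      connection {
        type     = "winrm"
        host     = "${vsphere_virtual_machine.ice_sql.default_ip_address}"
        user     = "my-account"
        password = "${var.winrm_password}"
        port     = "5985"
        https    = false
        insecure = true
        use_ntlm = false
        timeout  = "10m"
      }

      provisioner "chef" {
        server_url      = "https://chef.example.com/organizations/myorg"  # placeholder
        node_name       = "ice-sql"
        run_list        = ["recipe[sql_server]"]                          # placeholder
        user_name       = "terraform"
        user_key        = "${file("keys/terraform.pem")}"
        recreate_client = true
      }
    }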

Note that the three null_resource blocks use somewhat different settings, and none of them work:

null_resource.ICE-MASTER (chef): Connecting to remote host via WinRM...
null_resource.ICE-MASTER (chef):   Host: my-machine 
null_resource.ICE-MASTER (chef):   Port: 5986
null_resource.ICE-MASTER (chef):   User: my-user
null_resource.ICE-MASTER (chef):   Password: true
null_resource.ICE-MASTER (chef):   HTTPS: true
null_resource.ICE-MASTER (chef):   Insecure: true
null_resource.ICE-MASTER (chef):   NTLM: true
null_resource.ICE-MASTER (chef):   CACert: false
null_resource.ICE-REMOTE (chef): Connecting to remote host via WinRM...
null_resource.ICE-REMOTE (chef):   Host: my-machine2
null_resource.ICE-REMOTE (chef):   Port: 5985
null_resource.ICE-REMOTE (chef):   User: my-user
null_resource.ICE-REMOTE (chef):   Password: true
null_resource.ICE-REMOTE (chef):   HTTPS: false
null_resource.ICE-REMOTE (chef):   Insecure: true
null_resource.ICE-REMOTE (chef):   NTLM: false
null_resource.ICE-REMOTE (chef):   CACert: false
null_resource.ICE-SQL (chef): Connecting to remote host via WinRM...
null_resource.ICE-SQL (chef):   Host: my-machine 3
null_resource.ICE-SQL (chef):   Port: 5985
null_resource.ICE-SQL (chef):   User: my-account
null_resource.ICE-SQL (chef):   Password: true
null_resource.ICE-SQL (chef):   HTTPS: false
null_resource.ICE-SQL (chef):   Insecure: true
null_resource.ICE-SQL (chef):   NTLM: false
null_resource.ICE-SQL (chef):   CACert: false
null_resource.ICE-MASTER: Still creating... (20s elapsed)
null_resource.ICE-SQL: Still creating... (20s elapsed)
null_resource.ICE-REMOTE: Still creating... (20s elapsed)
Interrupt received.
Please wait for Terraform to exit or data loss may occur.
Gracefully shutting down...
stopping operation...

Error: Error applying plan:

3 errors occurred:
        * null_resource.ICE-SQL: interrupted - last error: http response error: 401 - invalid content type
        * null_resource.ICE-REMOTE: interrupted - last error: http response error: 401 - invalid content type
        * null_resource.ICE-MASTER: interrupted - last error: unknown error Post https://my-machine:5986/wsman: dial tcp my-ip-addr:5986: connectex: No connection could be made because the target machine actively refused it.

mcascone commented 4 years ago

@danieldreier , is there any update on this issue?

danieldreier commented 4 years ago

@mcascone no, we are open to a PR to fix this issue, but the team has been focused almost exclusively on two major features for the 0.13 release.

I'm curious, both for @mcascone and other folks who are experiencing this issue: are you able to use the cloudinit provider to run your setup steps rather than relying on the winrm provisioner?

mcascone commented 4 years ago

Thanks for the quick response, @danieldreier. In my case, I'm trying to provision with Chef, which, as far as I understand it, relies on WinRM to do its thing. So cloudinit doesn't seem like an option, unless I'm missing something.

danieldreier commented 4 years ago

@mcascone the approach I'm imagining here would be to use cloudinit to kick off your chef run at provision time. The point of cloudinit is to kick off some kind of provisioning script the first time the VM comes up. I haven't done it for Chef, but I've certainly used CloudInit to install and kick off an initial Puppet run, so I imagine you could do the same for Chef, especially if you're using Chef Server rather than Solo. Does that make sense? That way you wouldn't have to winrm into the box at all at start time, you'd have the cloudinit script do whatever bootstrapping you need. I found a very old blog post describing this workflow. The syntax is out of date but the idea should be the same.
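
For anyone exploring that route, here is a very rough sketch using the standalone cloudinit provider (this assumes Terraform 0.12+ syntax and a Linux-style cloud-init image; the Chef server URL, run list, and install commands are placeholders, and a real first run would also need client/validation keys configured):

    data "cloudinit_config" "chef_bootstrap" {
      gzip          = false
      base64_encode = true

      part {
        content_type = "text/x-shellscript"
        content      = <<-EOT
          #!/bin/sh
          # Placeholder bootstrap: install chef-client and run it once against the Chef server
          curl -L https://omnitruck.chef.io/install.sh | sh
          chef-client --server https://chef.example.com/organizations/myorg --runlist "recipe[base]"
        EOT
      }
    }

    # data.cloudinit_config.chef_bootstrap.rendered is then passed as the instance's user data.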

mcascone commented 4 years ago

Thanks @danieldreier, we're using on-prem vRA7 and, as far as I can tell, cloud-init is not an option. I've started a conversation with the server team about how to provide a run_list or similar to the VMs when they start up, which I believe would accomplish the same thing as cloud-init. However, unless the output is somehow ported over to the Terraform session, it will just make debugging the provisioning very, very difficult. This was all working wonderfully last year, and I'd just like to get it back to where it was.

danieldreier commented 3 years ago

I want to apologize for the slow response time on this issue, and also let you know that I am bulk-closing all issues exclusively reported against Terraform 0.11.x, including this issue, because we are no longer investigating issues reported against Terraform 0.11.x. In most cases, when we try to reproduce issues reported against 0.11, we either can't reproduce them anymore, or the reporter has moved on, so we believe we can better support the Terraform user community by prioritizing more recent issues.

Terraform 0.12 has been available since May of 2019, and there are really significant benefits to adopting it. I know that migrating from 0.11 to versions past 0.12 can require a bit of effort, but it really is worth it, and the upgrade path is pretty well understood in the community by now. 0.14 is available and stable, and we are quickly approaching an 0.15 release.

We have made a significant effort in the last year to stay on top of bug reports; we have triaged almost all new bug reports within 1-2 weeks for 6+ months now. If you are still experiencing this problem, please submit a new bug report with a reproduction case that works on 0.14.x, link this old issue for context, and we will triage it.

ghost commented 3 years ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.