microsoft / mssql-docker

Official Microsoft repository for SQL Server in Docker resources
MIT License

Issue with mssql docker linux container AD #808

Open killmasta93 opened 1 year ago

killmasta93 commented 1 year ago

Hi, has anyone else run into this issue? I'm trying to set up LDAP (Active Directory) authentication. As guides I'm using the YouTube video "How to Enable AD Authentication for SQL 2019 Containers in Less than 5 Minutes | Data Exposed" and the Microsoft Learn article "Configure Active Directory authentication with SQL Server on Linux-based containers using adutil".

The only thing I did differently is the Docker Compose file, and I'm not sure whether that is the problem. I checked the logs but couldn't find any information about the issue. The only failure I see is when I try to create the login:

The AD server is 192.168.3.80.

CREATE LOGIN [domain\administrator] FROM WINDOWS;

Msg 15401, Level 16, State 1, Line 1
Windows NT user or group 'domain\administrator' not found. Check the name again.

My docker-compose file:

version: '3.7'
services:
  mssql:
    image: mcr.microsoft.com/mssql/server:2019-latest
    ports:
      - "1433:1433"
    env_file:
      - sqlserver.env
      - sapassword.env
    volumes:
      - /scsi3/sqlsystem:/var/opt/mssql/
      - /scsi3/sqldata:/var/opt/sqlserver/data
      - /scsi3/sqllog:/var/opt/sqlserver/log
      - /scsi3/sqlbackup:/var/opt/sqlserver/backup
      - /scsi3/krb5.conf:/etc/krb5.conf
    dns:
      # Differs from the AD server address given above (192.168.3.80)
      - 192.168.0.80
    extra_hosts:
      - 'apolo.domain.local:192.168.3.80'
      - 'domain.local:192.168.3.80'

# Declared but unused: the service above uses bind mounts under /scsi3
volumes:
  sqlsystem:
  sqldata:
  sqllog:
  sqlbackup:

When I get a shell inside the container and ping apolo.domain.local, it resolves correctly. I'm not sure what else I missed.
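A successful ping only proves that the forward lookup works, and here it may be answered by the extra_hosts entries. AD authentication also depends on reverse DNS, which the validator warns about later in the thread. Below is a minimal sketch that resolves a name and then reverse-resolves the resulting address. It assumes a Python 3 interpreter is available in the container (mssql-conf itself is a Python script). `check_forward_and_reverse` is a hypothetical helper, not part of any SQL Server tooling.

```python
import socket


def check_forward_and_reverse(fqdn: str) -> dict:
    """Resolve fqdn to an IPv4 address, then reverse-resolve that address.

    Both lookups go through the system resolver, so /etc/hosts entries
    (such as those written from extra_hosts) are typically consulted too.
    """
    result = {"fqdn": fqdn, "ip": None, "reverse_name": None, "match": False}
    try:
        # Forward lookup: name -> IPv4 address
        result["ip"] = socket.gethostbyname(fqdn)
    except OSError as exc:
        result["error"] = f"forward lookup failed: {exc}"
        return result
    try:
        # Reverse lookup: address -> canonical name (PTR record or hosts file)
        name, _aliases, _addrs = socket.gethostbyaddr(result["ip"])
    except OSError as exc:
        result["error"] = f"reverse lookup failed: {exc}"
        return result
    result["reverse_name"] = name
    # Compare case-insensitively and ignore a trailing root dot
    result["match"] = name.rstrip(".").lower() == fqdn.rstrip(".").lower()
    return result


if __name__ == "__main__":
    print(check_forward_and_reverse("apolo.domain.local"))
```

If `match` is False, or the reverse lookup fails, check the PTR record (or the hosts file) for the domain controller first. Also note that the compose file in the original post points `dns` at 192.168.0.80, while the AD server is 192.168.3.80.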

tellierflexus commented 1 year ago

I have exactly the same issue. Is there a way to check whether the Kerberos config (/etc/krb5.conf) is actually being taken into account?
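One way to see which realm Kerberos will use is to read `default_realm` from the `[libdefaults]` section of the file actually mounted in the container. The sketch below uses a deliberately minimal parser that skips the nested `{ }` blocks in `[realms]`. `read_default_realm` and `realm_warnings` are hypothetical helpers. The "upper-case and dotted" check reflects a common AD convention rather than a hard rule. MIT Kerberos honours the `KRB5_CONFIG` environment variable as an alternative config path, so the sketch checks that first. That SQL Server's own Kerberos stack reads the same file is an assumption, based on the linked guides having you edit /etc/krb5.conf.

```python
import os
import re


def read_default_realm(krb5_conf_text: str):
    """Return default_realm from [libdefaults], or None if it is not set.

    Minimal parser: only simple `key = value` lines inside [libdefaults] are
    read; lines in other sections (including nested { } blocks) are skipped.
    """
    section = None
    for raw in krb5_conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        header = re.fullmatch(r"\[\s*(\w+)\s*\]", line)
        if header:
            section = header.group(1).lower()
            continue
        if section == "libdefaults" and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key == "default_realm":
                return value
    return None


def realm_warnings(realm):
    """Heuristic checks based on common AD conventions (not hard rules)."""
    if realm is None:
        return ["no default_realm set in [libdefaults]"]
    warnings = []
    if realm != realm.upper():
        warnings.append("realm is not upper-case")
    if "." not in realm:
        warnings.append("realm has no dot; expected the AD DNS domain, e.g. TEST.LOCAL")
    return warnings


if __name__ == "__main__":
    # MIT Kerberos reads KRB5_CONFIG (colon-separated) instead of /etc/krb5.conf if set
    path = os.environ.get("KRB5_CONFIG", "/etc/krb5.conf").split(":")[0]
    with open(path, encoding="utf-8") as fh:
        realm = read_default_realm(fh.read())
    print(path, "default_realm =", realm)
    for warning in realm_warnings(realm):
        print("warning:", warning)
```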

killmasta93 commented 1 year ago

@tellierflexus I tried many times and can't seem to get it working. Were you able to get it working?

tellierflexus commented 1 year ago

Nope, it doesn't work for me either... Do you have any idea how to check whether the Kerberos link works?

killmasta93 commented 11 months ago

I got somewhere with this, but I'm still trying to figure it out:

root@hercules:/# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: administrator@TEST.LOCAL

Valid starting     Expires            Service principal
12/26/23 03:54:34  12/26/23 13:54:34  krbtgt/TEST.LOCAL@TEST.LOCAL
    renew until 12/27/23 03:54:32
12/26/23 03:54:37  12/26/23 13:54:34  MSSQLSvc/hercules.TEST.LOCAL:1433@TEST.LOCAL
    renew until 12/27/23 03:54:32
root@hercules:/# /opt/mssql/bin/mssql-conf validate-ad-config /var/opt/mssql/secrets/mssql.keytab
Error: Cannot contact default realm 'TEST'
Warning: RDNS could not resolve this host. This is not a fatal error, but should be fixed by updating your RDNS records
Traceback (most recent call last):
  File "/opt/mssql/bin/../lib/mssql-conf/mssql-conf.py", line 598, in <module>
    main()
  File "/opt/mssql/bin/../lib/mssql-conf/mssql-conf.py", line 594, in main
    processCommands()
  File "/opt/mssql/bin/../lib/mssql-conf/mssql-conf.py", line 310, in processCommands
    COMMAND_TABLE[args.which]()
  File "/opt/mssql/bin/../lib/mssql-conf/mssql-conf.py", line 266, in handleValidateADConfig
    config.validate()
  File "/opt/mssql/lib/mssql-conf/mssqlad.py", line 149, in validate
    if self.checkSPNs() != successExitCode:
  File "/opt/mssql/lib/mssql-conf/mssqlad.py", line 376, in checkSPNs
    kvno, ret = self.checkKeytabEntry(spn)
  File "/opt/mssql/lib/mssql-conf/mssqlad.py", line 315, in checkKeytabEntry
    kvno = kvno.split('=')
TypeError: a bytes-like object is required, not 'str'
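The final `TypeError` appears to be a bug in `mssql-conf` itself rather than a problem with the keytab. Under Python 3, output captured from a subprocess is `bytes` unless it is decoded, and `bytes.split()` rejects a `str` separator. The sketch below reproduces the error and shows the usual fix, which is to decode first. It assumes the parsed text looks like MIT `kvno` output (`<principal>: kvno = <n>`). `parse_kvno` is a hypothetical helper, not the actual `mssqlad.py` code.

```python
def parse_kvno(raw):
    """Return the key version number from text like '<principal>: kvno = 2'.

    Accepts bytes (as captured from a subprocess without text=True) or str.
    Decoding first avoids "a bytes-like object is required, not 'str'".
    """
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8", errors="replace")
    # rpartition: split on the last '=' so the principal part is irrelevant
    _before, sep, after = raw.rpartition("=")
    if not sep:
        raise ValueError(f"no 'kvno = <n>' in {raw!r}")
    return int(after.strip())


if __name__ == "__main__":
    sample = b"MSSQLSvc/hercules.TEST.LOCAL:1433@TEST.LOCAL: kvno = 2\n"
    try:
        sample.split("=")  # the pattern from the traceback: bytes split by a str
    except TypeError as exc:
        print("reproduced:", exc)
    print("kvno:", parse_kvno(sample))
```

Because validation crashed inside the SPN check, any later checks probably never ran. The earlier "Cannot contact default realm 'TEST'" message (note `TEST`, not `TEST.LOCAL`) is therefore likely the more actionable one.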

killmasta93 commented 10 months ago

@tellierflexus I got it working with a normal (non-container) installation following https://www.cviorel.com/tag/mssql/. I then retried in the container, debugged it, and got this:

01/02/2024 01:41:24.395092415 Error [security.ldap] <0000000266/0x000003a4> Could not look up short domain name due to error: No address associated with hostname.
01/02/2024 01:41:24.395179302 Debug [security.kerberos] <0000000266/0x000003a4> SSPI operation 0x0000000D returned status: File: KerberosStream.cpp:1678 [Status: 0xC0000001 Operation unsuccessful]

And inside the container, the name resolves perfectly.

saxn commented 4 months ago

We had the same issue for quite a while. It turns out there is some kind of problem with the newest versions of SQL Server. Try the 2022-CU13-ubuntu-20.04 image, which works for us. Also make sure the file permissions are correct: permission problems show up in the logs as DNS issues or other Kerberos-related errors. I spent a lot of time debugging this because we went down every rabbit hole (DNS, docker-compose, etc.).
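The sketch below shows who owns the files mentioned in this thread and whether the current user can read them. The keytab path comes from the `validate-ad-config` command earlier in the thread, and `/var/opt/mssql/mssql.conf` is the usual mssql-conf location. Running it as the `mssql` user assumes the image runs SQL Server as that non-root user, for example with `docker exec -i -u mssql <container> python3 - < check_perms.py`. The file name `check_perms.py` is hypothetical.

```python
import os
import stat

# Paths from this thread; /var/opt/mssql/mssql.conf is the usual mssql-conf file.
PATHS = [
    "/var/opt/mssql/secrets/mssql.keytab",
    "/etc/krb5.conf",
    "/var/opt/mssql/mssql.conf",
]


def describe(path: str) -> str:
    """Report owner, group, mode and readability of path for the current user."""
    try:
        st = os.stat(path)
    except OSError as exc:
        return f"{path}: cannot stat ({exc.strerror})"
    readable = os.access(path, os.R_OK)
    return (
        f"{path}: uid={st.st_uid} gid={st.st_gid} "
        f"mode={stat.filemode(st.st_mode)} readable={readable}"
    )


if __name__ == "__main__":
    print(f"running as uid={os.getuid()} gid={os.getgid()}")
    for p in PATHS:
        print(describe(p))
```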

neutmute commented 4 months ago

@saxn, any specific hints about file permissions? I have battled for a long time to get SQL Server/AD working on Docker, but I'm stumped by PAL log entries like these:

07/16/2024 05:40:43.000274516 Error [security.ldap] <0000000226/0x0000031c> Failed to bind to LDAP server ldap://ACME.DEV:3268: Local error
07/16/2024 05:40:43.000340731 Error [security.kerberos] <0000000226/0x0000031c> Error in AcceptSecurityContext: Failed to lookup identity of initiator

Infuriatingly, the SQL instance will join the domain once. Then, after a restart (or a failure to perform the right magic rain dance), AD authentication stops working.
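Port 3268 in `ldap://ACME.DEV:3268` is the Active Directory Global Catalog port, 389 is plain LDAP, and 88 is Kerberos. "Local error" is reported on the client side, so it can point at the Kerberos/SASL layer rather than the network. Still, ruling out plain connectivity from inside the container is cheap. Below is a minimal TCP reachability sketch. `tcp_reachable` is a hypothetical helper, and a successful connect says nothing about whether the LDAP bind itself will succeed.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only tests the network path; it does not attempt an LDAP bind.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, name not resolvable, ...
        return False


if __name__ == "__main__":
    # 88 = Kerberos KDC, 389 = LDAP, 3268 = AD Global Catalog (LDAP)
    for port in (88, 389, 3268):
        print(f"ACME.DEV:{port} reachable={tcp_reachable('ACME.DEV', port)}")
```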

saxn commented 2 months ago

@neutmute I had the same log entries and got it working by checking the file permissions on the keytab file, krb5.conf, and mssql.conf. chown and chmod are your friends: start with all permissions and take them away one at a time. Unfortunately, I can't tell you exactly which permissions to use. We also run a rootless Docker install, which complicated things a bit: you have to find the correct user ID, because container UIDs are remapped as defined in /etc/subuid.
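For rootless Docker, the host UID that has to own the keytab can be computed from `/etc/subuid`. Under the usual rootless layout, container UID 0 maps to the invoking host user, and container UID n (n ≥ 1) maps to `start + n - 1`. Here `start:count` is that user's subordinate range. The sketch assumes that layout and assumes SQL Server runs as UID 10001 inside the container (check with `id mssql`). The helper names and the example range are hypothetical.

```python
def parse_subuid(text: str, user: str):
    """Return (start, count) of the first subordinate UID range for user.

    Lines in /etc/subuid look like 'name:start:count'.
    """
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, start, count = line.split(":")
        if name == user:
            return int(start), int(count)
    raise KeyError(f"no subordinate UID range for {user!r}")


def host_uid(container_uid: int, host_user_uid: int, start: int, count: int) -> int:
    """Map a container UID to a host UID, assuming the usual rootless layout:
    container 0 -> invoking host user, container 1..count -> start..start+count-1.
    """
    if container_uid == 0:
        return host_user_uid
    if 1 <= container_uid <= count:
        return start + container_uid - 1
    raise ValueError(f"container UID {container_uid} is outside the mapped range")


if __name__ == "__main__":
    # Hypothetical example values; read your own /etc/subuid and `id -u`.
    start, count = parse_subuid("dockeruser:231072:65536\n", "dockeruser")
    MSSQL_UID = 10001  # assumed UID of the mssql user inside the container
    print(host_uid(MSSQL_UID, 1000, start, count))
```

With the example range `231072:65536`, container UID 10001 maps to host UID 241072. The host-side fix would then be along the lines of `chown 241072 <keytab file on the host>`. Group IDs are remapped the same way through `/etc/subgid`.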

We ran into another problem when trying to update the version. We tried every image newer than 2022-CU13-ubuntu-20.04 without success, and when we migrated back, it no longer worked! We switched back to 2022-CU12-ubuntu-20.04 and updated again, and then it started working once more, but we haven't been able to reproduce that. I'm trying to figure out what changes during the update process, since we would like to migrate to a newer image version at some point.

saxn commented 2 months ago

@neutmute Another update after a couple of hours of testing: we identified upgrading/downgrading mssql as the source of the problem! Here is our testing protocol:

[Image: testing protocol]

We started from the top and migrated through different versions. Note that we also switched to Ubuntu 22.04, which didn't change much. Red means it didn't work, failing with either "FQDN not returned by RDNS lookup" or "Failed to bind to LDAP server" (again, with no pattern). Green means it worked.

The problem is that we have no clue what the difference is. We suspect some kind of upgrade/downgrade script, but if you see a pattern, feel free to let us know :) We don't see one. From what we can tell, big version jumps are more likely to brick the setup. Would love to hear your findings.

neutmute commented 2 months ago

@saxn We brought in some consultants, and through their network they heard from several people that SQL Server on Docker and AD just don't work together. Those people had tried several times and failed to get it working repeatably and reliably.

For our intended workload, they suggested we shouldn't be using Docker at all. So I installed SQL Server 2019 on Ubuntu 20.04 bare metal, and AD just worked without any issues.

Next problem: The performance compared to Windows on same AWS instance size/EBS volume parameters sucks...

saxn commented 1 month ago

@neutmute I was finally able to solve the riddle! Turns out it was an issue with docker and docker compose and the hosts entries.

We defined the entries we needed, but Docker Compose orders them differently, and not always in the same way. If the entries weren't in the right order, it would not work. I thought the order didn't matter, but obviously it does. In the end I solved it by combining the entries on the same line: extra_hosts:

neutmute commented 1 month ago

🤯 oh wow. So much black magic.
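Here is one plausible mechanism behind saxn's hosts-ordering finding. It is an interpretation, not something confirmed in this thread. When a reverse lookup is answered from `/etc/hosts`, the first line whose address matches typically wins, and the first name on that line becomes the canonical name. If `apolo.domain.local` and `domain.local` sit on separate lines for the same IP, whichever line Compose writes first decides whether reverse DNS returns the DC's FQDN. That would fit the intermittent "FQDN not returned by RDNS lookup" errors. The sketch below models that behaviour on hosts-file text. `reverse_lookup` and `merge_entries` are hypothetical helpers, and the sketch does not show the Compose syntax saxn actually used.

```python
def reverse_lookup(hosts_text: str, ip: str):
    """Model a reverse lookup answered from a hosts file: the first line whose
    address equals ip wins, and its first name is the canonical name."""
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == ip:
            return fields[1]
    return None


def merge_entries(entries):
    """Collapse (hostname, ip) pairs into one hosts line per IP, keeping the
    given order so the intended FQDN is listed first on its line."""
    names_by_ip = {}
    for name, ip in entries:
        names = names_by_ip.setdefault(ip, [])
        if name not in names:
            names.append(name)
    return "\n".join(f"{ip}\t{' '.join(names)}" for ip, names in names_by_ip.items())


if __name__ == "__main__":
    ip = "192.168.3.80"
    # Separate lines in the unlucky order: the bare domain name comes first
    unlucky = f"{ip}\tdomain.local\n{ip}\tapolo.domain.local\n"
    print(reverse_lookup(unlucky, ip))  # domain.local
    merged = merge_entries([("apolo.domain.local", ip), ("domain.local", ip)])
    print(reverse_lookup(merged, ip))  # apolo.domain.local
```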