microsoft / mssql-jdbc

The Microsoft JDBC Driver for SQL Server is a Type 4 JDBC driver that provides database connectivity with SQL Server through the standard JDBC application program interfaces (APIs).
MIT License
1.06k stars 428 forks source link

[BUG] Unstopped thread when using Kerberos authentication #918

Closed jansohn closed 5 years ago

jansohn commented 5 years ago

Driver version

7.0.0

SQL Server version

Microsoft SQL Server 2016 (SP2) (KB4052908) - 13.0.5026.0 (X64) 
    Mar 18 2018 09:11:49 
    Copyright (c) Microsoft Corporation
    Standard Edition (64-bit) on Windows Server 2016 Standard 10.0 <X64> (Build 14393: ) (Hypervisor)

Client Operating System

Windows 10

JAVA/JVM version

java version "1.8.0_181"
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)

Problem description

  1. Expected behaviour: All threads should be stopped when shutting down the application in Tomcat

  2. Actual behaviour: One thread is still running after shutdown and produces a possible memory leak

  3. Error message/stack trace:

    WARNING: The web application [test] appears to have started a thread named [Thread-3] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
     sun.net.dns.ResolverConfigurationImpl.notifyAddrChange0(Native Method)
     sun.net.dns.ResolverConfigurationImpl$AddressChangeListener.run(ResolverConfigurationImpl.java:144)
  4. I debugged the web application and found out that the thread is started in a static initialization block in sun.net.dns.ResolverConfigurationImpl (see stack trace below):

    static {
      AccessController.doPrivileged(new PrivilegedAction() {
         public Void run() {
            System.loadLibrary("net");
            return null;
         }
      });
      init0();
      ResolverConfigurationImpl.AddressChangeListener var0 = new ResolverConfigurationImpl.AddressChangeListener();
      var0.setDaemon(true);
      var0.start();
    }
    Daemon Thread [localhost-startStop-2] (Class load: ResolverConfigurationImpl)   
        owns: Object  (id=91)   
        owns: Object  (id=92)   
        owns: EntityManagerFactoryDelegate  (id=93) 
        owns: ApplicationContextListener  (id=94)   
        owns: StandardContext  (id=95)  
        ResolverConfiguration.open() line: 56 [local variables unavailable] 
        DnsContextFactory.serversForUrls(DnsUrl[]) line: 149    
        DnsContextFactory.getContext(String, DnsUrl[], Hashtable<?,?>) line: 81 
        dnsURLContext.getRootURLContext(String, Hashtable<?,?>) line: 71    
        dnsURLContext(GenericURLDirContext).getAttributes(String, String[]) line: 100   
        KrbServiceLocator.lambda$getKerberosService$1(Context, String) line: 168    
        1747531283.run() line: not available    
        AccessController.doPrivileged(PrivilegedExceptionAction<T>, AccessControlContext) line: not available [native method]   
        AccessController.doPrivileged(PrivilegedExceptionAction<T>, AccessControlContext, Permission...) line: 713  
        KrbServiceLocator.getKerberosService(String, String) line: 166  
        Config.getKDCFromDNS(String) line: 1171 
        Config.getKDCList(String) line: 1057    
        NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not available [native method]  
        NativeMethodAccessorImpl.invoke(Object, Object[]) line: 62  
        DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 43  
        Method.invoke(Object, Object...) line: 498  
        KerbAuthentication$2.isRealmValid(String) line: 301 
        KerbAuthentication.getRealmValidator(String) line: 312  
        KerbAuthentication.enrichSpnWithRealm(String, boolean) line: 254    
        KerbAuthentication.<init>(SQLServerConnection, String, int) line: 231   
        SQLServerConnection.logon(SQLServerConnection$LogonCommand) line: 3557  
        SQLServerConnection.access$000(SQLServerConnection, SQLServerConnection$LogonCommand) line: 81  
        SQLServerConnection$LogonCommand.doExecute() line: 3541 
        SQLServerConnection$LogonCommand(TDSCommand).execute(TDSWriter, TDSReader) line: 7240   
        SQLServerConnection.executeCommand(TDSCommand) line: 2869   
        SQLServerConnection.connectHelper(ServerPortPlaceHolder, int, int, boolean, boolean, boolean, int) line: 2395   
        SQLServerConnection.login(String, String, int, String, FailoverInfo, int, long) line: 2042  
        SQLServerConnection.connectInternal(Properties, SQLServerPooledConnection) line: 1889   
        SQLServerConnection.connect(Properties, SQLServerPooledConnection) line: 1120   
        SQLServerDriver.connect(String, Properties) line: 700   
        DriverManager.getConnection(String, Properties, Class<?>) line: 664 
        DriverManager.getConnection(String, Properties) line: 208   
        DefaultConnector.connect(Properties, Session) line: 98  
        DatabaseLogin(DatasourceLogin).connectToDatasource(Accessor, Session) line: 162 
        ServerSession(DatabaseSessionImpl).setOrDetectDatasource(boolean) line: 207 
        ServerSession(DatabaseSessionImpl).loginAndDetectDatasource() line: 760 
        EntityManagerFactoryProvider.login(DatabaseSessionImpl, Map, boolean) line: 265 
        EntityManagerSetupImpl.deploy(ClassLoader, Map) line: 731   
        EntityManagerFactoryDelegate.getAbstractSession() line: 205 
        EntityManagerFactoryDelegate.createEntityManagerImpl(Map, SynchronizationType) line: 305    
        EntityManagerFactoryImpl.createEntityManagerImpl(Map, SynchronizationType) line: 337    
        EntityManagerFactoryImpl.createEntityManager() line: 303    
        ApplicationContextListener.createEntityManager() line: 229  
        [...]

The thread itself is started by a Java core class but my question is if we can prevent the thread from starting in the first place (stopping it correctly seems to be out of the question as there is no reference available). I think others using Kerberos for authentication (with username/password) should experience this problem, too. It would be nice to know if this is a general problem or if I'm doing something wrong. Currently the code is embedded in a bigger web application (with JPA and others) but I'll try to strip it down to a minimal example if necessary.

cheenamalhotra commented 5 years ago

Hi @jansohn

We have removed this piece of code in PR #839 and the fix was part of v7.1.2-preview. Would like you to test and confirm if the new driver version resolves your problem?

Thanks, Cheena

jansohn commented 5 years ago

I just tested it with 7.1.2.jre8-preview and 7.1.3.jre8-preview. The problem persists, there are no notable differences.

cheenamalhotra commented 5 years ago

Hi @jansohn

Can you confirm if you see the exact same error as in above stacktrace or some other by using newer version of the driver? Please make sure your application doesn't have multiple driver versions referenced as according to the stacktrace, the code path in question (Reflective access to Config.getKDCList()) no longer exists as I mentioned in the PR link above.

Please verify in a standalone Java App and do share us repro it you see the issue.

jansohn commented 5 years ago

I'll check this next week!

jansohn commented 5 years ago

Updated stacktrace with 7.1.2.jre8-preview:

Daemon Thread [localhost-startStop-1] (Class load: ResolverConfigurationImpl)   
    owns: Object  (id=93)   
    owns: Object  (id=94)   
    owns: EntityManagerFactoryDelegate  (id=95) 
    owns: ApplicationContextListener  (id=96)   
    owns: StandardContext  (id=97)  
    ResolverConfiguration.open() line: 56 [local variables unavailable] 
    DnsContextFactory.serversForUrls(DnsUrl[]) line: 149    
    DnsContextFactory.getContext(String, DnsUrl[], Hashtable<?,?>) line: 81 
    DnsContextFactory.urlToContext(String, Hashtable<?,?>) line: 120    
    DnsContextFactory.getInitialContext(Hashtable<?,?>) line: 64    
    NamingManager.getInitialContext(Hashtable<?,?>) line: 684   
    InitialDirContext(InitialContext).getDefaultInitCtx() line: 313 
    InitialDirContext(InitialContext).init(Hashtable<?,?>) line: 244    
    InitialDirContext(InitialContext).<init>(Hashtable<?,?>) line: 216  
    InitialDirContext.<init>(Hashtable<?,?>) line: 101  
    DNSUtilities.findSrvRecords(String) line: 43    
    DNSKerberosLocator.isRealmValid(String) line: 38    
    KerbAuthentication$2.isRealmValid(String) line: 293 
    KerbAuthentication.findRealmFromHostname(KerbAuthentication$RealmValidator, String) line: 321   
    KerbAuthentication.enrichSpnWithRealm(String, boolean) line: 255    
    KerbAuthentication.<init>(SQLServerConnection, String, int) line: 231   
    SQLServerConnection.logon(SQLServerConnection$LogonCommand) line: 3557  
    SQLServerConnection.access$000(SQLServerConnection, SQLServerConnection$LogonCommand) line: 81  
    SQLServerConnection$LogonCommand.doExecute() line: 3541 
    SQLServerConnection$LogonCommand(TDSCommand).execute(TDSWriter, TDSReader) line: 7233   
    SQLServerConnection.executeCommand(TDSCommand) line: 2869   
    SQLServerConnection.connectHelper(ServerPortPlaceHolder, int, int, boolean, boolean, boolean, int) line: 2395   
    SQLServerConnection.login(String, String, int, String, FailoverInfo, int, long) line: 2042  
    SQLServerConnection.connectInternal(Properties, SQLServerPooledConnection) line: 1889   
    SQLServerConnection.connect(Properties, SQLServerPooledConnection) line: 1120   
    SQLServerDriver.connect(String, Properties) line: 700   
    DriverManager.getConnection(String, Properties, Class<?>) line: 664 
    DriverManager.getConnection(String, Properties) line: 208   
    DefaultConnector.connect(Properties, Session) line: 98  
    DatabaseLogin(DatasourceLogin).connectToDatasource(Accessor, Session) line: 162 
    ServerSession(DatabaseSessionImpl).setOrDetectDatasource(boolean) line: 207 
    ServerSession(DatabaseSessionImpl).loginAndDetectDatasource() line: 760 
    EntityManagerFactoryProvider.login(DatabaseSessionImpl, Map, boolean) line: 265 
    EntityManagerSetupImpl.deploy(ClassLoader, Map) line: 731   
    EntityManagerFactoryDelegate.getAbstractSession() line: 205 
    EntityManagerFactoryDelegate.createEntityManagerImpl(Map, SynchronizationType) line: 305    
    EntityManagerFactoryImpl.createEntityManagerImpl(Map, SynchronizationType) line: 337    
    EntityManagerFactoryImpl.createEntityManager() line: 303    
    ApplicationContextListener.createEntityManager() line: 229  
    [...]
jansohn commented 5 years ago

Reproducer: https://github.com/jansohn/kerberos-timer

peterbae commented 5 years ago

Hi @jansohn, thanks for your investigation. I can also see that a daemon thread has been left behind with your reproduction code. We'll start looking into this issue, and will let you know when we have an update.

ulvii commented 5 years ago

Hi @jansohn,

That particular piece of code was introduced in PR #40, to automatically append the realm name to the SPN for Kerberos cross-realm authentication (when server's realm is different than the default realm). It makes use of the JNDI DNS, which spins up a daemon thread in the background, but does not provide an API to release it. I agree that leaving a thread running is an issue and did some research to find alternatives to JNDI DNS, but looks like there is no other JDK library that would let us achieve the same functionality. There are however some third party libraries that provide API to retrieve DNS records, but it is highly unlikely that we will introduce a new dependency to the driver, mainly because there is already a connection property serverSpn that lets the users specify realm name for the scenario where the default realm in the Kerberos configuration is different than the realm of the server.

All of that being said, we are currently leaning toward reverting PR #40 and asking users to explicitly specify realm name in serverSpn property. We are also definitely open to suggestions that would let us keep the functionality introduced in PR #40 and also fix the issue with a thread being left behind without introducing a new dependency to the driver.

I am adding @pierresouchay - the author of the PR #40 to the thread to get his opinion on this issue too.

pierresouchay commented 5 years ago

Hello @ulvii and @jansohn,

I was unaware of such Thread being initialized when using native DNS resolutions.

I have a few remarks:

I really think removing this feature would be a shame: Kerberos is complicated (look at the workaround I had to do to make it work on multiple JVMs) and I spent lots of time on fixing it, and removing the feature would break some users code.

ulvii commented 5 years ago

Hi @pierresouchay,

Thank you for the detailed explanation. You are correct to say that the thread gets reused when reloading the driver, so this does not seem like a memory leak, but the ideal behavior would be to not have the thread left behind. Another reason why we were thinking of reverting the PR was because there are lots of workarounds in the code as you mentioned and it is not straightforward (sometimes nearly impossible) to maintain them, given the limitations and changes from different JDK vendors/versions.

@jansohn, do you have any particular issue in your application that's caused by the daemon thread or it is just a warning reported in Tomcat?

pierresouchay commented 5 years ago

@ulvii The code is complicated for some reasons: JVM support for Kerberos is not very consistent.

Some JVMs (SUN/Oracle) do parse the krb5.conf using the class sun.security.krb5.Config, but some are parsing it wrong, especially on Windows where some versions of JVM do think there is one realm to rule them all (aka Active Directory), thus those JVMs do not resolve realms properly (I had to look at JRE source code to see that), thus, I did use feature testing to ensure the JVM is working accordingly ( https://github.com/Microsoft/mssql-jdbc/pull/40/files#diff-579fbf11bb1ece322ce26ed03202e7d1R357 ), so it is unlikely that this code breaks (it will fallback on DNS in that case).

DNS is cool to solve REALMs, unfortunately, it is not very often properly configured, thus, the 2nd method (using DNS) is less likely work since it expects lots of configuration to be in place. Moreover, the file krb5.conf can contain some overrides regarding realms that might be very important in some cases. I did put it as a fallback for:

While you might choose to remove reflection to simplify code (in that case, users will have to configure all of their DNS records properly), but removing the feature is IMHO a huge mistake, because nobody understand Kerberos completely and having the logic in the driver is what make it work.

In my company, we are doing cross REALMs calls using it across 9 different REALMs with different levels of trust (mixing Hadoop and AD domains) for years, it had been impossible before that.

So, yes, this is not fun, but I think the driver HAS to hide the gory details, otherwise, what is the point?

cheenamalhotra commented 5 years ago

@pierresouchay

Out of curiosity, can the connection property serverSPN not be set from your application? If not, could you please elaborate the limitations?

pierresouchay commented 5 years ago

@cheenamalhotra yes it can, if you know what you are doing, but it is not trivial, especially in cross Realm calls: how you know that serverbak1.acme.com is cname to server.acme.com which is in realm UACME while you are in MY.ACME.NET.

How do you compute all of this? Well, you have to learn how it works. I bet than less than 0.1% of devs do, and less than 2% of admins

jansohn commented 5 years ago

@pierresouchay I just checked and you are right that the thread is being re-used.

@ulvii We haven't put this to production yet. In development it has not caused any problems yet except the Tomcat warning.

My vote would go to introducing a lightweight dependency which takes care of the DNS resolution. We also need cross-realm Kerberos support so completely removing the feature would be pretty bad...

ulvii commented 5 years ago

@jansohn, just to make sure we are on the same page, we would not remove Kerberos cross realm authentication support, the feature would still be supported with serverSPN property.

pierresouchay commented 5 years ago

@ulvii the big issue is to determine it. That's fine for small shops with 2 databases, that's complicated with cross realms and subrealms, especially with modern discovery systems such as Consul

ulvii commented 5 years ago

@pierresouchay, please correct me if I am wrong, doesn't realm name have to match the domain name of the server for PR #40 to work?

how you know that serverbak1.acme.com is cname to server.acme.com which is in realm UACME while you are in MY.ACME.NET.

pierresouchay commented 5 years ago

@ulvii no: it can be overridden by the correct section in krb5.conf (oracle JVM with correct implementation) or with DNS records as specified in RFC. Realm taken into account will be the longest as well (aka: db.acme.corp will be used instead of acme.corp if two realms match)

Ex in our datacenters:

MyMachine.am5.hpc.acme.com

am5.hpc.acme.com is in fact in realm AMS. HPC.AD.ACME.COM (as specified with realms section of krb5.conf or DNS records, while machine.hpc.acme.com is in realm AD.AM5.ACME.COM)

cheenamalhotra commented 5 years ago

Hi @pierresouchay

I just had a quick lookup into the implementation of krb5.config file and turns out the only thing different or additional in their implementation is looking up for DNS server record using TCP(_tcp) along-with UDP(_udp).

Do you think implementing similarly in DNSKerberosLocator.isRealmValid() after fetching records using UDP would resolve this problem?

Was this considered when the class was written in PR #40 and wondering if there were any limitations?

pierresouchay commented 5 years ago

Hi @cheenamalhotra,

To be fair, I used (very few) shortcuts in the DNS implementation, because we do use Oracle JVM + Unix for most of our tools, and while I tested it on several Windows versions to make it work smoothly (for instance with DBeaver GUI), the advanced stuff work better with Oracle JVMs in complicated setups (we have Hadoop + AD)

So the DNS implementation lacks a few stuff to be closer to reflection based implementation:

What I know is that those implementations are well tested with several Windows versions (2008, 2012, 2016, 7, 10), Mac OS (which is using Heimdal) and Linux and several JVMs (6, 7, 8 at the time - hence the test with this.might.not.exist. to make it work on most Oracle JVMs on Windows). But it is true, it is possible to have a pure DNS implementation behaving nicely in most cases (the current does, except for very complicated setups). I'll be happy to help for a review if you plan to implement this, but as I said, this is not trivial for end users to make it work in a real world scenario and deciding that the user will specify the SPN properly is a good way to ensure people will not be using Kerberos.

cheenamalhotra commented 5 years ago

Hi @pierresouchay

Thanks for clarifying. I'd like to summarize the discussion to conclude few things, based on our discussions:

Lastly, regarding the open thread issue, it seems like this is deep embedded in Krb5Context implementation, and not only us, any Kerberos Authentication event performing handshake shall create this Daemon thread if krb5.conf is used, and whether or not serverSPN is specified with Realm information, that wouldn't solve the problem. If any third party libraries are recommended in this case, we'd like to examine them, but looks unlikely to me so far.

pierresouchay commented 5 years ago

@cheenamalhotra Yes, this is well summarized.

I agree that DNSKerberosLocator is probably sufficient for most use-cases as it is. While we had in the past the need for REALM -> domains configuration, I think we did trash that - so, at least for my company - it would not be an issue, and since very few people do connect to MS SQL Servers outside realms managed by AD, very few would have the issue. In that case, it is also correct to say that setting the SPN as a workaround (which is the way we took initially before #40 was merge by your team ) is working.

It is true that the implementation of DNSKerberosLocator might be enhanced (_tcp + DNS TXT records for realms), even if I doubt many people would benefit from it.

And finally, if some Kerberos internals do create daemon thread (very likely since it might itself perform some InetLookups with the same interface I used (very used in many many old school Java libraries and JDK internals), as suggested in my first comment https://github.com/Microsoft/mssql-jdbc/issues/918#issuecomment-457755973 :

We are using this Driver (with another Driver on top of it) for quite a long time (before #40 got merged since this code is basically code I wrote before this PR) on more than 50 intensive applications performing lots of cross REALM calls, and we never had any issue regarding a leak or a failure initializing Kerberos by using simply IPs or CNAMEs for target databases.

Regards

cheenamalhotra commented 5 years ago

Thanks @pierresouchay for your explainations.

@jansohn To conclude, the RUNNING thread ResolverConfigurationImpl.AddressChangeListener is something Java creates on purpose. It's triggered through multiple paths, some paths were identified in PR #40 but it also gets triggered on Kerberos handshake event of initSecContext hence it's always going to exist as long as you use Kerberos in Java. But there's always an assurance of single instance of the thread due to it's static nature.

Also I believe this thread is created for caching purpose looking at its source code. As the name suggests, it's an "Address Change Listener", which awaits change in DNS Address entries by calling Native JNI code, and has a role to play when request is received to load DNS configuration from OS. If nothing has changed on OS and available DNS data has not expired, Java reuses available DNS information for next request but if address change notification has received, it would always load latest DNS configuration from OS.

On that note, I will close the issue. If you have more concerns, you may reach out to OpenJDK Team, but according to my understanding, this daemon thread should not cause any harm to your application.