Closed jansohn closed 5 years ago
Hi @jansohn
We have removed this piece of code in PR #839 and the fix was part of v7.1.2-preview. Would like you to test and confirm if the new driver version resolves your problem?
Thanks, Cheena
I just tested it with 7.1.2.jre8-preview
and 7.1.3.jre8-preview
. The problem persists, there are no notable differences.
Hi @jansohn
Can you confirm if you see the exact same error as in above stacktrace or some other by using newer version of the driver? Please make sure your application doesn't have multiple driver versions referenced as according to the stacktrace, the code path in question (Reflective access to Config.getKDCList()) no longer exists as I mentioned in the PR link above.
Please verify in a standalone Java App and do share us repro it you see the issue.
I'll check this next week!
Updated stacktrace with 7.1.2.jre8-preview
:
Daemon Thread [localhost-startStop-1] (Class load: ResolverConfigurationImpl)
owns: Object (id=93)
owns: Object (id=94)
owns: EntityManagerFactoryDelegate (id=95)
owns: ApplicationContextListener (id=96)
owns: StandardContext (id=97)
ResolverConfiguration.open() line: 56 [local variables unavailable]
DnsContextFactory.serversForUrls(DnsUrl[]) line: 149
DnsContextFactory.getContext(String, DnsUrl[], Hashtable<?,?>) line: 81
DnsContextFactory.urlToContext(String, Hashtable<?,?>) line: 120
DnsContextFactory.getInitialContext(Hashtable<?,?>) line: 64
NamingManager.getInitialContext(Hashtable<?,?>) line: 684
InitialDirContext(InitialContext).getDefaultInitCtx() line: 313
InitialDirContext(InitialContext).init(Hashtable<?,?>) line: 244
InitialDirContext(InitialContext).<init>(Hashtable<?,?>) line: 216
InitialDirContext.<init>(Hashtable<?,?>) line: 101
DNSUtilities.findSrvRecords(String) line: 43
DNSKerberosLocator.isRealmValid(String) line: 38
KerbAuthentication$2.isRealmValid(String) line: 293
KerbAuthentication.findRealmFromHostname(KerbAuthentication$RealmValidator, String) line: 321
KerbAuthentication.enrichSpnWithRealm(String, boolean) line: 255
KerbAuthentication.<init>(SQLServerConnection, String, int) line: 231
SQLServerConnection.logon(SQLServerConnection$LogonCommand) line: 3557
SQLServerConnection.access$000(SQLServerConnection, SQLServerConnection$LogonCommand) line: 81
SQLServerConnection$LogonCommand.doExecute() line: 3541
SQLServerConnection$LogonCommand(TDSCommand).execute(TDSWriter, TDSReader) line: 7233
SQLServerConnection.executeCommand(TDSCommand) line: 2869
SQLServerConnection.connectHelper(ServerPortPlaceHolder, int, int, boolean, boolean, boolean, int) line: 2395
SQLServerConnection.login(String, String, int, String, FailoverInfo, int, long) line: 2042
SQLServerConnection.connectInternal(Properties, SQLServerPooledConnection) line: 1889
SQLServerConnection.connect(Properties, SQLServerPooledConnection) line: 1120
SQLServerDriver.connect(String, Properties) line: 700
DriverManager.getConnection(String, Properties, Class<?>) line: 664
DriverManager.getConnection(String, Properties) line: 208
DefaultConnector.connect(Properties, Session) line: 98
DatabaseLogin(DatasourceLogin).connectToDatasource(Accessor, Session) line: 162
ServerSession(DatabaseSessionImpl).setOrDetectDatasource(boolean) line: 207
ServerSession(DatabaseSessionImpl).loginAndDetectDatasource() line: 760
EntityManagerFactoryProvider.login(DatabaseSessionImpl, Map, boolean) line: 265
EntityManagerSetupImpl.deploy(ClassLoader, Map) line: 731
EntityManagerFactoryDelegate.getAbstractSession() line: 205
EntityManagerFactoryDelegate.createEntityManagerImpl(Map, SynchronizationType) line: 305
EntityManagerFactoryImpl.createEntityManagerImpl(Map, SynchronizationType) line: 337
EntityManagerFactoryImpl.createEntityManager() line: 303
ApplicationContextListener.createEntityManager() line: 229
[...]
Reproducer: https://github.com/jansohn/kerberos-timer
Hi @jansohn, thanks for your investigation. I can also see that a daemon thread has been left behind with your reproduction code. We'll start looking into this issue, and will let you know when we have an update.
Hi @jansohn,
That particular piece of code was introduced in PR #40, to automatically append the realm name to the SPN for Kerberos cross-realm authentication (when server's realm is different than the default realm). It makes use of the JNDI DNS, which spins up a daemon thread in the background, but does not provide an API to release it. I agree that leaving a thread running is an issue and did some research to find alternatives to JNDI DNS, but looks like there is no other JDK library that would let us achieve the same functionality. There are however some third party libraries that provide API to retrieve DNS records, but it is highly unlikely that we will introduce a new dependency to the driver, mainly because there is already a connection property serverSpn
that lets the users specify realm name for the scenario where the default realm in the Kerberos configuration is different than the realm of the server.
All of that being said, we are currently leaning toward reverting PR #40 and asking users to explicitly specify realm name in serverSpn
property. We are also definitely open to suggestions that would let us keep the functionality introduced in PR #40 and also fix the issue with a thread being left behind without introducing a new dependency to the driver.
I am adding @pierresouchay - the author of the PR #40 to the thread to get his opinion on this issue too.
Hello @ulvii and @jansohn,
I was unaware of such Thread being initialized when using native DNS resolutions.
I have a few remarks:
org.xbill.DNS.*
for things such as having access to TCP DNS when needed. Such library has very low dependencies (aka NONE) and would be able to do the job.I really think removing this feature would be a shame: Kerberos is complicated (look at the workaround I had to do to make it work on multiple JVMs) and I spent lots of time on fixing it, and removing the feature would break some users code.
Hi @pierresouchay,
Thank you for the detailed explanation. You are correct to say that the thread gets reused when reloading the driver, so this does not seem like a memory leak, but the ideal behavior would be to not have the thread left behind. Another reason why we were thinking of reverting the PR was because there are lots of workarounds in the code as you mentioned and it is not straightforward (sometimes nearly impossible) to maintain them, given the limitations and changes from different JDK vendors/versions.
@jansohn, do you have any particular issue in your application that's caused by the daemon thread or it is just a warning reported in Tomcat?
@ulvii The code is complicated for some reasons: JVM support for Kerberos is not very consistent.
Some JVMs (SUN/Oracle) do parse the krb5.conf using the class sun.security.krb5.Config
, but some are parsing it wrong, especially on Windows where some versions of JVM do think there is one realm to rule them all (aka Active Directory), thus those JVMs do not resolve realms properly (I had to look at JRE source code to see that), thus, I did use feature testing to ensure the JVM is working accordingly ( https://github.com/Microsoft/mssql-jdbc/pull/40/files#diff-579fbf11bb1ece322ce26ed03202e7d1R357 ), so it is unlikely that this code breaks (it will fallback on DNS in that case).
DNS is cool to solve REALMs, unfortunately, it is not very often properly configured, thus, the 2nd method (using DNS) is less likely work since it expects lots of configuration to be in place. Moreover, the file krb5.conf
can contain some overrides regarding realms that might be very important in some cases. I did put it as a fallback for:
sun.security.krb5.Config
While you might choose to remove reflection to simplify code (in that case, users will have to configure all of their DNS records properly), but removing the feature is IMHO a huge mistake, because nobody understand Kerberos completely and having the logic in the driver is what make it work.
In my company, we are doing cross REALMs calls using it across 9 different REALMs with different levels of trust (mixing Hadoop and AD domains) for years, it had been impossible before that.
So, yes, this is not fun, but I think the driver HAS to hide the gory details, otherwise, what is the point?
@pierresouchay
Out of curiosity, can the connection property serverSPN not be set from your application? If not, could you please elaborate the limitations?
@cheenamalhotra yes it can, if you know what you are doing, but it is not trivial, especially in cross Realm calls: how you know that serverbak1.acme.com is cname to server.acme.com which is in realm UACME while you are in MY.ACME.NET.
How do you compute all of this? Well, you have to learn how it works. I bet than less than 0.1% of devs do, and less than 2% of admins
@pierresouchay I just checked and you are right that the thread is being re-used.
@ulvii We haven't put this to production yet. In development it has not caused any problems yet except the Tomcat warning.
My vote would go to introducing a lightweight dependency which takes care of the DNS resolution. We also need cross-realm Kerberos support so completely removing the feature would be pretty bad...
@jansohn, just to make sure we are on the same page, we would not remove Kerberos cross realm authentication support, the feature would still be supported with serverSPN property.
@ulvii the big issue is to determine it. That's fine for small shops with 2 databases, that's complicated with cross realms and subrealms, especially with modern discovery systems such as Consul
@pierresouchay, please correct me if I am wrong, doesn't realm name have to match the domain name of the server for PR #40 to work?
how you know that serverbak1.acme.com is cname to server.acme.com which is in realm UACME while you are in MY.ACME.NET.
@ulvii no: it can be overridden by the correct section in krb5.conf (oracle JVM with correct implementation) or with DNS records as specified in RFC. Realm taken into account will be the longest as well (aka: db.acme.corp will be used instead of acme.corp if two realms match)
Ex in our datacenters:
MyMachine.am5.hpc.acme.com
am5.hpc.acme.com is in fact in realm AMS. HPC.AD.ACME.COM (as specified with realms section of krb5.conf or DNS records, while machine.hpc.acme.com is in realm AD.AM5.ACME.COM)
Hi @pierresouchay
I just had a quick lookup into the implementation of krb5.config file and turns out the only thing different or additional in their implementation is looking up for DNS server record using TCP(_tcp) along-with UDP(_udp).
Do you think implementing similarly in DNSKerberosLocator.isRealmValid() after fetching records using UDP would resolve this problem?
Was this considered when the class was written in PR #40 and wondering if there were any limitations?
Hi @cheenamalhotra,
To be fair, I used (very few) shortcuts in the DNS implementation, because we do use Oracle JVM + Unix for most of our tools, and while I tested it on several Windows versions to make it work smoothly (for instance with DBeaver GUI), the advanced stuff work better with Oracle JVMs in complicated setups (we have Hadoop + AD)
So the DNS implementation lacks a few stuff to be closer to reflection based implementation:
no way to link a given REALM to a DNS suffix (it can be achieved using DNS TXT record see https://web.mit.edu/kerberos/krb5-latest/doc/admin/realm_config.html in MIT documentation, for instance to bind mydomain.com to realm ACME. #40 supports it ONLY with krb5.conf
properly configured with REALM section for Oracle JVM. Implementing this in pure DNS is not complicated, but on our side, only krb5.conf
was done, so I did not implement DNS TXT records. Note this is a problem only with complicated setups since most Active Directory installations do enforce domain.com to match realm DOMAIN.COM (but it is possible with migrations AND/OR complicated setups), but still it works with 95% of installations by default (I doubt many users do run MS SQL Server in a MIT or Heimdal KDC). On our side, it was useful, so we did use those declarations to make it work properly: https://web.mit.edu/kerberos/krb5-1.12/doc/admin/conf_files/krb5_conf.html#domain-realm
About _tcp VS _udp, yes, you are right. Except that Kerberos always try first udp by default (unless option udp_preference_limit
is set up in krb5.conf
file or if size of payload is too big for MTU), thus all working Kerberos implementation always declare _udp
(and sometimes _tcp
, but less likely since not the first option). So in terms of pure RFC compliance, yes, both should be be tried in theory, but trying both is also slower, thus I took the path to use only the first one - which is properly declared by AD DNS by default.
What I know is that those implementations are well tested with several Windows versions (2008, 2012, 2016, 7, 10), Mac OS (which is using Heimdal) and Linux and several JVMs (6, 7, 8 at the time - hence the test with this.might.not.exist.
to make it work on most Oracle JVMs on Windows). But it is true, it is possible to have a pure DNS implementation behaving nicely in most cases (the current does, except for very complicated setups). I'll be happy to help for a review if you plan to implement this, but as I said, this is not trivial for end users to make it work in a real world scenario and deciding that the user will specify the SPN properly is a good way to ensure people will not be using Kerberos.
Hi @pierresouchay
Thanks for clarifying. I'd like to summarize the discussion to conclude few things, based on our discussions:
krb5.conf
with Oracle JVM and driver's implementation of DNSKerberosLocator.isRealmValid()
would suffice your case?krb5.config.java::getKDCList()
, we would need to implement additional DNS TXT record scanning and _tcp
(optionally) which are although a very edge case scenarios but since they were supported at one point, adding them back would be ideal (may not even impact 95% of the audience). But for most of the audience, ServerSPN detection would work normally as before. The impacted audience can always provide serverSPN
connection property without any problems.Lastly, regarding the open thread issue, it seems like this is deep embedded in Krb5Context
implementation, and not only us, any Kerberos Authentication event performing handshake shall create this Daemon thread if krb5.conf is used, and whether or not serverSPN
is specified with Realm information, that wouldn't solve the problem. If any third party libraries are recommended in this case, we'd like to examine them, but looks unlikely to me so far.
@cheenamalhotra Yes, this is well summarized.
I agree that DNSKerberosLocator is probably sufficient for most use-cases as it is. While we had in the past the need for REALM -> domains configuration, I think we did trash that - so, at least for my company - it would not be an issue, and since very few people do connect to MS SQL Servers outside realms managed by AD, very few would have the issue. In that case, it is also correct to say that setting the SPN as a workaround (which is the way we took initially before #40 was merge by your team ) is working.
It is true that the implementation of DNSKerberosLocator might be enhanced (_tcp
+ DNS TXT records for realms), even if I doubt many people would benefit from it.
And finally, if some Kerberos internals do create daemon thread (very likely since it might itself perform some InetLookups with the same interface I used (very used in many many old school Java libraries and JDK internals), as suggested in my first comment https://github.com/Microsoft/mssql-jdbc/issues/918#issuecomment-457755973 :
We are using this Driver (with another Driver on top of it) for quite a long time (before #40 got merged since this code is basically code I wrote before this PR) on more than 50 intensive applications performing lots of cross REALM calls, and we never had any issue regarding a leak or a failure initializing Kerberos by using simply IPs or CNAMEs for target databases.
Regards
Thanks @pierresouchay for your explainations.
@jansohn
To conclude, the RUNNING thread ResolverConfigurationImpl.AddressChangeListener
is something Java creates on purpose. It's triggered through multiple paths, some paths were identified in PR #40 but it also gets triggered on Kerberos handshake event of initSecContext
hence it's always going to exist as long as you use Kerberos in Java. But there's always an assurance of single instance of the thread due to it's static
nature.
Also I believe this thread is created for caching purpose looking at its source code. As the name suggests, it's an "Address Change Listener", which awaits change in DNS Address entries by calling Native JNI code, and has a role to play when request is received to load DNS configuration from OS. If nothing has changed on OS and available DNS data has not expired, Java reuses available DNS information for next request but if address change notification has received, it would always load latest DNS configuration from OS.
On that note, I will close the issue. If you have more concerns, you may reach out to OpenJDK Team, but according to my understanding, this daemon thread should not cause any harm to your application.
Driver version
7.0.0
SQL Server version
Client Operating System
Windows 10
JAVA/JVM version
Problem description
Expected behaviour: All threads should be stopped when shutting down the application in Tomcat
Actual behaviour: One thread is still running after shutdown and produces a possible memory leak
Error message/stack trace:
I debugged the web application and found out that the thread is started in a static initialization block in
sun.net.dns.ResolverConfigurationImpl
(see stack trace below):The thread itself is started by a Java core class but my question is if we can prevent the thread from starting in the first place (stopping it correctly seems to be out of the question as there is no reference available). I think others using Kerberos for authentication (with username/password) should experience this problem, too. It would be nice to know if this is a general problem or if I'm doing something wrong. Currently the code is embedded in a bigger web application (with JPA and others) but I'll try to strip it down to a minimal example if necessary.