nabacg / krb5sqljdb

MS Sql Server JDBC driver wrapper to allow Spark/Yarn jobs make kerberized connections
MIT License
2 stars 6 forks source link

Calling driver #1

Open BIGabor opened 4 years ago

BIGabor commented 4 years ago

Hi Nabacg,

how can I call this driver? I want to use it with scoop, no luck. :( I tried to call like --connection-manager org.apache.sqoop.manager.SQLServerManager --driver net.cabworks.jdbc.Krb5SqlServer --connect "krb5ss://server.naem etc... But I'm getting a "java.sql.SQLException: No suitable driver found" error. I tried it with krb5sqlserver, same error. With sqlserver I'm getting the same kerberos failure.

Regards, Gabor

nabacg commented 4 years ago

Hi, Thanks for trying out my little project. I'm afraid I need to warn you that this is a sketch of a solution, not a fully fledged, plug a play library. It works, I've used this approach to solve many problems, but it makes a lot of assumptions on how it's used. Specifically this was made to be added to spark job run on a YARN cluster, it's going to need some tweaking before it can be used in any other context.

I suspect you don't have the standard SQL server driver on your classpath, that's why you're getting this error.

BIGabor commented 4 years ago

Hi, thank you for your fast response! I have the same problem with Yarn (no tgt found), just initial command is Sqoop. A list-tables works, as it is running on master with a valid TGT, but import sends jobs to nodes, and I get a no TGT error. If I understand correctly, your Driver overrides connect, and will get a ticket for keytab file/principal, and gives this connection back. My first question, what should I write to connection string? jdbc:krb5ss? This should come after --connect below: sqoop list-tables --connection-manager org.apache.sqoop.manager.SQLServerManager --driver net.cabworks.jdbc.Krb5SqlServer --connect "jdbc:krb5ss://sqlservername:1433..."

Is this ok, do you remember?

We already tried with JTDS driver, it works with NTLM, but we need Kerberos + datetime2 is incompatible as key column, si I have to use original JDBC. A driver with integrated kerberos TGT authentication would be fine.

I'm not familiar with Scala, just download sbt, compiled and packaged your project.

Many thanks for your help! Btw, I'm checking the classpath too, but Sqoop itself takes care for this as I now, and it's already in Sqoop's lib folder. But you calling it too, so checking.

Regards, Gabor

nabacg commented 4 years ago

Yup, you understand correctly. You do need to provide principal and keytab file in the url, best to check out those unit tests. But since it's not the easiest code to read, here is the example jdbc url you should use:

jdbc:krb5ss://serverName:1023;krb5Principal=testUser;krb5Keytab=testUser.keytab

You need to replace the bolded parts and keytab file needs to be a local path on the machine that will be making connection, so if this is distributed job then perhaps you need to add keytab file to the -files flag ( or equivalent).

But the error you're getting, I bet that's because original SQL Server driver is not available. This library does NOT package the driver, it assumes it's available to load. So you need to call Sqoop with both Krb5SqlServer AND SqlServer drivers, both need to be available on the classpath that Sqoop uses to run this connection. There are many ways to do it and I've never worked with Sqoop, but the usual way is to either have a cluster wide install, use a flag like -libjars in your Sqoop command or maybe add it to hadoop user home. Of course adding it to this project (build.sbt -> libraryDependencies) and uberjar'ing it is also a potential solution, although uberjars have their own problems..

BIGabor commented 4 years ago

I copied both jars to /usr/lib, and even to sqoop jars. With sqlserver, jtds it's working fine, just with jdbc:krb5ss not. One more question, maybe I didn't compiled it well. If I'm executing jar -vtf krb5sqljdbc_2.10-1.0.jar , I'm getting this list: 261 Tue May 12 07:28:12 CEST 2020 META-INF/MANIFEST.MF 0 Tue May 12 07:28:12 CEST 2020 net/ 0 Tue May 12 07:28:12 CEST 2020 net/cabworks/ 0 Tue May 12 07:28:12 CEST 2020 net/cabworks/jdbc/ 1772 Tue May 12 07:22:26 CEST 2020 net/cabworks/jdbc/Krb5SqlServer$$anonfun$toSqlServerUrl$1.class 1274 Tue May 12 07:22:26 CEST 2020 net/cabworks/jdbc/Krb5SqlServer$$anonfun$connectionProperties$2.class 2380 Tue May 12 07:22:26 CEST 2020 net/cabworks/jdbc/Krb5SqlServer$$anonfun$connectionProperties$3.class 5580 Tue May 12 07:22:26 CEST 2020 net/cabworks/jdbc/Krb5SqlServer.class 1357 Tue May 12 07:22:26 CEST 2020 net/cabworks/jdbc/Krb5SqlServer$$anonfun$connectionProperties$1.class 767 Tue May 12 07:22:26 CEST 2020 Krb5SqlServerDriver.class 1890 Tue May 12 07:22:26 CEST 2020 net/cabworks/jdbc/Krb5SqlServer$$anonfun$toSqlServerUrl$2.class 3877 Tue May 12 07:22:26 CEST 2020 net/cabworks/jdbc/Krb5SqlServer$.class 1374 Tue May 12 07:22:26 CEST 2020 net/cabworks/jdbc/Krb5SqlServer$$anon$1.class After compile I'm getting warnings too: Class javax.servlet.jsp.JspWriter not found - continuing with a stub.. Is this ok?

Regards, Gabor

nabacg commented 4 years ago

Unfortunately the fact you copied it jars to /usr/lib doesn't tell me much. On which machine did you copy this? I assume you run this job on a YARN cluster, correct?

More importantly, does Sqoop add /usr/lib to the classpath it runs it's jobs on?

You should try something like this:

sqoop YOUR-CMD --connection-manager org.apache.sqoop.manager.SQLServerManager --driver net.cabworks.jdbc.Krb5SqlServer --connect "jdbc:krb5ss://serverName:1023;krb5Principal=testUser;krb5Keytab=testUser.keytab"
-libjars sqlServerDriver.jar,krb5sqljdbc_2.10-1.0.jar  -files testUser.keytab

I doubt the problem is with the jar you compiled, you'd be getting class not found exceptions asking about "net.cabworks.jdbc.Krb5SqlServer " class.

BIGabor commented 4 years ago

Hi,

I called this exactly on same way, no luck. When I included the sqljdbc.jar in compile, I get a JDK 1.8 unsupported when calling the driver (but I had to use jdbc:sqlserver, jdbc:krb5ss doesn't worked). I replaced to latest from MS JDBC, and with sqlserver it's working now (at least no more "no suitiable driver found" and connectionstring messages. It means, Sqoop uses the Sqoop connector to connect, and not the given driver. Now I found a solution to give a connector in $SQOOP_HOME/conf/managers.d, or in sqoop-site.xml . I will try this, and come back with results.

Regards, Gabor

BIGabor commented 4 years ago

A littlebit reverse engineering on Sqoop JDBCDrivers class: JdbcDrivers { MYSQL("com.mysql.jdbc.Driver", "jdbc:mysql:"), POSTGRES("org.postgresql.Driver", "jdbc:postgresql:"), HSQLDB("org.hsqldb.jdbcDriver","jdbc:hsqldb:"), ORACLE("oracle.jdbc.OracleDriver","jdbc:oracle:"), SQLSERVER("com.microsoft.sqlserver.jdbc.SQLServerDriver", "jdbc:sqlserver:"), JTDS_SQLSERVER("net.sourceforge.jtds.jdbc.Driver", "jdbc:jtds:sqlserver:"), DB2("com.ibm.db2.jcc.DB2Driver", "jdbc:db2:"), NETEZZA("org.netezza.Driver", "jdbc:netezza:"), CUBRID("cubrid.jdbc.driver.CUBRIDDriver", "jdbc:cubrid:");

That's why sqoop doesn't accepts jdbc:krb5ss://, so I have to write sqlserver. But it means, driver may doesn't make sense, it will use jdbc.sqlserverdriver, and not the specified Krb5SqlServerDriver. Checking in Sqoop sourcecode... The main problem is than Sqoop's connection manager :(. Please correct my if I think it's wrong.

BR, Gabor

nabacg commented 4 years ago

This code was designed to be used with Spark, plus I don't know much about about Sqoop overall, so I'm afraid I can't really help you here. If Sqoop has a hardcoded list of classes and jdbc prefixes, then you might be out of luck. Isn't there a way to register custom JDBC drivers ?

BIGabor commented 4 years ago

Hi Nabacg,

it looks it's s JDK 1.8 issue. This driver is no longer compatible as I see: https://github.com/JetBrains/jdk8u_jdk/blob/94318f9185757cc33d2b8d527d36be26ac6b7582/src/share/classes/java/sql/DriverManager.java

*

The DriverManager methods getConnection and

Already placed META-INF/services/java.sql.Driver with content: net.cabworks.jdbc.Krb5SqlServer but no luck, after this I set override def acceptsURL(url: String): Boolean = true still no luck. Will dig in deeper on weekend.

Regards, Gabor

chandanbalu commented 3 years ago

@nabacg @BIGabor I have been following this thread and been working on code change as discussed.

I've got it working with Sqoop and Spark. I'll soon be sending out the link to my repo which will contain the code change and Sqoop commands. Thanks

BIGabor commented 3 years ago

Hi Chandanbalu,

cool, I will test it. Thank you!

BR, Gabor

chandanbalu commented 3 years ago

@BIGabor I need to un clutter my code and make it available on GitHub.

I had to read up this entire thread and dig deep into registering custom JDBC driver on JDK8 to get it working for Sqoop and Spark.

Glad to contribute and help the Hadoop community 😊

chandanbalu commented 3 years ago

@BIGabor I was able to get the code up into my GIT and added some more documentation about the problem with MS SQL JDBC driver.

You can catch up more details here

Let me know if you run into any issues or any feedback. Thanks