spark-redshift-community / spark-redshift

Performant Redshift data source for Apache Spark
Apache License 2.0

Hostname verification failure when connecting via VPC Endpoint #123

Open mac-macoy opened 1 year ago

mac-macoy commented 1 year ago

We connect from Spark to Redshift across VPCs using VPC Endpoints. When we try to connect from Databricks with:

(
  spark.read
    .format("redshift")
    .option("url", "jdbc:redshift://VPCE_DNS_URL:PORT/DB")
    ...
)

we get the error:

java.sql.SQLException: The hostname VPCE_DNS_URL could not be verified by hostnameverifier RedshiftjdbcHostnameVerifier.

When we use the generic JDBC connector (.format("jdbc")) with the same URL, it connects and returns results.

We found that if we add ;sslmode=verify-ca to the URL, the Redshift connector works:

(
  spark.read
    .format("redshift")
    .option("url", "jdbc:redshift://VPCE_DNS_URL:PORT/DB;sslmode=verify-ca")
    ...
)
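
For anyone applying the same workaround in several places, here is a minimal sketch of a helper that sets sslmode on a JDBC URL. The name with_sslmode is hypothetical and not part of the connector, and it assumes options are separated by semicolons as in the URL above.

```python
def with_sslmode(url: str, mode: str = "verify-ca") -> str:
    """Return url with ;sslmode=<mode> set, replacing any existing sslmode option.

    Hypothetical helper; assumes semicolon-separated JDBC URL options.
    """
    base, *params = url.split(";")
    # Drop empty fragments and any sslmode option already present.
    kept = [p for p in params if p and not p.lower().startswith("sslmode=")]
    return ";".join([base, *kept, f"sslmode={mode}"])
```

The result can be passed straight to .option("url", with_sslmode(...)). As far as we understand the Redshift JDBC driver's sslmode settings, verify-ca still validates the server certificate chain but skips the hostname check that verify-full performs, so this relaxes verification rather than fixing the mismatch.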

jsleight commented 1 year ago

Thanks for the report. Seems like you have a workaround as well.

I'm not sure whether this is something we would want to add upstream. @kunaldeepsingh do you have intuition on that?

aravishdatabricks commented 1 year ago

@jsleight Please at least add a hint to the error message when this exception is encountered:

java.sql.SQLException: The hostname xxxxxxx could not be verified by hostnameverifier RedshiftjdbcHostnameVerifier.
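
Until something like that exists in the connector, a caller-side check could look roughly like the sketch below. This is not the connector's code (the connector is written in Scala); hostname_hint and the hint wording are hypothetical, and the match relies only on the message text quoted in this thread.

```python
from typing import Optional

# Fragment of the exception message quoted in this issue.
HOSTNAME_ERROR_MARKER = "could not be verified by hostnameverifier"


def hostname_hint(message: str) -> Optional[str]:
    """Return a troubleshooting hint if message looks like the hostname verification failure."""
    if HOSTNAME_ERROR_MARKER in message:
        return (
            "Hostname verification failed. If you connect through a VPC endpoint, "
            "adding ;sslmode=verify-ca to the JDBC URL worked for the reporter of this issue."
        )
    return None
```

A job could call hostname_hint(str(exc)) in its exception handler and log the hint next to the original error.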