spark-redshift-community / spark-redshift

Performant Redshift data source for Apache Spark
Apache License 2.0

Getting "Failed to find data source: com.databricks.spark.redshift" on Spark 2.4.3 #17

Closed hitansu closed 5 years ago

hitansu commented 5 years ago

Spark version 2.4.3 running in EMR.

```
spark-shell --jars s3://x/spark-redshift/spark-redshift_2.11-4.0.0_community_edition.jar
```

```scala
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "corresponding accesskey")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "corresponding secretkey")

var df = spark.read.format("com.databricks.spark.redshift")
  .option("url", "corresponding jdbc url")
  .option("query", "select * from employee")
  .option("tempdir", "s3n://x-test/temp")
  .load()
```

Getting the following exception:

```
java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.redshift. Please find packages at http://spark.apache.org/third-party-projects.html
  at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:657)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
  ... 49 elided
Caused by: java.lang.ClassNotFoundException: com.databricks.spark.redshift.DefaultSource
  at scala.reflect.internal.util.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:62)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20$$anonfun$apply$12.apply(DataSource.scala:634)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20$$anonfun$apply$12.apply(DataSource.scala:634)
  at scala.util.Try$.apply(Try.scala:192)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20.apply(DataSource.scala:634)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20.apply(DataSource.scala:634)
  at scala.util.Try.orElse(Try.scala:84)
  at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:634)
```

lucagiovagnoli commented 5 years ago

The package has been renamed and no longer includes `databricks`. Try using: `spark.read.format("com.spark_redshift_community.spark.redshift")`

hitansu commented 5 years ago

Thanks. My bad, I forgot that. It's working.

lucagiovagnoli commented 5 years ago

We had to rename the package again. If you upgrade to the latest `preview20190715`, remember to use: `spark.read.format("io.github.spark_redshift_community.spark.redshift")`
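For reference, a minimal sketch of a read with the current data source name. The JDBC URL, IAM role, table name, and S3 temp directory below are placeholders, not values from this thread; running it requires a live Redshift cluster, the spark-redshift jar on the classpath, and S3 access:

```scala
// Sketch: reading a Redshift table with the renamed data source.
// All connection values here are hypothetical placeholders.
val df = spark.read
  .format("io.github.spark_redshift_community.spark.redshift")
  .option("url", "jdbc:redshift://example-cluster:5439/dev?user=USER&password=PASS")
  .option("dbtable", "employee")                 // or .option("query", "select * from employee")
  .option("tempdir", "s3a://example-bucket/temp") // staging area for UNLOAD/COPY
  .load()

df.printSchema()
```

The format string must match the jar you load: `com.databricks.spark.redshift` for the original Databricks releases, `com.spark_redshift_community.spark.redshift` for early community builds, and `io.github.spark_redshift_community.spark.redshift` from `preview20190715` onward.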

I'm resolving this ticket now.