spark-redshift-community / spark-redshift

Performant Redshift data source for Apache Spark

sbt test seems to fail in RedshiftSourceSuite.scala when no AWS credentials provider is available #137

Closed: smoy closed this issue 12 months ago

smoy commented 1 year ago

Running sbt test, I see failures in RedshiftSourceSuite.scala when no default AWS credentials provider is available.

I notice the stack trace references the lack of AWS credentials. That makes sense, since I am running without any AWS creds. The original intent of this test suite is to mock the Redshift JDBC layer: https://github.com/spark-redshift-community/spark-redshift/blob/78fc297726683c7841365fdafb7ea24b744fef37/src/test/scala/io/github/spark_redshift_community/spark/redshift/RedshiftSourceSuite.scala#L179

So I assume some protocol has changed such that the mock no longer works. Unfortunately, it's hard to test a prior version like 3a7ade8 because the old Redshift JDBC jar is no longer available (which makes sense if it has to do with protocol upgrades or security fixes).
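
For context, a minimal sketch of the kind of JDBC mocking the suite relies on, assuming Mockito on the test classpath; the object and names below are illustrative, not the suite's actual wiring (see the linked RedshiftSourceSuite.scala for the real setup):

import java.sql.{Connection, PreparedStatement, ResultSet}
import org.mockito.ArgumentMatchers.anyString
import org.mockito.Mockito.{mock, when}

object MockJdbcSketch {
  // Stub an empty result set so no real Redshift cluster is contacted.
  val resultSet: ResultSet = mock(classOf[ResultSet])
  when(resultSet.next()).thenReturn(false)

  // Any prepared statement returns the stubbed result set.
  val statement: PreparedStatement = mock(classOf[PreparedStatement])
  when(statement.executeQuery()).thenReturn(resultSet)

  // The mocked connection hands out the stubbed statement for any SQL.
  val connection: Connection = mock(classOf[Connection])
  when(connection.prepareStatement(anyString())).thenReturn(statement)
}

Note that the failure below happens outside this mocked layer: per the stack trace, the credentials lookup fires in RedshiftRelation.buildUnloadStmt, which asks the AWS SDK for credentials before any JDBC call is made.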

[info] - Can load output of Redshift queries *** FAILED ***
[info]   com.amazonaws.SdkClientException: Unable to load AWS credentials from any provider in the chain: [EnvironmentVariableCredentialsProvider: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY)), SystemPropertiesCredentialsProvider: Unable to load AWS credentials from Java system properties (aws.accessKeyId and aws.secretKey), WebIdentityTokenCredentialsProvider: You must specify a value for roleArn and roleSessionName, com.amazonaws.auth.profile.ProfileCredentialsProvider@15c8ad5b: No AWS profile named 'default', com.amazonaws.auth.EC2ContainerCredentialsProviderWrapper@438e846a: Failed to connect to service endpoint: ]
[info]   at com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:136)
[info]   at io.github.spark_redshift_community.spark.redshift.RedshiftRelation.buildUnloadStmt(RedshiftRelation.scala:173)
[info]   at io.github.spark_redshift_community.spark.redshift.RedshiftRelation.buildScan(RedshiftRelation.scala:138)
[info]   at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.$anonfun$apply$4(DataSourceStrategy.scala:329)
[info]   at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.$anonfun$pruneFilterProject$1(DataSourceStrategy.scala:363)
[info]   at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFilterProjectRaw(DataSourceStrategy.scala:419)
[info]   at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFilterProject(DataSourceStrategy.scala:362)
[info]   at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.apply(DataSourceStrategy.scala:329)
[info]   at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$1(QueryPlanner.scala:63)
[info]   at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
[info]   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
[info]   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
[info]   at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
[info]   at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:70)
[info]   at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$3(QueryPlanner.scala:78)
[info]   at scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:196)
[info]   at scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:194)
[info]   at scala.collection.Iterator.foreach(Iterator.scala:943)
[info]   at scala.collection.Iterator.foreach$(Iterator.scala:943)
[info]   at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
[info]   at scala.collection.TraversableOnce.foldLeft(TraversableOnce.scala:199)
[info]   at scala.collection.TraversableOnce.foldLeft$(TraversableOnce.scala:192)
[info]   at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1431)
[info]   at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$2(QueryPlanner.scala:75)
[info]   at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
[info]   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
[info]   at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
[info]   at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:70)
[info]   at org.apache.spark.sql.execution.QueryExecution$.createSparkPlan(QueryExecution.scala:476)
[info]   at org.apache.spark.sql.execution.QueryExecution.$anonfun$sparkPlan$1(QueryExecution.scala:162)
[info]   at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
[info]   at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:202)
[info]   at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:526)
[info]   at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:202)
[info]   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
[info]   at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:201)
[info]   at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:162)
[info]   at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:155)
[info]   at org.apache.spark.sql.execution.QueryExecution.$anonfun$executedPlan$1(QueryExecution.scala:175)
[info]   at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
[info]   at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:202)
[info]   at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:526)
[info]   at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:202)
[info]   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
[info]   at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:201)
[info]   at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:175)
[info]   at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:168)
[info]   at org.apache.spark.sql.execution.QueryExecution.simpleString(QueryExecution.scala:221)
[info]   at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:266)
[info]   at org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:235)
[info]   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:112)
[info]   at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:195)
[info]   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:103)
[info]   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
[info]   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
[info]   at org.apache.spark.sql.Dataset.withAction(Dataset.scala:4165)
[info]   at org.apache.spark.sql.Dataset.collect(Dataset.scala:3418)
[info]   at io.github.spark_redshift_community.spark.redshift.RedshiftSourceSuite.$anonfun$new$3(RedshiftSourceSuite.scala:218)
[info]   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
[info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
[info]   at org.scalatest.TestSuite.withFixture(TestSuite.scala:196)
[info]   at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195)
[info]   at org.scalatest.funsuite.AnyFunSuite.withFixture(AnyFunSuite.scala:1564)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
[info]   at io.github.spark_redshift_community.spark.redshift.RedshiftSourceSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(RedshiftSourceSuite.scala:49)
[info]   at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
[info]   at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
[info]   at io.github.spark_redshift_community.spark.redshift.RedshiftSourceSuite.runTest(RedshiftSourceSuite.scala:49)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
[info]   at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
[info]   at scala.collection.immutable.List.foreach(List.scala:431)
[info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
[info]   at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
[info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268)
[info]   at org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564)
[info]   at org.scalatest.Suite.run(Suite.scala:1114)
[info]   at org.scalatest.Suite.run$(Suite.scala:1096)
[info]   at org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273)
[info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272)
[info]   at io.github.spark_redshift_community.spark.redshift.RedshiftSourceSuite.org$scalatest$BeforeAndAfterAll$$super$run(RedshiftSourceSuite.scala:49)
[info]   at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
[info]   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
[info]   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
[info]   at io.github.spark_redshift_community.spark.redshift.RedshiftSourceSuite.run(RedshiftSourceSuite.scala:49)
[info]   at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:321)
[info]   at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:517)
[info]   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
[info]   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
[info]   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
[info]   at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
[info]   at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
[info]   at java.base/java.lang.Thread.run(Thread.java:829)
smoy commented 12 months ago

I am able to work around this by setting fake values:

export AWS_ACCESS_KEY_ID=foo
export AWS_SECRET_ACCESS_KEY=bar
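
The fake values work because the test only needs the default provider chain to resolve to something; nothing in the mocked test ever contacts AWS. A quick sanity check (a sketch, assuming the AWS Java SDK v1 that appears in the stack trace is on the classpath):

import com.amazonaws.auth.DefaultAWSCredentialsProviderChain

object CredsCheck {
  def main(args: Array[String]): Unit = {
    // Environment-variable resolution is purely local: no network call,
    // and the key is never validated against AWS.
    val creds = new DefaultAWSCredentialsProviderChain().getCredentials
    println(s"Resolved access key: ${creds.getAWSAccessKeyId}") // prints "foo"
  }
}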
smoy commented 12 months ago

Updated the README: https://github.com/spark-redshift-community/spark-redshift/pull/138/commits/ee60a14a40fac88d4714a325d754de7d9af032d5