Open Zethson opened 5 years ago
val dfs = for {
  table <- tables
} yield (table, spark.read.jdbc(databaseProperties.jdbcURL, table, columnName = "id", lowerBound = 1L,
  upperBound = 100000L, numPartitions = 3, connectionProperties = connectionProperties))
This works, but only when doing a simple query on a single table:
SELECT * FROM Consequence
The issue is that not every table has an id column to partition on.
https://stackoverflow.com/questions/56534189/jdbc-to-spark-dataframe-how-to-ensure-even-partitioning
The last comment there, about even partitioning, may help.
Use predicates if there are only string columns to partition by.
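A minimal sketch of the predicates-based jdbc overload; the column name country and its values are assumptions for illustration, not actual schema:

// Each predicate becomes the WHERE clause of one partition's query,
// so this read produces three partitions covering all rows.
val predicates = Array(
  "country IN ('DE', 'FR')",
  "country IN ('US', 'CA')",
  "country NOT IN ('DE', 'FR', 'US', 'CA') OR country IS NULL"
)

val df = spark.read.jdbc(databaseProperties.jdbcURL, "Consequence", predicates, connectionProperties)

The predicates must be mutually exclusive and jointly cover every row (including NULLs), otherwise rows are duplicated or dropped.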
Maybe this will help fetching all primary keys?
https://dzone.com/articles/the-right-way-to-use-spark-and-jdbc
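One way to fetch primary keys programmatically is plain JDBC metadata, sketched here assuming databaseProperties and connectionProperties from the snippet above:

import java.sql.DriverManager
import scala.collection.mutable.ListBuffer

// Look up the primary-key column(s) of a table via JDBC metadata;
// catalog and schema are passed as null to match any (an assumption).
def primaryKeys(table: String): List[String] = {
  val conn = DriverManager.getConnection(databaseProperties.jdbcURL, connectionProperties)
  try {
    val rs  = conn.getMetaData.getPrimaryKeys(null, null, table)
    val buf = ListBuffer[String]()
    while (rs.next()) buf += rs.getString("COLUMN_NAME")
    buf.toList
  } finally conn.close()
}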
Not sure whether in the real network all executors are doing work or just a single one.
I shall investigate the numPartitions option of jdbc read.
More info: https://stackoverflow.com/questions/41085238/what-is-the-meaning-of-partitioncolumn-lowerbound-upperbound-numpartitions-pa
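Roughly, Spark turns partitionColumn/lowerBound/upperBound/numPartitions into one WHERE clause per partition. A simplified sketch of the stride logic (the real code is Spark's internal JDBCRelation.columnPartition, so treat this as an approximation):

// Simplified sketch: split [lower, upper) into numPartitions strides and
// emit the per-partition predicates Spark appends to each JDBC query.
def partitionClauses(column: String, lower: Long, upper: Long, numPartitions: Int): Seq[String] = {
  val stride = (upper - lower) / numPartitions
  (0 until numPartitions).map { i =>
    val lo = lower + i * stride
    val hi = lo + stride
    if (i == 0) s"$column < $hi OR $column IS NULL"     // first bucket also catches NULLs
    else if (i == numPartitions - 1) s"$column >= $lo"  // last bucket is open-ended
    else s"$column >= $lo AND $column < $hi"
  }
}

Note that the bounds only shape the partitioning, they do not filter: rows outside [lowerBound, upperBound] still land in the first or last partition, which is why skewed bounds leave most executors idle.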