itsumma / spark-greenplum-connector

ITSumma Spark Greenplum Connector
MIT License

The show method for the query data is stuck #14

Closed zhousPole closed 5 months ago

zhousPole commented 5 months ago

I am using the example:

scala> val gpdf = spark.read.format("its-greenplum").
    option("url", "jdbc:postgresql://gp-master-host:5432/database").
    option("user", "database_user").
    option("password", "yourpassword").
    option("dbtable", "source_table_name").load()
scala> gpdf.show()

gpdf.printSchema() prints the schema normally. I am running this in PowerShell on Windows 10.

zhousPole commented 5 months ago

I am using spark-greenplum-connector_2.12-3.1.jar.

zhousPole commented 5 months ago

I have found the problem, in the guessMaxParallelTasks method of SparkSchemaUtil.scala: sparkContext.getExecutorMemoryStatus.keys.size - 1 is always 0 here, so the while loop never exits. Here is my modified version of the guessMaxParallelTasks method:

  def guessMaxParallelTasks(): Int = {
    val sparkContext = SparkContext.getOrCreate
    var guess: Int = -1
    // Treat Windows and macOS hosts as local (non-cluster) runs.
    val osName = System.getProperty("os.name").toLowerCase
    val isLocal = osName.contains("windows") || osName.contains("mac")
    if (isLocal) {
      // Do not wait for executors: derive the guess from the configured parallelism.
      guess = sparkContext.getConf.getInt("spark.default.parallelism", 1) - 1
    } else {
      // Poll until the executor count, minus one, becomes positive.
      while ((guess <= 0) && !Thread.currentThread().isInterrupted) {
        guess = sparkContext.getExecutorMemoryStatus.keys.size - 1
        if (sparkContext.deployMode == "cluster")
          guess -= 1
      }
    }
    guess
  }
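
To illustrate why the original loop hangs, here is a minimal Spark-free sketch of the same polling logic with an attempt limit added. The names GuessSketch, guessWithLimit, executorCount and maxAttempts are hypothetical and are not part of the connector; executorCount stands in for sparkContext.getExecutorMemoryStatus.keys.size, which I assume reports a single entry in my local setup.

```scala
// Hypothetical model of the polling logic; not the connector's actual code.
object GuessSketch {
  /** Polls `executorCount` up to `maxAttempts` times and returns the first
    * positive guess, or None if the count never rises high enough.
    *
    * @param executorCount stand-in for getExecutorMemoryStatus.keys.size
    * @param clusterDeploy true when deployMode is "cluster"
    * @param maxAttempts   upper bound on polls, so the loop cannot spin forever
    */
  def guessWithLimit(
      executorCount: () => Int,
      clusterDeploy: Boolean,
      maxAttempts: Int
  ): Option[Int] =
    Iterator
      // Same arithmetic as the loop body: count - 1, and one more in cluster mode.
      .continually(executorCount() - 1 - (if (clusterDeploy) 1 else 0))
      .take(maxAttempts)
      .find(_ > 0)
}
```

With a count that stays at 1, every poll yields 0, so the unbounded loop never exits, while this sketch gives up and returns None. A caller could then fall back to something like spark.default.parallelism, as my modified method does.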