spark-jobserver / spark-jobserver

REST job server for Apache Spark

Debug Driver and executor through passing spark.driver.extraJavaOptions in cluster #1290

Open AadhiKat opened 4 years ago

AadhiKat commented 4 years ago

Used Spark version -> 2.4.3

Used Spark Job Server version -> 0.9.0

Deployed mode -> Cluster Mode

local.conf

spark {
  master = "spark://xxx.xx.xx.xx:xxxx,yyy.yy.yy.yy:yyyy"
  submit.deployMode = "cluster"
  job-number-cpus = 4

  jobserver {
    port = 9090
    context-per-jvm = true
    ignore-akka-hostname = false
    jobdao = spark.jobserver.io.JobSqlDAO

    filedao {
      rootdir = /home/sas/zdpas/job-server/filedao/data
    }

    datadao {
      rootdir = /home/sas/zdpas/job-server/upload
    }

    sqldao {
      slick-driver = slick.driver.PostgresDriver
      jdbc-driver = org.postgresql.Driver
      rootdir = /home/sas/zdpas/job-server/sqldao/data
      jdbc {
        url = "jdbc:postgresql://zzz.zz.zz.zz:zzzz/spark_jobserver"
        user = "jobserver"
        password = ""
      }
      dbcp {
        enabled = false
        maxactive = 20
        maxidle = 10
        initialsize = 10
      }
    }

    result-chunk-size = 1m
  }

  context-settings {
    passthrough {
      spark.executor.extraJavaOptions = "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n"
      spark.driver.extraJavaOptions = "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n"
      es.nodes = "192.1.1.1"
    }
  }

  home = "/home/sas/zdpas/spark"
}

akka {
  remote.netty.tcp {
    hostname = "zzz.zz.zz.zz"
    maximum-frame-size = 10 MiB
  }
}

flyway.locations = "db/postgresql/migration"

shiro {
  authentication = on
  config.path = /home/sas/zdpas/job-server/shiro.ini
  authentication-timeout = 10s
}

spray.can.server {
  idle-timeout = 120s
  request-timeout = 100s
  parsing {
    max-content-length = 30m
    max-uri-length = 8k
  }
}

The JDWP option shows up in the Spark properties, but no port is ever opened for the debugger to connect to. I have also specified an explicit address and tried suspend=y, but nothing works, even though the option is still listed in the properties.

How can the Spark driver be debugged through the job server? I have also tried setting the parameter in spark-defaults.conf.

Debugging in spark-shell works fine if the same property is set in spark-defaults.conf.
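For reference, the spark-defaults.conf line that works for spark-shell looks like this (a sketch; the address=5005 port is an arbitrary example, not from the original report):

    # spark-defaults.conf: attach a JDWP agent to the driver JVM
    spark.driver.extraJavaOptions  -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005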

Spark Issue

bsikander commented 4 years ago

Sometimes the properties are added too late to the JVM. The easiest solution would be to add them to the spark-submit command directly. Jobserver allows you to pass such properties through the launcher section in the config. You can add these options there and try again.
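For example, something along these lines (a sketch; the exact nesting of the launcher block is an assumption here and should be checked against the jobserver docs for your version):

    spark {
      context-settings {
        launcher {
          # Passed to spark-submit at launch, so the driver JVM gets the
          # JDWP agent before it starts (address port is an example)
          spark.driver.extraJavaOptions = "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005"
        }
      }
    }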

AadhiKat commented 4 years ago

I tried via launcher. It does not override spark.driver.extraJavaOptions, which I guess is already passed by Spark:

-Dspark.driver.extraJavaOptions=-XX:+UseConcMarkSweepGC
         -verbose:gc -XX:+PrintGCTimeStamps
         -XX:MaxPermSize=512m
         -XX:+CMSClassUnloadingEnabled  -Xloggc:gc.out -XX:MaxDirectMemorySize=512M
         -XX:+HeapDumpOnOutOfMemoryError -Djava.net.preferIPv4Stack=true -Dlog4j.configuration=file:/*home*/job-server/log4j-server.properties -Dspark.executor.uri=/*home*/spark/spark-1.6.0.tar.gz   -Dspark.acls.enable=true -Dspark.deploy.zookeeper.url=xxx.xxx.xxx.xxx:xxxx,yyy.yyy.yyy.yyy:yyyy -Dspark.app.name=spark.jobserver.JobManager -Dspark.ui.filters=spark.filter.BasicAuthFilter -XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCTimeStamps -XX:MaxPermSize=512m -XX:+CMSClassUnloadingEnabled -Xloggc:gc.out -XX:MaxDirectMemorySize=512M -XX:+HeapDumpOnOutOfMemoryError -Djava.net.preferIPv4Stack=true -Dlog4j.configuration=file:/*home*/job-server/log4j-server.properties -Dspark.executor.uri=/*home*/spark/spark-1.6.0.tar.gz, sun.jvm.flags=, sun.java.command=org.apache.spark.deploy.worker.DriverWrapper spark://Worker@xxxxxx /*home*/spark/work/driver-20200402200716-0003/spark-job-server.jar spark.jobserver.JobManager akka.tcp://JobServer@xxx:xxxx jobManager-d68208ca-c030-4c84-92a4-257637e3ec80 file:/*home*/job-server/local.conf

This is the launch command now. Where should the debug options be added, and where are these existing options coming from?

bsikander commented 4 years ago

Some of the properties come from the following location; setenv.sh provides the base environment variables: https://github.com/spark-jobserver/spark-jobserver/blob/master/bin/setenv.sh#L91-L97

These variables can be overridden in your own *.sh file. Here is the template for that file: https://github.com/spark-jobserver/spark-jobserver/blob/1d9d1e47bbd6850f8015bfbdf3bfac1663e36ab3/job-server/config/local.sh.template

On this line the property spark.driver.extraJavaOptions gets set. So if you set the env variable MANAGER_EXTRA_JAVA_OPTIONS, you should be able to set your properties correctly.
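For example, in your local.sh (a minimal sketch; the JDWP address is an arbitrary example port):

    # local.sh, copied from local.sh.template and picked up by bin/setenv.sh
    MANAGER_EXTRA_JAVA_OPTIONS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005"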

AadhiKat commented 4 years ago

@bsikander Thanks, this will work. But the question is why there are multiple places for this. First I tried passthrough, then launcher, and then I looked at the code and found this: https://github.com/spark-jobserver/spark-jobserver/blob/master/bin/setenv.sh#L74-L76. Only after adding a DEBUG_PORT variable to the settings.sh file did the driver launch with the debugger (see the sketch below). Should this be possible through all of these mechanisms? We had been passing spark.driver.extraJavaOptions through passthrough until now and assumed it worked, but realized it never did.
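The guard in setenv.sh behaves roughly like this (an illustrative sketch; apart from DEBUG_PORT, the variable name is an approximation, not the exact upstream code):

    # Only when DEBUG_PORT is set does the driver JVM get the JDWP agent
    if [ -n "$DEBUG_PORT" ]; then
      MANAGER_EXTRA_JAVA_OPTIONS="$MANAGER_EXTRA_JAVA_OPTIONS -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=$DEBUG_PORT"
    fi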

bsikander commented 4 years ago

Your observation is correct. We do have many places to put configs, and it needs to be consolidated. When I was implementing SparkLauncher inside jobserver, I wrote a comment explaining this problem; you can see it here. Sooner or later we have to fix this to make it easier for users.

If you are interested, feel free to contribute. We can also help you.