jupyter-incubator / sparkmagic

Jupyter magics and kernels for working with remote Spark clusters

[BUG] job runs much slower than Zeppelin (remote server) or spark submit #743

Open GaryLiuGTA opened 2 years ago

GaryLiuGTA commented 2 years ago

Describe the bug I have an enterprise Spark cluster and a remote Zeppelin server, and I am using sparkmagic on my local laptop to submit jobs to the remote Spark cluster.

The same code runs much faster under Zeppelin or spark-submit than when it is submitted from my local laptop through sparkmagic. For example, one piece of code took about 7-10 minutes in Zeppelin, but the same code, run at the same time, took 30-40 minutes when submitted locally through sparkmagic. Could this be related to network conditions between the Spark cluster and the machine that submits the code? There is 10GB of bandwidth between the Spark cluster and the Zeppelin server, whereas I am on DSL (working from home). Does sparkmagic require a lot of traffic between the kernel and the cluster? Or could there be other reasons?
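One way to test the network hypothesis is to time a few round trips from the laptop to the Livy endpoint that sparkmagic talks to. The following is only a minimal sketch: `LIVY_URL` is a hypothetical placeholder, the helper names are made up for illustration, and it assumes the Livy REST API (`GET /sessions`) is reachable without authentication, which may not hold for an enterprise deployment.

```python
import statistics
import time
import urllib.request

# Hypothetical address; replace with the real Livy endpoint.
LIVY_URL = "http://livy.example.internal:8998"


def time_round_trips(fetch, n=5):
    """Call fetch() n times and return the elapsed seconds of each call."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        fetch()
        samples.append(time.perf_counter() - start)
    return samples


def fetch_sessions(base_url=LIVY_URL, timeout=10):
    """Issue GET /sessions against Livy and return the raw response body.

    Assumes no authentication is required (an assumption, not a given).
    """
    with urllib.request.urlopen(f"{base_url}/sessions", timeout=timeout) as resp:
        return resp.read()


def summarize(samples):
    """Reduce a list of timings to min, median, and max."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


if __name__ == "__main__":
    # Requires network access to the Livy server.
    print(summarize(time_round_trips(fetch_sessions)))
```

If the median round trip turns out to be small compared with the overall gap in run time, request latency alone probably does not explain the slowdown, and the volume of results sent back to the kernel or the Spark settings of the Livy session would be the next things to compare.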


Versions:

SparkMagic 0.19.1
Livy (if you know it): company's on-premise livy service, internal server.
Spark 2.2.3


GaryLiuGTA commented 2 years ago

Adding a comparison of Zeppelin and sparkmagic (using the VS Code Jupyter extension): sparkmagic took 3 times as long.

Code running with Zeppelin: (image)

Code running with sparkmagic: (image)