jupyter-incubator / sparkmagic

Jupyter magics and kernels for working with remote Spark clusters

[qn] Equivalent to `%run -i localscript.py` that runs a script on the cluster #738

shern2 closed this issue 2 years ago

shern2 commented 2 years ago

[qn] Is there an equivalent to %run -i localscript.py that runs a script on the cluster? For example, locally alongside the notebook I have

x=1
y=2

in localscript.py, and I would like to run it on the cluster itself. Currently, %run executes the script locally in the notebook instead of on the cluster.
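For context, `-i` tells `%run` to execute the file in the notebook's existing namespace instead of an empty one, so the script both sees and defines the notebook's variables, all inside the local kernel process. The sketch below is only a rough approximation of that behaviour using `exec`. `run_like_percent_run_i` is a hypothetical name, and IPython's real implementation does considerably more.

```python
import os
import tempfile


def run_like_percent_run_i(path: str, namespace: dict) -> dict:
    """Hypothetical, rough local approximation of `%run -i path`.

    Executes the file's source inside an existing namespace (the -i part),
    so the script can read and add variables there. Everything happens in
    the current local process, never on a remote cluster.
    """
    with open(path) as f:
        source = f.read()
    exec(compile(source, path, "exec"), namespace)
    return namespace


# Recreate localscript.py from the question in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "localscript.py")
    with open(script, "w") as f:
        f.write("x=1\ny=2\n")
    ns = {"already_defined": True}  # stands in for the notebook's namespace
    run_like_percent_run_i(script, ns)

print(ns["x"], ns["y"])  # 1 2
```

The variables end up in the local namespace, which is why `%run -i` alone cannot reach the cluster.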

devstein commented 2 years ago

No, not unless localscript.py is on the cluster. This is a limitation of using Apache Livy to interact with a Spark cluster. Any cell run by Sparkmagic is submitted to Livy via an HTTP request and is evaluated within the context of the cluster, so you can't reference files that are local to your machine.
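To make the mechanism concrete, here is a minimal sketch of the kind of request involved, using Livy's REST endpoint `POST /sessions/{sessionId}/statements`, which accepts a JSON body with a `code` field. The host name `livy-host` and session id `0` are placeholders, Sparkmagic's own client code is not reproduced here, and the sketch only builds the request without sending it. Only the code text is transmitted, so a path such as `localscript.py` appearing inside that code would be resolved on the cluster's filesystem, not on your machine.

```python
import json
import urllib.request

# Placeholder endpoint (assumption); 8998 is Livy's default port
LIVY_URL = "http://livy-host:8998"


def build_statement_request(session_id: int, code: str) -> urllib.request.Request:
    """Build (but do not send) a request submitting `code` to a Livy session.

    Only the source text travels over HTTP; the cluster never sees the
    client's filesystem.
    """
    body = json.dumps({"code": code}).encode("utf-8")
    return urllib.request.Request(
        f"{LIVY_URL}/sessions/{session_id}/statements",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_statement_request(0, "x=1\ny=2")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```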

shern2 commented 2 years ago

Here is a workaround for running a local script on the cluster (assuming the cluster has all the necessary dependencies):

Create a local script called toRunOnCluster.py containing:

import importlib
x=1

In another local script called localscript.py, put:

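import IPython
from pathlib import Path

# The running IPython shell (the notebook kernel that executes %run -i)
ipython = IPython.get_ipython()
# Read the local script into a string; read_text() also closes the file
script_str = Path('toRunOnCluster.py').read_text()
# Same as running a %%spark cell whose body is script_str, so it goes to the cluster
ipython.run_cell_magic('spark', '', script_str)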

You can now run %run -i localscript.py in your notebook, then confirm in a separate notebook cell that the script ran successfully on the cluster:

print(x, importlib)
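The same idea can be wrapped in a small reusable function. `run_local_script_on_cluster` below is a hypothetical helper, not part of Sparkmagic. It assumes the Sparkmagic magics are loaded so that the `spark` cell magic exists, and it accepts an optional `shell` argument (a stand-in is used here) so it can be exercised without a live notebook. The `line` argument is passed through unchanged as the magic's argument line, which is empty in the workaround above.

```python
import tempfile
from pathlib import Path


def run_local_script_on_cluster(path, shell=None, magic="spark", line=""):
    """Hypothetical helper: send the text of a local file to a cell magic.

    Assumes the Sparkmagic magics are loaded, so the `spark` cell magic exists.
    """
    if shell is None:
        # Imported lazily so the helper can be tested without IPython installed
        from IPython import get_ipython
        shell = get_ipython()
        if shell is None:
            raise RuntimeError("Not running inside an IPython/Jupyter kernel")
    code = Path(path).read_text()  # read the script on the *local* machine
    # Equivalent to a cell starting with `%%spark <line>` whose body is `code`
    return shell.run_cell_magic(magic, line, code)


class FakeShell:
    """Stand-in for IPython's shell that just records run_cell_magic calls."""

    def __init__(self):
        self.calls = []

    def run_cell_magic(self, magic_name, line, cell):
        self.calls.append((magic_name, line, cell))


with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "toRunOnCluster.py"
    script.write_text("import importlib\nx=1\n")
    shell = FakeShell()
    run_local_script_on_cluster(script, shell=shell)

print(shell.calls)
```

In a notebook where the magics are loaded, calling `run_local_script_on_cluster('toRunOnCluster.py')` from a cell should have the same effect as the `%run -i localscript.py` wrapper, without needing the extra script.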