mikecoe opened this issue 6 years ago
I'm guessing that pyspark automatically makes spark available for you in the notebook. How are you launching the notebook? Does it use a pyspark kernel, or the normal Python kernel?
Thanks for the reply.
Yes, it's using the pyspark kernel. How would I modify the notebook to load spark so that it also works from the command line?
I don't know. If pyspark is a separate kernel, you should be able to run that with nbconvert as well. Try using the option --ExecutePreprocessor.kernel_name=pyspark. If it's still not working, ask on a PySpark mailing list or issue tracker.
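For example, assuming the notebook is called mynotebook.ipynb (the name is just a placeholder), the command would look something like:
jupyter nbconvert --to notebook --execute mynotebook.ipynb --ExecutePreprocessor.kernel_name=pyspark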
Ok thanks.
Could you possibly send me a simple notebook and nbconvert command that works, which I could look at?
Sorry, I've never used pyspark, so I don't know what's needed. The command you're using looks right, and there are plenty of example notebooks (e.g. on the nbviewer home page).
The findspark module will come in handy here.
Install the module using the following command:
python -m pip install findspark
Make sure SPARK_HOME is set.
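If it isn't, you can export it in the shell before launching (the path here is only a placeholder; use your actual Spark installation directory):
export SPARK_HOME=/opt/spark
Alternatively, pass the location straight to findspark.init("/opt/spark") in the snippet below.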
Usage:
import findspark
findspark.init()  # adds pyspark to sys.path; must run before importing pyspark

import pyspark  # Call this only after findspark.init()
from pyspark.context import SparkContext
from pyspark.sql.session import SparkSession

# Create (or reuse) a SparkContext and wrap it in a SparkSession
sc = SparkContext.getOrCreate()
spark = SparkSession(sc)

df = spark.read.json("adl://fcg.azuredatalakestore.net/prod/transcache/brokercombined/parsed/stream/2018-01-28")
df.show()
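With spark created explicitly like this, the notebook no longer relies on the pyspark kernel injecting it for you, so it should also execute from the command line with the ordinary Python kernel, e.g. (the notebook name is just a placeholder):
jupyter nbconvert --to notebook --execute mynotebook.ipynb --ExecutePreprocessor.kernel_name=python3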
Hello,
I am new to nbconvert and am trying to get it up and running. I am trying to execute a Jupyter notebook from the command line and have tried a few different methods, each of which hits a similar error.
I can convert a notebook to HTML no problem. But when I try to convert to Python and then execute, or just execute the notebook directly, it fails.
I have cut the notebook content down to a bare minimum and ensured it runs through the Jupyter UI, but it doesn't seem to recognise "spark" when I run it from the command line. I have tried referencing sparkContext and sqlContext as well and get the same error.
The first cell in my notebook fails and reads:
The error I get is:
NameError: name 'spark' is not defined
Do I need to do something else in my notebook to get the "spark" reference recognised from the command line? The call I am making is:
Thanks in advance for any help.
Mike