jupyter / nbconvert

Jupyter Notebook Conversion
https://nbconvert.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

First-time user issue - "NameError: name 'spark' is not defined" #781

Open mikecoe opened 6 years ago

mikecoe commented 6 years ago

Hello,

I am new to nbconvert and am trying to get it up and running. I am trying to execute a Jupyter notebook from the command line and have tried a few different methods, each of which hits a similar error.

I can convert a notebook to HTML no problem. But when I try to convert to Python and then execute, or just execute directly, it fails.

I have cut the notebook content down to a bare minimum and ensured it runs through the Jupyter UI, but it doesn't seem to recognise "spark" when I run it from the command line. I have tried referencing sparkContext and sqlContext as well and get the same error.

The first cell in my notebook fails and reads:

dfBrokerCombined = spark.read.json("adl://fcg.azuredatalakestore.net/prod/transcache/brokercombined/parsed/stream/2018-01-28")

The error I get is NameError: name 'spark' is not defined, which doesn't sound like it involves Jupyter. Do I need to do something else in my notebook to get the "spark" reference recognised from the command line?

The call i am making is:

jupyter nbconvert --to notebook --execute mike3.ipynb

Thanks in advance for any help.

Mike

takluyver commented 6 years ago

I'm guessing that pyspark automatically makes spark available for you in the notebook. How are you launching the notebook? Does it use a pyspark kernel, or the normal Python kernel?
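For reference, the kernels Jupyter knows about can be listed from the command line; a separate pyspark kernel, if one is installed, would show up here:

jupyter kernelspec list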

mikecoe commented 6 years ago

Thanks for the reply.

Yes, it's using the pyspark kernel. How would I modify the notebook to load spark so that it also works from the command line?


takluyver commented 6 years ago

I don't know. If pyspark is a separate kernel, you should be able to run that with nbconvert as well. Try using the option --ExecutePreprocessor.kernel_name=pyspark. If it's still not working, ask on a PySpark mailing list or issue tracker.
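Combined with the command from the first post, that would look like the following (the kernel name pyspark is an assumption; it has to match a name shown by jupyter kernelspec list):

jupyter nbconvert --to notebook --execute --ExecutePreprocessor.kernel_name=pyspark mike3.ipynb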

mikecoe commented 6 years ago

Ok thanks.

Could you possibly send me a simple notebook and nbconvert command that works, which I could look at?


takluyver commented 6 years ago

Sorry, I've never used pyspark, so I don't know what's needed. The command you're using looks right, and there are plenty of example notebooks (e.g. on the nbviewer home page).

prashant-shahi commented 5 years ago

The findspark module will come in handy here.

Install the module using the following command:

python -m pip install findspark

Make sure SPARK_HOME is set.
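If it isn't set, point it at your Spark installation before launching the notebook or nbconvert (the path below is just an assumption; use your actual install location):

export SPARK_HOME=/opt/spark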

Usage:

import findspark
findspark.init()  # must run before importing pyspark; locates Spark via SPARK_HOME

import pyspark  # import this only after findspark.init()
from pyspark.context import SparkContext
from pyspark.sql.session import SparkSession

# Recreate the objects the pyspark kernel normally injects for you
sc = SparkContext.getOrCreate()
spark = SparkSession(sc)

df = spark.read.json("adl://fcg.azuredatalakestore.net/prod/transcache/brokercombined/parsed/stream/2018-01-28")
df.show()
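With that preamble as the first cell, the notebook no longer depends on the pyspark kernel injecting spark, so the original command should work unchanged:

jupyter nbconvert --to notebook --execute mike3.ipynb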