nteract / papermill

📚 Parameterize, execute, and analyze notebooks
http://papermill.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

bug using Scala jupyter notebook + spylon-kernel #662

Open mtournier-apixio opened 2 years ago

mtournier-apixio commented 2 years ago

This issue appears whenever I try to run Scala (spylon-kernel) notebooks using papermill.

If I put the Scala val assignments in a standalone parameters cell, papermill does not inject them properly.

In this spylon-kernel notebook example, the parameter passed on the command line does not change the value:

%%init_spark
launcher.conf.set("spark.driver.cores", "2")
launcher.conf.set("spark.driver.memory", "8G")
launcher.conf.set("spark.executor.cores", "4")
// this is tagged as `parameters` cell
val a = "this is a variable"
spark.sql(s"select 1 as ${a}").show

This works if I put the variable assignment inside the %%init_spark cell and write it as Python code. In the output notebook, papermill converted the Python assignment into Scala code (!):

// this is tagged as `parameters` cell
%%init_spark
a = "this is a variable"

launcher.conf.set("spark.driver.cores", "2")
launcher.conf.set("spark.driver.memory", "8G")
launcher.conf.set("spark.executor.cores", "4")
spark.sql(s"select 1 as ${a}").show

Note that in this second case, I can't declare the variable explicitly as a Scala val in the cell.
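For anyone trying to reproduce this, it can help to check exactly which cell carries the `parameters` tag and what else is in it, since papermill adds its `injected-parameters` cell right after the tagged cell. Below is a minimal sketch that reads the tags from the notebook JSON, assuming the standard nbformat 4 layout where tags are stored under each cell's `metadata.tags`. The helper names and the in-memory `EXAMPLE_NOTEBOOK` are hypothetical stand-ins, not papermill code.

```python
import json


def load_notebook(path):
    """Read an .ipynb file as plain JSON (no nbformat dependency)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def find_parameter_cells(notebook):
    """Return (index, source) for every cell tagged `parameters`."""
    found = []
    for index, cell in enumerate(notebook.get("cells", [])):
        tags = cell.get("metadata", {}).get("tags", [])
        if "parameters" in tags:
            source = cell.get("source", "")
            # The notebook format stores source as a string or a list of lines.
            if isinstance(source, list):
                source = "".join(source)
            found.append((index, source))
    return found


# Hypothetical stand-in for spylon-test.ipynb, reduced to two cells.
EXAMPLE_NOTEBOOK = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "metadata": {},
            "source": ["%%init_spark\n", 'launcher.conf.set("spark.driver.cores", "2")'],
        },
        {
            "cell_type": "code",
            "metadata": {"tags": ["parameters"]},
            "source": ['val a = "this is a variable"'],
        },
    ],
}

for index, source in find_parameter_cells(EXAMPLE_NOTEBOOK):
    print(f"cell {index} is tagged `parameters`:")
    print(source)
```

To run it against the real file, replace `EXAMPLE_NOTEBOOK` with `load_notebook("/home/jovyan/spylon-test.ipynb")`. Any code in the tagged cell that uses `a` runs before the injected cell, so it would still see the original value.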

Relevant versions:

pyspark==3.1.1
papermill==2.3.4
spylon==0.3.0
spylon-kernel==0.4.1
jupyter-client==7.0.6
jupyter-core==4.9.1
jupyter-packaging==0.11.0
jupyter-server==1.11.1
jupyterlab==3.0.16
jupyterlab-pygments==0.1.2
jupyterlab-server==2.8.2
jupyterlab-sparkmonitor==4.1.0

bash script:

#!/bin/bash

papermill \
/home/jovyan/spylon-test.ipynb \
/home/jovyan/spylon-test-output.ipynb \
-p a "ccccc2a"

Is it possible to fix this so that it works as expected, as in the first example?

Thanks guys, I love papermill, big appreciation for what you are doing for the community.

MSeal commented 2 years ago

Apologies for the slow response (been dealing with some health issues and not paying attention to GitHub for a bit).

Is the root problem around the val prefix vs plain variable assignment? Older Scala kernels allowed redefinition with a val, so I didn't bother fixing that. If that's the problem, we may need to make the variable assignment a little smarter by checking whether the variable has already been defined above.
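One way to picture the "smarter" assignment is sketched below. This is not papermill's actual translator code. It is a hypothetical illustration that scans the earlier cell source for `val`/`var` declarations and only emits a plain assignment when the name was declared as a `var`, on the assumption that the kernel's REPL allows a `val` to be redefined in a later cell. The names `render_parameters`, `declared_kinds`, and `scala_literal` are made up for this sketch, and only strings, booleans, ints, and floats are handled.

```python
import json
import re

# Matches `val name` or `var name` at the start of a line.
DECLARATION = re.compile(r"^\s*(val|var)\s+([A-Za-z_][A-Za-z0-9_]*)", re.MULTILINE)


def scala_literal(value):
    """Render a simple Python value as a Scala literal."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        return json.dumps(value)  # double-quoted, with escapes
    raise TypeError(f"unsupported parameter type: {type(value).__name__}")


def declared_kinds(source):
    """Map each declared name to 'val' or 'var' (the last declaration wins)."""
    return {name: kind for kind, name in DECLARATION.findall(source)}


def render_parameters(params, earlier_source=""):
    """Build the text of an injected Scala parameters cell."""
    kinds = declared_kinds(earlier_source)
    lines = ["// Parameters"]
    for name, value in params.items():
        literal = scala_literal(value)
        if kinds.get(name) == "var":
            # Already a mutable variable: plain reassignment is enough.
            lines.append(f"{name} = {literal}")
        else:
            # New name, or an earlier val: declare it (again) as a val.
            lines.append(f"val {name} = {literal}")
    return "\n".join(lines)


print(render_parameters({"a": "ccccc2a"}, 'val a = "this is a variable"'))
```

A regex scan like this is only a heuristic: it would miss declarations inside blocks, pattern-matching definitions, or names introduced by imports, so a real fix would need to decide how far that detection should go.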

mtournier-apixio commented 2 years ago

Matthew, no apology necessary!

In my use case, and perhaps to keep the "parameters" cell as close as possible to how it works for Python, I would ideally like to see what you described in your response, so that I can have a cell like:

val ArgumentA = "foo"
val ArgumentB = "bar"

Again thanks for the awesome work, I love papermill.
