jupyter-incubator / sparkmagic

Jupyter magics and kernels for working with remote Spark clusters

PySpark3 kernel doesn't work with Livy 0.5.0 #515

Closed squidnee closed 5 years ago

squidnee commented 5 years ago

Bringing up an EMR cluster with Livy 0.5.0 leads to the following 400 error: "Invalid kind: pyspark3 (through reference chain: org.apache.livy.server.interactive.CreateInteractiveRequest[\"kind\"])".

It appears that the pyspark3 session kind is not supported by Livy 0.5.0. For reference, see this page: https://github.com/apache/incubator-livy/blob/master/docs/rest-api.md#pyspark
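For anyone who wants to reproduce the failure outside of sparkmagic, this is the session-creation request that Livy rejects (an illustrative sketch; the host and port here are just the Livy defaults):

    import requests

    # Livy 0.5.0 rejects "pyspark3" as a session kind, so this comes back as HTTP 400
    resp = requests.post("http://localhost:8998/sessions", json={"kind": "pyspark3"})
    print(resp.status_code)  # 400
    print(resp.text)         # "Invalid kind: pyspark3 ..."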

I am currently in a situation where I need to use Livy 0.5.0 with Sparkmagic, so I have created a patch that removes the PySpark3 kernel for my own purposes. Would this be worth pushing upstream? Or, if the kernel shouldn't be removed, can you comment on the long-term maintenance plans for it, or on ways to modularize it so that users of Livy 0.5.0 have a path forward?
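For reference, the effect of that patch can be approximated on a single machine by just removing the PySpark3 kernelspec (a sketch, assuming the kernelspec was installed under sparkmagic's default name, pyspark3kernel):

    # rough local equivalent of the patch; the kernelspec name
    # "pyspark3kernel" is assumed to be sparkmagic's default
    jupyter-kernelspec remove pyspark3kernel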

rohitp27 commented 5 years ago

Hey @squidnee, this might be similar to the issue I raised here: https://github.com/jupyter-incubator/sparkmagic/issues/490

I changed the session kind for SESSION_KIND_PYSPARK3 to the same value as SESSION_KIND_PYSPARK (in the sparkmagic constants module here: https://github.com/jupyter-incubator/sparkmagic/blob/master/sparkmagic/sparkmagic/utils/constants.py) and configured the EMR cluster to use Python 3 as the default Python version (I think this is also the default behaviour for recent Spark installations, but I need to check that). This seems a bit hacky, but it works for me at the moment.
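Concretely, the change is roughly the following (a sketch of the edit to constants.py; upstream defines SESSION_KIND_PYSPARK3 as "pyspark3", which Livy 0.5.0 rejects):

    # sparkmagic/sparkmagic/utils/constants.py (workaround sketch)
    SESSION_KIND_PYSPARK = "pyspark"
    SESSION_KIND_PYSPARK3 = "pyspark"  # was "pyspark3"; both kernels now request a plain pyspark session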

Have you maybe taken a cleaner approach to this?

squidnee commented 5 years ago

Hi @rohitp27, could you confirm that changing the SESSION_KIND_PYSPARK3 variable to the value of SESSION_KIND_PYSPARK still uses the PySpark3 kernel? From my understanding, it just points the kernel at a plain PySpark session (which IIRC uses Python 2.7). I believe the solution I ended up pursuing involved altering the PySpark environment to accept Python 3.5 and Python 3.6.
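For anyone else in this situation: one way to make that environment change on EMR (a sketch of the idea, not necessarily the exact configuration we used; the interpreter path is an assumption) is to export PYSPARK_PYTHON through the spark-env classification:

    [
      {
        "Classification": "spark-env",
        "Configurations": [
          {
            "Classification": "export",
            "Properties": { "PYSPARK_PYTHON": "/usr/bin/python3" }
          }
        ]
      }
    ]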

EDIT: Upon closer inspection I think we pursued the same solution. :)

Tagar commented 5 years ago

Btw, Livy now has a 0.6 release: https://lists.apache.org/thread.html/70c715f6394f06f0a49f76671b0f57cd1cdca35f7862a9ad2cf87fd7@%3Cdev.livy.apache.org%3E

Although this issue still stands in the new release as well.

juliusvonkohout commented 5 years ago

#528: I have provided a pull request that fixes the encoding for python3 via kind pyspark.

Then I only install the PySpark kernel:

    # install just the PySpark kernelspec from the pip-installed sparkmagic package
    jupyter-kernelspec install --user $(pip show sparkmagic | grep Location | \
      cut -d" " -f2)/sparkmagic/kernels/pysparkkernel

Furthermore, just use spark.pyspark.python, as shown below, to select the Python executable.

    %%configure -f
    {
        "conf": {
            "spark.driver.memory": "600M",
            "spark.executor.memory": "1024M",
            "spark.pyspark.python": "python3",
            "spark.executor.cores": "1",
            "spark.cores.max": "1"
        }
    }
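If I recall correctly, spark.pyspark.python (available since Spark 2.1) takes precedence over the PYSPARK_PYTHON environment variable, so this selects the interpreter even when the cluster default is still Python 2.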
itamarst commented 5 years ago

Fixed by #540.

itamarst commented 5 years ago

I have released 0.12.8, which will hopefully fix this.

jaipreet-s commented 5 years ago

Hi @itamarst , this is great!

I see that the release on conda-forge for Linux still points to v0.12.1, but PyPI is on v0.12.8. Is PyPI the recommended channel for consuming sparkmagic?

itamarst commented 5 years ago

Conda-Forge packages for 0.12.8 are in progress and will be out soon, thanks to @ericdill.
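For anyone landing here later, either channel should work once the conda-forge build is published (standard install commands, nothing sparkmagic-specific assumed):

    pip install -U sparkmagic
    # or, once the 0.12.8 build lands on conda-forge:
    conda install -c conda-forge sparkmagic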