microsoft / vscode-jupyter

VS Code Jupyter extension
https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter
MIT License

Cannot connect to Sparkmagic kernels #11126

Closed DonJayamanne closed 2 years ago

DonJayamanne commented 2 years ago

Discussed in https://github.com/microsoft/vscode-jupyter/discussions/11125

Originally posted by **jvaesteves** August 16, 2022

Hello, I've been trying to get VSCode to pick up the Sparkmagic kernels that I installed in a venv using Poetry, so I can connect to an EMR instance via Livy, but when listing the kernels in the notebook interface, all that appears are the Python versions and venvs from my computer. I tried this in PyCharm Pro and it works well.

#### Setup:

- VSCode version: 1.70.1
- Jupyter extension version: v2022.7.1102252217
- Python version: tried 3.10.5 and 3.7.13
- Poetry version: 1.1.14
- Packages: sparkmagic==0.20.0

#### Steps to reproduce:

```bash
SPARKMAGIC_LOCATION=$(pip show sparkmagic | grep Location | cut -d" " -f2)
jupyter nbextension enable --py --sys-prefix widgetsnbextension
jupyter-kernelspec install --user $SPARKMAGIC_LOCATION/sparkmagic/kernels/sparkkernel
jupyter-kernelspec install --user $SPARKMAGIC_LOCATION/sparkmagic/kernels/pysparkkernel
jupyter-kernelspec install --user $SPARKMAGIC_LOCATION/sparkmagic/kernels/sparkrkernel
jupyter serverextension enable --py sparkmagic
```

#### What is expected

For PySpark, Spark and SparkR to appear in the VSCode kernel list

#### What is happening

![Screenshot 2022-08-15 at 20 19 46](https://user-images.githubusercontent.com/32674762/184693584-94e396ff-16b3-4c62-9ff5-a7db426368f5.png)

DonJayamanne commented 2 years ago

@jvaesteves Thanks for filing this, and I'm sorry it's not working as expected. Please could you:

jvaesteves commented 2 years ago
jvaesteves commented 2 years ago

I was also able to get two of the kernels to show up in the VSCode list, but the PySpark kernel (the one that actually uses Python) still does not appear as an option. Also, the list shows these kernels pointing to a path that does not exist on my filesystem and does not correspond to the output of jupyter kernelspec list:

Available kernels:
  pysparkkernel    /Users/myuser/Library/Jupyter/kernels/pysparkkernel
  sparkkernel      /Users/myuser/Library/Jupyter/kernels/sparkkernel
  sparkrkernel     /Users/myuser/Library/Jupyter/kernels/sparkrkernel
  python3          /Users/myuser/Library/Caches/pypoetry/virtualenvs/jupyter-playground-FOTz9V3J-py3.10/share/jupyter/kernels/python3
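
For anyone following along, here is a minimal sketch of how the listing above could be turned into the set of kernel.json files worth inspecting. It assumes POSIX-style paths and the two-column text layout shown above; `parse_kernelspec_list` is a hypothetical helper name, not part of Jupyter.

```python
def parse_kernelspec_list(text):
    """Map kernel name -> kernelspec directory from `jupyter kernelspec list` text.

    Assumption: each entry is "<name> <whitespace> <absolute POSIX path>".
    Lines that do not match (such as the "Available kernels:" header) are skipped.
    """
    specs = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # name first, then the rest of the line
        if len(parts) == 2 and parts[1].strip().startswith("/"):
            specs[parts[0]] = parts[1].strip()
    return specs


# Listing copied from the comment above.
listing = """Available kernels:
  pysparkkernel    /Users/myuser/Library/Jupyter/kernels/pysparkkernel
  sparkkernel      /Users/myuser/Library/Jupyter/kernels/sparkkernel
  sparkrkernel     /Users/myuser/Library/Jupyter/kernels/sparkrkernel
  python3          /Users/myuser/Library/Caches/pypoetry/virtualenvs/jupyter-playground-FOTz9V3J-py3.10/share/jupyter/kernels/python3
"""

specs = parse_kernelspec_list(listing)
for name, directory in specs.items():
    # Each kernelspec directory holds a kernel.json describing how to launch it.
    print(f"{name}: {directory}/kernel.json")
```

Each printed path is the file that defines how that kernel is launched.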

Screenshot 2022-08-16 at 11 18 02

Furthermore, I can select and use PySpark as the kernel in the notebook web interface on localhost.

DonJayamanne commented 2 years ago

@jvaesteves, thanks for the logs.

> Also, the list shows these kernels pointing to a path that does not exist on my filesystem and does not correspond to

Which one are you referring to in the UI above?

> but the PySpark kernel (that actually uses Python) still does no

Which one are you referring to from the list of kernels?

> Available kernels:

Please could you open the kernel.json file in each of the following kernelspec directories and paste the contents here so I can take a look.

pysparkkernel    /Users/myuser/Library/Jupyter/kernels/pysparkkernel
python3          /Users/myuser/Library/Caches/pypoetry/virtualenvs/jupyter-playground-FOTz9V3J-py3.10/share/jupyter/kernels/python3
jvaesteves commented 2 years ago

> Which one are you referring to in the UI above?

The ones in the Jupyter Kernel section that point to /python

> Which one are you referring to from the list of kernels?

pysparkkernel

> Please could you open the kernel.json file in each of the following kernelspec directories and paste the contents here so I can take a look.

There you go. python3 | pysparkkernel

DonJayamanne commented 2 years ago

@jvaesteves The problem is that the kernelspec (kernel.json) files do not contain the fully qualified path to Python. If you check the contents (see below):

{
 "argv": [
  "python",
  "-m",
  "ipykernel_launcher",
  "-f",
  "{connection_file}"
 ],
 "display_name": "Python 3 (ipykernel)",
 "language": "python",
 "metadata": {
  "debugger": true
 }
}

The first value in argv is just python, so the kernelspec doesn't contain the fully qualified path to Python, and which interpreter actually runs depends on the environment that launches the kernel. If you were to launch Jupyter in your terminal from another Python environment, say from /usr/bin/python3, you'd run into similar issues.
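
As a rough illustration of why a bare python in argv is ambiguous, the sketch below reads a kernelspec like the one above and reports what a PATH lookup resolves the launcher to in the current process. `resolve_launcher` is a hypothetical helper, and the result of the lookup depends entirely on the environment the code runs in, which is exactly the problem.

```python
import json
import os
import shutil


def resolve_launcher(kernel_json_text):
    """Return (argv0, resolved) for a kernel.json document.

    `resolved` is argv0 itself when it is already absolute; otherwise it is
    whatever a PATH lookup finds in this process (None if nothing matches).
    """
    spec = json.loads(kernel_json_text)
    argv0 = spec["argv"][0]
    if os.path.isabs(argv0):
        return argv0, argv0
    return argv0, shutil.which(argv0)


# Kernelspec contents as shown in the thread.
KERNEL_JSON = """
{
 "argv": ["python", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
 "display_name": "Python 3 (ipykernel)",
 "language": "python",
 "metadata": {"debugger": true}
}
"""

argv0, resolved = resolve_launcher(KERNEL_JSON)
print(f"argv[0] = {argv0!r}, absolute: {os.path.isabs(argv0)}")
print(f"PATH lookup in this process: {resolved}")
```

Running this from two different environments (for example, with and without the Poetry venv activated) would typically print two different resolved paths for the same kernelspec.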

To get around this, I'd suggest finding the fully qualified path to the Python executable of that environment and updating the kernelspec above with it. For example, using /usr/bin/python3 (in your case, change it to /Users/myuser/Library/Caches/pypoetry/virtualenvs/jupyter-playground-FOTz9V3J-py3.10/bin/python):

```json
{
 "argv": [
  "/usr/bin/python3",
  "-m",
  "ipykernel_launcher",
  "-f",
  "{connection_file}"
 ],
 "display_name": "Python 3 (ipykernel)",
 "language": "python",
 "metadata": {
  "debugger": true
 }
}
```
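
The manual edit described above can also be scripted. The following is only a sketch under stated assumptions: `pin_interpreter` is a hypothetical helper, the demo runs against a throwaway copy in a temporary directory rather than a real kernelspec, and /usr/bin/python3 stands in for whichever interpreter should launch the kernel.

```python
import json
import os
import tempfile
from pathlib import Path


def pin_interpreter(kernel_json, interpreter):
    """Replace argv[0] in a kernel.json file with an absolute interpreter path."""
    if not os.path.isabs(interpreter):
        raise ValueError(f"expected an absolute path, got {interpreter!r}")
    spec = json.loads(kernel_json.read_text(encoding="utf-8"))
    spec["argv"][0] = interpreter  # every other field is left untouched
    kernel_json.write_text(json.dumps(spec, indent=1) + "\n", encoding="utf-8")
    return spec["argv"]


# Demo on a temporary copy, so no real kernelspec is modified.
original = {
    "argv": ["python", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
    "display_name": "Python 3 (ipykernel)",
    "language": "python",
    "metadata": {"debugger": True},
}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "kernel.json"
    path.write_text(json.dumps(original), encoding="utf-8")
    new_argv = pin_interpreter(path, "/usr/bin/python3")
    updated = json.loads(path.read_text(encoding="utf-8"))

print(new_argv)
```

To apply it for real, pass the path to the kernel.json inside the kernelspec directory and the interpreter of the environment where the kernel's packages are installed. Keeping a backup of the original file first is a sensible precaution.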

Let me know if that works for you.
jvaesteves commented 2 years ago

@DonJayamanne Thanks a lot for the help. With this change, PySpark now works correctly! :)

DonJayamanne commented 2 years ago

@jvaesteves Thanks for getting back with the confirmation. It's not ideal, but I'm glad you got things working.

I'd like to ensure we fix this workflow, so I have a few questions:

The above information would help us get things right so you don't have to (hopefully) do this again.

jvaesteves commented 2 years ago
  • What is the environment in which you have spark installed, is that Poetry?

Yes

  • How do you launch Jupyter in your terminal?

  • What is your active python environment in the Python extension?

When I select the notebook file in my project, Jupyter starts and loads the PySpark kernel by itself. There is no need to activate the Poetry venv, and even when I select an interpreter in VSCode other than the one used by Poetry, the kernel bypasses it and uses the venv one (confirmed by running which python in the notebook).
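
A small aside on that check: which python, when run from a notebook through a shell escape, reports the first python on the shell's PATH, which is not necessarily the interpreter running the kernel. A hedged alternative is to ask the kernel process directly from a cell, as in this sketch:

```python
import os
import sys

# Absolute path of the interpreter running this kernel process.
interpreter = sys.executable
# Root directory of the environment that interpreter belongs to.
environment = sys.prefix

print("interpreter:", interpreter)
print("environment:", environment)
```

If the printed interpreter lives under the Poetry virtualenv directory, the kernel is indeed running from the venv regardless of the interpreter selected in VSCode.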

DonJayamanne commented 2 years ago

Thank you very much for your support and patience. I have a few more questions.

> When I select the notebook file on my project, Jupyter starts and loads the

How do you do this in the terminal? Please could you paste the commands here.

Also, could you let me know what you get when you run the following two commands:

which python
which jupyter

jvaesteves commented 2 years ago

Hello @DonJayamanne, sorry for taking so long to reply to you.

> How do you do this in the terminal? Please could you paste the commands here.

That's the point: I don't run any commands in the terminal besides the ones I listed in the Steps to reproduce section of the issue. The only other command, which runs automatically when I open a terminal in VSCode, activates my venv.

> Also, could you let me know what you get when you run the following two commands

which python

/Users/myuser/Library/Caches/pypoetry/virtualenvs/jupyter-playground-FOTz9V3J-py3.10/bin/python

which jupyter

/Users/myuser/Library/Caches/pypoetry/virtualenvs/jupyter-playground-FOTz9V3J-py3.10/bin/jupyter
DonJayamanne commented 2 years ago

@jvaesteves From what I understand, things are working now that you've updated the kernel.json file. Is that right?

DonJayamanne commented 2 years ago

Closing this issue as it's been over 4 weeks since the information was requested. We'll be happy to reopen the issue when the requested information has been provided.