microsoft / vscode-jupyter

VS Code Jupyter extension
https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter
MIT License

Support for editing Plain text files (like Python, MyST and R Markdown-based) as notebooks #1240

Open: allefeld opened this issue 4 years ago

allefeld commented 4 years ago

Feature: Notebook Editor, Interactive Window, Python Editor cells

Description

The notebook provides an option 'Convert and save to a Python script', which creates a representation of the notebook as a standard .py script using the 'percent format'. This not only provides yet another way to work interactively with Python and rich output – the format also solves the long-standing problem that the JSON format of .ipynb files, with its embedded binary data (graphics), does not play well with source control systems.

Personally I prefer to work with the standard notebook editor interface, but because of the latter problem with source control I like to have a percent-formatted version in parallel.

My feature request: Have an option, either global, per-workspace, or per-notebook, that whenever an .ipynb file is saved, a percent-formatted version of it is created/updated in the background, too. This way the notebook proper can be used for interactive work, but the commits on the percent-formatted version can serve as a readable record of what was done, so if it is necessary to revert changes, it is clear which commit is the right one.
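
To make the request concrete, here is a minimal standard-library sketch of the conversion it would automate: read an nbformat 4 .ipynb and emit a percent-format script, dropping outputs. It is not the extension's 'Convert and save to a Python script' exporter, and `ipynb_to_percent` is a hypothetical name. Jupytext (`jupytext --to py:percent notebook.ipynb`) additionally writes a metadata header and handles many more edge cases.

```python
import json


def _source_text(cell):
    # nbformat stores "source" either as a single string or as a list of lines.
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


def _comment(text):
    # Prefix every line with "# " (bare "#" for empty lines) so the cell stays valid Python.
    return "\n".join(("# " + line) if line else "#" for line in text.split("\n"))


def ipynb_to_percent(notebook_json: str) -> str:
    """Render an nbformat-4 notebook (JSON text) as a percent-format script.

    Outputs, execution counts and metadata are deliberately dropped, so the
    result only changes when code or markdown changes.
    """
    nb = json.loads(notebook_json)
    chunks = []
    for cell in nb.get("cells", []):
        text = _source_text(cell).rstrip("\n")
        kind = cell.get("cell_type")
        if kind == "code":
            chunks.append("# %%\n" + text)
        elif kind == "markdown":
            chunks.append("# %% [markdown]\n" + _comment(text))
        else:  # raw cells: keep the text, commented out
            chunks.append("# %% [raw]\n" + _comment(text))
    return "\n\n".join(chunks) + "\n"
```

A save hook would then only need something like `Path("analysis.py").write_text(ipynb_to_percent(Path("analysis.ipynb").read_text()))`, with the file names being placeholders.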

Microsoft Data Science for VS Code Engineering Team: @rchiodo, @IanMatthewHuff, @DavidKutu, @DonJayamanne, @greazer, @joyceerhl

joyceerhl commented 4 years ago

Thanks for the feature request! We'll discuss it at our upcoming triage.

amittleider commented 3 years ago

Definitely agree with this. Now that we have the real Jupyter feel directly inside VSCode, we don't even need to open a browser anymore. The problem is that when VSCode reads percent scripts, the look and feel of the interactive mode is different from (and worse than) the new functionality.

We use percent scripts in the repo and open them using Jupyter with Jupytext. Jupytext links the percent script with the ipynb file, just as @allefeld is explaining. It'd be perfect to have this functionality right inside the editor.

kaaloo commented 3 years ago

We were just discussing this issue at work. Would love to see a solution directly in vscode!

echaya commented 3 years ago

Looking forward to this feature. Once released, I'll persuade the whole team to move from JLab to VSCode :D

ssiegel95 commented 3 years ago

I would love to have this too. In fact, even having a scriptable means of converting from a "percent formatted" (PF) .py file back to a .ipynb would go a long way towards streamlining many of my team's workflows where we like to keep the PF versions for our automated tests and joint development via git but then export to .ipynb for our forward facing documentation pages.
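
Jupytext already covers this direction from the command line (`jupytext --to notebook script.py`). Purely to illustrate what the conversion involves, here is a hypothetical standard-library sketch that splits a percent script on `# %%` markers and builds an nbformat 4 dict with empty outputs. It assumes markdown cells use the `# %% [markdown]` marker with `# `-prefixed lines, treats every other cell as code, and ignores anything before the first marker.

```python
import json
import re

# A cell starts at a line beginning with "# %%", optionally followed by
# a title and/or a cell type in brackets such as "[markdown]".
_MARKER = re.compile(r"^# %%(.*)$")


def _uncomment(line):
    # Undo the "# " prefix that percent format puts on markdown lines.
    if line.startswith("# "):
        return line[2:]
    return "" if line == "#" else line


def percent_to_ipynb(script: str) -> dict:
    """Parse a percent-format script into an nbformat-4 notebook dict (no outputs)."""
    cells = []
    kind, lines = None, []

    def flush():
        if kind is None:
            return  # text before the first marker is ignored
        body = list(lines)
        while body and not body[-1].strip():
            body.pop()  # drop blank separator lines at the end of the cell
        if kind == "markdown":
            body = [_uncomment(line) for line in body]
        cell = {"cell_type": kind, "metadata": {}, "source": "\n".join(body)}
        if kind == "code":
            cell.update(execution_count=None, outputs=[])
        cells.append(cell)

    for line in script.splitlines():
        match = _MARKER.match(line)
        if match:
            flush()
            kind = "markdown" if "[markdown]" in match.group(1) else "code"
            lines = []
        elif kind is not None:
            lines.append(line)
    flush()
    # nbformat_minor 4 avoids the per-cell "id" field that 4.5 requires.
    return {"nbformat": 4, "nbformat_minor": 4, "metadata": {}, "cells": cells}
```

The result can be saved with `json.dump(nb, f, indent=1)`; for real pipelines the Jupytext command above is the safer choice.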

Thanks for the amazing developer tools that you guys put out there, by the way!

Aonnghus commented 3 years ago

Any news on this issue? That would really be a great addition to VS Code!

rchiodo commented 3 years ago

Sorry but this is not currently on our backlog. We're working on publishing our backlog so that people can see our plans.

It's not that difficult, though. You'd just have to watch save requests for the notebook documents (well, in Insiders anyway) and then run nbconvert every time a save occurred.

If anybody wants to submit a PR, we'd gladly accept it.

venaturum commented 3 years ago

@rchiodo

Hi Rich, how can we watch save requests? I've tried several "run on save" extensions and none of them seem to work on ipynb files for some reason.

rchiodo commented 3 years ago

I believe you could listen to this event:

https://github.com/microsoft/vscode/blob/27b2434631bc9e253c30f3d55b837fea41a7c170/src/vs/vscode.proposed.d.ts#L1209

Then after the event fires, run nbconvert on the notebook.
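
In an extension, the listener itself would be TypeScript subscribed to the save event linked above. The step that runs afterwards can be sketched in Python as a plain handler; `export_on_save` is a hypothetical name, and `run` is injectable so the handler can be tested without Jupyter installed. Note that, as far as I know, nbconvert's `script` exporter marks cells with `# In[ ]:` comments rather than percent-format `# %%` markers, so `jupytext --to py:percent` is the closer fit if percent format is the goal.

```python
import subprocess
from pathlib import Path


def export_on_save(saved_path, output_dir=None, run=subprocess.run):
    """Run nbconvert's script exporter for a notebook that was just saved.

    Returns the command that was run, or None if the saved file is not a notebook.
    """
    path = Path(saved_path)
    if path.suffix != ".ipynb":
        return None  # only react to notebook saves
    cmd = ["jupyter", "nbconvert", "--to", "script", str(path)]
    if output_dir is not None:
        # Write the .py next to other scripts instead of beside the notebook.
        cmd += ["--output-dir", str(output_dir)]
    run(cmd, check=True)
    return cmd
```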

DonJayamanne commented 3 years ago

@venaturum I've created an extension today that does this (open a Python file and execute it as a notebook): https://github.com/notebookPowerTools/vscode-jupytext (currently only usable in VS Code Insiders).

allefeld commented 3 years ago

@DonJayamanne, that looks great, thanks!!

Slightly off-topic: I've been looking forward to native notebooks and all that is possible with them, but they still seem a way off from landing in stable VSCode. On the other hand, I usually don't like to use unstable software for daily work. Is Insiders "unstable"?

DonJayamanne commented 3 years ago

I wouldn't call it unstable. It's the latest build of VS Code and the latest build of our extensions, updated daily. We have CI pipelines (tests) to ensure we don't ship breaking changes. However, once in a while things do break, and we try to get them resolved ASAP (within the same day).

lgonzalezsa commented 3 years ago

I was about to comment on my enjoyable experience with Jupytext as an alternative (or, I should now say, as an interim option until we have a solution in VS Code). Nice!

venaturum commented 3 years ago

Thanks @DonJayamanne, I've actually been wrestling with Jupytext for the last couple of days trying to achieve the setup my team is after. Maybe you can advise me on whether it's possible. What we're aiming for is:

1) A repository with a folder called scripts which contains .py files in percent format
2) When we clone this repo, we want to run a command from a terminal which creates corresponding .ipynb files in a folder called notebooks
3) We then want to run a command which pairs the files in the scripts folder with the files in the notebooks folder
4) Keep the notebooks and .py files synced through the Insiders/Jupytext extension and git hooks

I have set formats = "notebooks///ipynb,scripts///py:percent" in jupytext.toml and tried executing all sorts of jupytext commands on the command line to achieve the above but haven't succeeded.
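
The pairing that `formats = "notebooks///ipynb,scripts///py:percent"` describes, together with steps 2 and 3 above, might be scripted as below. This is a sketch under assumptions: paths are relative to the repository root with top-level `scripts/` and `notebooks/` folders, the helper names are hypothetical, and it relies on `jupytext --sync` creating the missing paired .ipynb when jupytext.toml declares the pairing (worth verifying against the Jupytext docs).

```python
import subprocess
from pathlib import Path, PurePosixPath


def paired_notebook_path(script_path: str) -> str:
    """Map scripts/<sub>/<name>.py to notebooks/<sub>/<name>.ipynb.

    Covers only the simple case of top-level scripts/ and notebooks/ folders.
    """
    p = PurePosixPath(script_path)
    if p.parts[:1] != ("scripts",) or p.suffix != ".py":
        raise ValueError(f"not a script under scripts/: {script_path}")
    return str(PurePosixPath("notebooks", *p.parts[1:]).with_suffix(".ipynb"))


def bootstrap(repo_root=".", run=subprocess.run):
    """After a fresh clone, run `jupytext --sync` on every percent script.

    Returns (script, expected notebook) pairs; `run` is injectable for testing.
    """
    root = Path(repo_root)
    scripts = sorted(
        p.relative_to(root).as_posix() for p in (root / "scripts").rglob("*.py")
    )
    for script in scripts:
        run(["jupytext", "--sync", script], check=True, cwd=root)
    return [(s, paired_notebook_path(s)) for s in scripts]
```

The same `jupytext --sync` call could then be reused from a git hook for step 4.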

DonJayamanne commented 3 years ago

If you have questions related to the Jupytext extension for VS Code, please file them against that repo. The Jupyter extension doesn't support Jupytext natively (concepts such as keeping the files in sync are currently out of scope and not yet supported).

DonJayamanne commented 1 year ago

Duplicate of https://github.com/microsoft/vscode-jupyter/issues/1237. Taking this one, as it falls under notebook serialization; it is also similar to the Jupytext notebook viewer (R Markdown and MyST files).

marcglobality commented 1 year ago

Hi @DonJayamanne, do we have any updates on this? Is it solved, just not documented? Thanks

ruslaniv commented 11 months ago

Please add the ability to sync notebooks to .py scripts.

Also, I do not understand how these two phrases go together, especially since a lot of people have expressed interest in this: "Sorry but this is not currently on our backlog" and "It's not that difficult though".

starball5 commented 8 months ago

Related on Stack Overflow: "How to config automatic sync Jupyter notebook .ipynb and .py files in VSCode e.g. by using Jupytext"

td-anne commented 5 months ago

Just a note: there are (at least) two possible workflows one might want to support here:

  1. Only a .py file ever exists, but the user can/must open it through the notebook interface. Outputs are discarded when the editor window is closed. Changes are saved to the .py file.
  2. Both .py and .ipynb files exist. The .py file is opened as a normal python file, and the .ipynb file is opened as a notebook. When either one is saved, the command jupytext --sync is run to synchronize the two. This command does not modify the cell outputs, which live in the .ipynb file, but ensures that the code/markdown in the .py file always matches that in the .ipynb file. Only the .py file is checked in to version control; the outputs in the .ipynb are lost only when moving to a fresh clone.

Both modes of working have their merits, but the code required to support them is very different. Mode #2 requires nothing more than reliably running a sync every time either file is saved (though a few extra syncs wouldn't hurt anyone, for example on load). Mode #1 requires support from within the notebook machinery. I note that the extension by @DonJayamanne supports (only) #1. Various schemes have been tried to run the sync on save, but for some reason it appears difficult to arrange for a command to be run after a notebook is saved. This could in principle be worked around by creating a watcher daemon that would simply run a sync any time either file was modified; as long as VSCode could be persuaded to keep this running it would automate the task.
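
The watcher-daemon workaround for mode #2 can be sketched with plain polling in Python. Everything here is hypothetical: the function names, the one-second default interval, and the assumption that the pairing is already configured so `jupytext --sync` on either file finds its counterpart. A real daemon would also want debouncing, since an editor may write a file in several steps.

```python
import subprocess
import time
from pathlib import Path


def snapshot(paths):
    """Map each existing path to its modification time in nanoseconds."""
    out = {}
    for p in paths:
        try:
            out[p] = Path(p).stat().st_mtime_ns
        except FileNotFoundError:
            pass  # one side of the pair may not exist yet
    return out


def changed(before, after):
    """Paths created, deleted, or modified between two snapshots."""
    return sorted(k for k in set(before) | set(after) if before.get(k) != after.get(k))


def watch_and_sync(paired, interval=1.0, run=subprocess.run, max_polls=None):
    """Poll a pair such as ["analysis.ipynb", "analysis.py"] and run
    `jupytext --sync` whenever either file changes. max_polls=None loops forever."""
    last = snapshot(paired)
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(interval)
        polls += 1
        now = snapshot(paired)
        if changed(last, now):
            existing = [p for p in paired if p in now]
            if existing:
                run(["jupytext", "--sync", existing[0]], check=True)
                # Re-snapshot so the files jupytext just rewrote do not
                # trigger another sync on the next poll.
                now = snapshot(paired)
        last = now
```

Polling was chosen over OS file-change notifications only to keep the sketch dependency-free; it costs a `stat` call per file per interval.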

jabbera commented 5 months ago

I'd love #1 as described by @td-anne. I want to use the standard notebook interface with Jupytext files. I don't care about saving outputs!

marctorsoc commented 5 months ago

I would like #2 as described above. I do want to keep the outputs for the future. Right now I solve it by running

jupytext --to py:percent file.ipynb

after running the notebook, but it'd be great if this were automatic and always in sync, as it is with Jupyter in the browser.

One advantage of this is that when editing the notebook and saving (assuming it automatically syncs to the py file), one would be able to git diff very quickly, without having to sync with jupytext. This is useful in many cases e.g. after doing some data inspection in the notebook, creating many new cells to debug and later aiming to revert the changes

mwouts commented 5 months ago

Thanks for considering this!

Quick question re #1: I see that the interactive script mode (also on TDS, search for interactive scripts on that page), which has been around for a while, still works in VS Code. That means I can execute a percent script step by step in VS Code. Isn't that already close to what you want? How different would a Jupyter mode be?

Personally I would be using mostly #2 i.e. keep the outputs on disk too.

jabbera commented 5 months ago

@mwouts asked: "How different would a Jupyter mode be?"

Different enough that my users don't want to use it. In Jupyter notebook mode, outputs appear directly below the cell that produced them, and users don't need another window open. It's also in line with the behavior of JupyterLab and its Jupytext extension.

td-anne commented 5 months ago

I would be sticking to #2 as well. In fact, I do manage a clumsy, data-loss-prone version of it by manually running jupytext --sync when I remember to. For large or complex notebooks, I end up forgoing VSCode's editing power and running things in actual JupyterLab, where this Just Works.

But the in-line rendered markdown, mathematics, and plots are a tremendous selling point for scientific users. Not to mention the rich display of certain outputs (sympy equations, pandas dataframes, generated markdown). With the notebook view, you can execute a notebook and have a literate-programming view of your results, in order, in place, associated with the code that generated them.

The interactive percent mode does have its place, and it can be less confusing than a notebook when you're running cells substantially out of order. But it is the notebook view that took over data science.

AlexeyDmitriev commented 2 months ago

For us, what we need is #1. As @jabbera said, the current state is different enough to be inconvenient to use.

#2 is also acceptable (we'd just need to gitignore the .ipynb files), but it looks more complicated for both implementation and usage.

VolkerH commented 2 months ago

For both options, .py files are explicitly mentioned (.py percent format). I'm not sure whether you implicitly also meant to support the other jupytext supported formats such as .md (markdown, myst markdown).

@td-anne wrote: "The interactive percent mode does have its place, and it can be less confusing than a notebook when you're running cells substantially out of order. But it is the notebook view that took over data science."

I agree. In the context of a Jupyter Book or Sphinx project, there will typically be a step that renders the output to something like GitHub Pages, so the user sees the familiar notebook output (with some nice extra formatting and cross-referencing) but you keep the output diff noise out of the repo. (Just describing our use case for this feature, not trying to explain Jupyter Book.)

allefeld commented 2 months ago

Personally, I've moved from Jupyter notebooks to Quarto documents. It's slightly less interactive, but Quarto documents are plain text files to begin with, and Quarto supports many additional Markdown features and many output formats (through Pandoc).

lgonzalezsa commented 2 months ago

I did not move to Quarto completely, but I now keep my computations in Jupyter notebooks and use Quarto's embedding feature to pull what I need from a notebook into my .qmd report. My next step is to try to move completely, but I still have peers who are notebook-centric.

kesshijordan commented 3 weeks ago

I would like to add another potential use case: supporting a workflow for securely using Copilot with VS Code as a notebook IDE when outputs may contain sensitive data.

When using VS Code as a notebook IDE with the Copilot extension, there is a concern that, because the nbformat JSON stores cell outputs, sensitive data in those outputs could be exposed. In sectors like healthcare we need guardrails to prevent that from happening if we want to use these tools. I asked about this in a support ticket and was advised:

I appreciate you bringing this our attention. I shared your questions to our Copilot engineering team. We went ahead and explored this further. We have concluded with and suggest that it is best to use content exclusions as we cannot guarantee that Copilot will not use data Jupyter notebook cells into its suggestions. For more information, read Configuring content exclusions for GitHub Copilot in the GitHub Docs.

I think one potential workflow is to sync a .py file to an .ipynb file in a repo, with the .ipynb subject to Copilot content exclusions and listed in .gitignore. That would allow Copilot to be used on the percent-format .py, while ensuring that the output data stored in the .ipynb's JSON structure stays protected when the notebook is saved. I added the jupytext sync command to the save keyboard binding as a task, which has lowered the friction, but it introduces other off-target effects because keyboard bindings apply to every file (though the workarounds proposed here look promising). I expect this feature would greatly lower the friction of setting up and using this kind of workflow.
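
One way to reduce the off-target effects of a global save binding is to have the task call a small filter script that only syncs files with a counterpart on disk. In this sketch the script name and helpers are hypothetical, it assumes same-folder pairing (`analysis.py` next to `analysis.ipynb`), and it expects the task to pass the saved file via VS Code's `${file}` variable, e.g. `python sync_if_paired.py ${file}`.

```python
import subprocess
import sys
from pathlib import Path

PAIRED_SUFFIXES = (".ipynb", ".py")


def should_sync(path: Path) -> bool:
    """Only sync notebooks/scripts that actually have a counterpart on disk."""
    if path.suffix not in PAIRED_SUFFIXES:
        return False
    other = ".py" if path.suffix == ".ipynb" else ".ipynb"
    return path.with_suffix(other).exists()


def main(argv, run=subprocess.run):
    # argv[1] is the file that was just saved (passed by the task as ${file}).
    path = Path(argv[1])
    if not should_sync(path):
        return 0  # unrelated file: do nothing instead of running jupytext
    return run(["jupytext", "--sync", str(path)]).returncode


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```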

I also want to acknowledge: thanks for considering this feature request and for all that the contributors/developers do for the community!

TL;DR: I think VSCode support of this feature request would facilitate a secure/lower friction way to use Copilot with VSCode as a notebook IDE when outputs may contain sensitive data