quarto-dev / quarto-cli

Open-source scientific and technical publishing system built on Pandoc.
https://quarto.org

Jupyter Execution is Confusing And Possibly Has A Bug #5376

Open hamelsmu opened 1 year ago

hamelsmu commented 1 year ago

Currently, to execute cells with Quarto, you must do one of the following:

  1. Pass the --execute flag to quarto preview or quarto render
  2. Set the enabled: true option (under execute) as specified here

However, this is confusing for several reasons:

  1. enabled is not documented here under execution options https://quarto.org/docs/reference/formats/ipynb.html#execution
  2. The documentation suggests that setting eval: true should make Quarto execute that cell (or notebook, if placed in the front matter). However, this is not the case. You additionally have to take one of the two steps listed above.
  3. If you have a project where some notebooks are executed but others are not, Quarto deletes all cell outputs in notebooks where eval: false is set on cells (or for the whole notebook if set in the front matter). I don't think this is desirable for notebooks, because you may have long-running computations that generate output that you don't want to get clobbered by Quarto, and it looks like there isn't a way to "protect" those cells or notebooks?

cc: @jjallaire @seem @cderv @mcanouil

BTW I'm on 1.3.333

hamelsmu commented 1 year ago

Update: I found these options under the caching section

  1. freeze: true
  2. enabled: false

This does what I want. So maybe this isn't a bug but a documentation issue? I'm not sure.
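
One way to check whether a render left stored outputs alone is to count them before and after. A minimal sketch using only the standard library: `count_outputs` is a hypothetical helper name, not part of Quarto or nbformat, and it assumes the file is nbformat v4 JSON.

```python
import json
from pathlib import Path


def count_outputs(notebook_path):
    """Return (code_cells, code_cells_with_outputs) for an nbformat v4 notebook."""
    nb = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    # Only code cells carry an "outputs" list; markdown and raw cells do not.
    code_cells = [c for c in nb.get("cells", []) if c.get("cell_type") == "code"]
    # An empty list means the cell has no stored output.
    with_outputs = [c for c in code_cells if c.get("outputs")]
    return len(code_cells), len(with_outputs)
```

Calling it on a notebook before and after quarto render and comparing the second number shows whether any outputs were dropped.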

cderv commented 1 year ago

@hamelsmu can you clarify something for me? Are you using .ipynb as the source file, or does what you described happen with a .qmd file? I think the former, but I would prefer to be sure before further discussion.

Re-reading this, I am not sure it is (only) a documentation issue; we may have undesired behavior here. Thanks

mcanouil commented 1 year ago

Indeed, I was planning to do some testing on Jupyter Notebooks and Quarto flat files to check whether the behaviour is also a bug. I won't be able to do it soon, though, so please take a look if you can, @cderv.

mcanouil commented 11 months ago

Some additional information:

  1. Setting eval: false does indeed erase any previously stored code cell outputs. (The freeze option is required to avoid this behaviour.)
  2. eval is a bit redundant with enabled. In the case of Jupyter notebooks, eval: true means "use code cell result", not "evaluate and use code cell result" as everywhere else. I believe it would make more sense to remove enabled in favour of eval, which would then execute code cells in Jupyter notebooks. It would also simplify the documentation.
  3. quarto render --execute works the same way as setting execute.enabled: true, as documented, but execute.enabled is indeed not documented in the computational sections.

@cscheid @dragonstyle Do you think we could make eval replace enabled, or at least be an alias of it, in Jupyter Notebooks? I believe this would be the best solution if possible.

lballabio commented 11 months ago

As a new user (an enthusiastic one: thanks!) I found this confusing as well. I wouldn't have guessed that --execute actually overwrites the input notebooks. Maybe a possibility would be to execute a temporary copy of the input notebook and leave the original alone?

mcanouil commented 11 months ago

@lballabio Why would you want to use --execute if you want to keep the computed cells of the original notebook? --execute really means "execute the cells", so the results go into the notebook just as if you had run the cells directly in it.

lballabio commented 11 months ago

I'm probably biased :)

I guess your main use case is a set of notebooks making up a website or a book, in which case it's only natural to save them together with the results of the computations. In my case, I have a set of notebooks which I use for trainings, and their "natural" state (and the state in which I save them to version control) is to be without outputs and ready to be executed by people as they go through the training. On top of this, Quarto gives me a static version of the notebooks that I can use for the occasional blog post, but my process (if there were no --execute option) would be: execute the notebooks, convert them to the output format, and clean them up again. Coming from this, I guess I assumed that --execute was a shortcut for that process. I might also have assumed that a document generator would not change the inputs, but I now see that --execute is meant as an explicit command to do that.
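
The "execute a temporary copy" idea could be sketched roughly as follows. This is an illustration, not something Quarto provides: `render_executed_copy` and its `run` parameter are hypothetical names, and the sketch assumes that the quarto CLI is on the PATH, that Quarto writes its output next to the input file, and that the notebook does not rely on files sitting beside it (relative data paths would break in the scratch directory).

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def render_executed_copy(notebook, out_dir, run=subprocess.run):
    """Execute and render a scratch copy of `notebook`; the original is not modified."""
    notebook = Path(notebook)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        scratch = Path(tmp) / notebook.name
        shutil.copy2(notebook, scratch)
        # Only the scratch copy is handed to Quarto, so --execute writes outputs there.
        run(["quarto", "render", str(scratch), "--execute", "--to", "html"], check=True)
        # Keep everything produced beside the scratch notebook, but not the notebook itself.
        for item in Path(tmp).iterdir():
            if item == scratch:
                continue
            target = out_dir / item.name
            if item.is_dir():
                shutil.copytree(item, target, dirs_exist_ok=True)
            else:
                shutil.copy2(item, target)
    return out_dir
```

The `run` argument exists only so the command can be swapped out or inspected; by default it is `subprocess.run`.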

All this said, I now think I'm probably the odd man out. In your shoes, I'd probably not invest time in supporting this use case :)

danieltomasz commented 1 month ago

What is the proper option now to execute a notebook when running Quarto?

When I create a notebook:

import nbformat

# create notebook
nb = nbformat.v4.new_notebook()
raw_cell_content = """---
title: "Preprocessing of data"
format:
  html: default
execute: 
  enabled: true
jupyter:
  kernel: conda-paths-3.12
---"""

nb["cells"] += [nbformat.v4.new_raw_cell(raw_cell_content)]
test_cell = """x = 1
print(x)"""
nb["cells"] += [nbformat.v4.new_code_cell(test_cell)]
with open("test-notebook.ipynb", "w", encoding="utf-8") as f:
    nbformat.write(nb, f)

Running it with VS Code:

quarto preview test-notebook.ipynb --to html --no-browser --no-watch-inputs gives me an error:

ERROR: TypeError: Cannot read properties of undefined (reading 'name')

Stack trace:
    at ensureYamlKernelspec (file:///Applications/quarto/bin/quarto.js:38580:48)
    at eventLoopTick (ext:core/01_core.js:183:11)
    at async Object.execute (file:///Applications/quarto/bin/quarto.js:38358:30)
    at async renderExecute (file:///Applications/quarto/bin/quarto.js:77017:27)
    at async renderFileInternal (file:///Applications/quarto/bin/quarto.js:77199:43)
    at async renderFiles (file:///Applications/quarto/bin/quarto.js:77067:17)
    at async renderProject (file:///Applications/quarto/bin/quarto.js:77394:25)
    at async renderForPreview (file:///Applications/quarto/bin/quarto.js:82818:26)
    at async render (file:///Applications/quarto/bin/quarto.js:82703:29)
    at async preview (file:///Applications/quarto/bin/quarto.js:82714:21)

cderv commented 1 month ago

@danieltomasz I don't think this is a related issue. Your error message is about the specification of the kernel.

The part you added

execute: 
  enabled: true

is the right one, and the fact that you get the error here means execution was attempted.

We may have another issue with kernel specification handling for this usage, so I'll move this to another thread. We will need an example without conda-paths-3.12.
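
A possible reproduction without the conda-specific kernel, as a sketch only: it reuses the notebook from the report above but names the stock python3 kernel, both in the front matter (jupyter: python3) and in the notebook's kernelspec metadata. It is written with the standard json module so that it has no dependencies. The file name test-notebook-python3.ipynb and the helper `build_notebook` are hypothetical, and whether this variant triggers the same ensureYamlKernelspec error has not been checked in this thread.

```python
import json

# Same front matter as in the report, except for the kernel selection.
FRONT_MATTER = """---
title: "Preprocessing of data"
format:
  html: default
execute:
  enabled: true
jupyter: python3
---"""


def build_notebook():
    """Build a minimal nbformat 4 notebook as a plain dict."""
    return {
        "cells": [
            {"cell_type": "raw", "metadata": {}, "source": FRONT_MATTER},
            {
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": "x = 1\nprint(x)",
            },
        ],
        "metadata": {
            # Assumption: a kernel named "python3" is installed on the machine.
            "kernelspec": {
                "name": "python3",
                "display_name": "Python 3",
                "language": "python",
            }
        },
        "nbformat": 4,
        "nbformat_minor": 4,
    }


with open("test-notebook-python3.ipynb", "w", encoding="utf-8") as f:
    json.dump(build_notebook(), f, indent=1)
```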