quarto-dev / quarto

Quarto open-source scientific and technical publishing system
https://quarto.org
GNU Affero General Public License v3.0

Markdown is not always converted to python in multiprocessing/threading from the Quarto VS Code extension #401

Open cameronraysmith opened 3 months ago

cameronraysmith commented 3 months ago

This is a cross-post of https://github.com/quarto-dev/quarto-cli/issues/9134 based on @cscheid's recommendation in https://github.com/quarto-dev/quarto-cli/issues/9134#issuecomment-2008161485. It affects both the CLI and the VS Code extension. The error that appears in VS Code is shown at the top level while the error that appears in the CLI is folded below it for reference.

Bug description

I recognize this is not an ideal design pattern, but calling a third-party library that uses multiprocessing and threading, as shown in the minimal example below, results in an error suggesting that the markdown source is passed to the spawned process without first being converted to Python. The same source code works as expected in a Python interpreter or Jupyter kernel.

Steps to reproduce

near-minimal example

````qmd
---
title: "Markdown is not always converted to python in multiprocessing/threading"
format: html
execute:
  enabled: true
jupyter:
  kernelspec:
    display_name: "Python 3"
    language: python
    name: python3
---

## Minimal example

This notebook demonstrates a potential issue with rendering notebooks using `multiprocessing.Manager().Queue()` and `threading.Thread`.

```{python}
from multiprocessing import Manager
from threading import Thread
try:
    from tqdm import tqdm
except ImportError:
    tqdm = None

def update(progress_bar, queue, total):
    """Update progress bar based on values from the queue."""
    for _ in range(total):
        queue.get()
        if progress_bar is not None:
            progress_bar.update(1)

def simulate_issue():
    range_length = 10
    unit = "items"

    progress_bar = None if tqdm is None else tqdm(total=range_length, unit=unit)
    queue = Manager().Queue()
    thread = Thread(target=update, args=(progress_bar, queue, range_length))
    thread.start()

    for _ in range(range_length):
        queue.put('done')

    thread.join()
    if progress_bar is not None:
        progress_bar.close()
```

executing `simulate_issue()` proceeds without error in a jupyter notebook or ipython terminal

```{python}
simulate_issue()
```

but leads to

```pytb
  0%|          | 0/10 [00:00<?, ?items/s]Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "~/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "~/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/spawn.py", line 125, in _main
    prepare(preparation_data)
  File "~/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "~/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "~/.pyenv/versions/3.10.13/lib/python3.10/runpy.py", line 288, in run_path
    code, fname = _get_code_from_file(run_name, path_name)
  File "~/.pyenv/versions/3.10.13/lib/python3.10/runpy.py", line 257, in _get_code_from_file
    code = compile(f.read(), fname, 'exec')
  File "~/template.qmd", line 1
    ---
       ^
SyntaxError: invalid syntax
```

when executed from the Quarto VS Code extension.
````
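The final frames of the traceback in the example show `runpy` calling `compile(f.read(), fname, 'exec')` on `template.qmd`. As a rough illustration of why that ends in a `SyntaxError`, the sketch below compiles a string that starts with YAML front matter. The helper name `compiles_as_python` and the sample header are hypothetical and only meant to isolate that one step.

```python
def compiles_as_python(source: str, filename: str = "template.qmd") -> bool:
    """Return True if `source` compiles as a Python module, False on SyntaxError."""
    try:
        compile(source, filename, "exec")
    except SyntaxError:
        return False
    return True


# Hypothetical stand-in for the first lines of a .qmd file.
qmd_header = '---\ntitle: "demo"\nformat: html\n---\n'

print(compiles_as_python(qmd_header))  # False: the leading '---' is not valid Python
print(compiles_as_python("x = 1\n"))   # True: ordinary Python source compiles
```

This only reproduces the last step of the traceback; it says nothing about why the child process is pointed at the `.qmd` file in the first place.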

Expected behavior

The notebook should render without error.

Actual behavior

The following error is surfaced in an interactive session from the VS Code extension

  0%|          | 0/10 [00:00<?, ?items/s]Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "~/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "~/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/spawn.py", line 125, in _main
    prepare(preparation_data)
  File "~/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "~/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "~/.pyenv/versions/3.10.13/lib/python3.10/runpy.py", line 288, in run_path
    code, fname = _get_code_from_file(run_name, path_name)
  File "~/.pyenv/versions/3.10.13/lib/python3.10/runpy.py", line 257, in _get_code_from_file
    code = compile(f.read(), fname, 'exec')
  File "~/template.qmd", line 1
    ---
       ^
SyntaxError: invalid syntax

while rendering from the terminal yields the following traceback

```pytb
❯ quarto render template.qmd --debug

Starting python3 kernel...Done

Executing 'template.ipynb'
  Cell 1/2: ''...Done
  Cell 2/2: ''...ERROR:

An error occurred while executing the following cell:
------------------
simulate_issue()
------------------

----- stderr -----
  0%|          | 0/10 [00:00<?, ?items/s]
----> 1 simulate_issue()

Cell In[1], line 20, in simulate_issue()
     17 unit = "items"
     19 progress_bar = None if tqdm is None else tqdm(total=range_length, unit=unit)
---> 20 queue = Manager().Queue()
     21 thread = Thread(target=update, args=(progress_bar, queue, range_length))
     22 thread.start()

File ~/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/context.py:57, in BaseContext.Manager(self)
     55 from .managers import SyncManager
     56 m = SyncManager(ctx=self.get_context())
---> 57 m.start()
     58 return m

File ~/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/managers.py:562, in BaseManager.start(self, initializer, initargs)
    560 ident = ':'.join(str(i) for i in self._process._identity)
    561 self._process.name = type(self).__name__ + '-' + ident
--> 562 self._process.start()
    564 # get address of server
    565 writer.close()

File ~/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/process.py:121, in BaseProcess.start(self)
    118 assert not _current_process._config.get('daemon'), \
    119        'daemonic processes are not allowed to have children'
    120 _cleanup()
--> 121 self._popen = self._Popen(self)
    122 self._sentinel = self._popen.sentinel
    123 # Avoid a refcycle if the target function holds an indirect
    124 # reference to the process object (see bpo-30775)

File ~/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/context.py:288, in SpawnProcess._Popen(process_obj)
    285 @staticmethod
    286 def _Popen(process_obj):
    287     from .popen_spawn_posix import Popen
--> 288     return Popen(process_obj)

File ~/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/popen_spawn_posix.py:32, in Popen.__init__(self, process_obj)
     30 def __init__(self, process_obj):
     31     self._fds = []
---> 32     super().__init__(process_obj)

File ~/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/popen_fork.py:19, in Popen.__init__(self, process_obj)
     17 self.returncode = None
     18 self.finalizer = None
---> 19 self._launch(process_obj)

File ~/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/popen_spawn_posix.py:42, in Popen._launch(self, process_obj)
     40 tracker_fd = resource_tracker.getfd()
     41 self._fds.append(tracker_fd)
---> 42 prep_data = spawn.get_preparation_data(process_obj._name)
     43 fp = io.BytesIO()
     44 set_spawning_popen(self)

File ~/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/spawn.py:183, in get_preparation_data(name)
    180 # Figure out whether to initialise main in the subprocess as a module
    181 # or through direct execution (or to leave it alone entirely)
    182 main_module = sys.modules['__main__']
--> 183 main_mod_name = getattr(main_module.__spec__, "name", None)
    184 if main_mod_name is not None:
    185     d['init_main_from_name'] = main_mod_name

AttributeError: module '__main__' has no attribute '__spec__'
```
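The terminal traceback fails at `getattr(main_module.__spec__, "name", None)` because `__main__` has no `__spec__` attribute at all. A possible workaround, not proposed in this thread and untested against Quarto, is to give `__main__` a `__spec__` of `None` before creating the `Manager`, so that the `getattr` call falls back to its default instead of raising. The helper name `ensure_main_spec` is hypothetical.

```python
import sys


def ensure_main_spec(module=None) -> bool:
    """Give `module` (default: __main__) a `__spec__` attribute if it lacks one.

    Returns True if the attribute had to be added, False if it already existed.
    """
    if module is None:
        module = sys.modules["__main__"]
    if hasattr(module, "__spec__"):
        return False
    # With __spec__ = None, getattr(None, "name", None) evaluates to None
    # instead of raising AttributeError on the missing attribute.
    module.__spec__ = None
    return True
```

In a notebook this would be called once, for example `ensure_main_spec()` in the first cell, before `simulate_issue()`. Whether the spawned process then starts cleanly depends on what else `multiprocessing` reads from `__main__`, so treat this as an experiment rather than a fix.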

Your environment

environment

```
❯ code --version
1.87.2
863d2581ecda6849923a2118d93a088b0745d9d6
arm64

❯ uname -a
Darwin 22.6.0 Darwin Kernel Version 22.6.0: Fri Sep 15 13:41:28 PDT 2023; root:xnu-8796.141.3.700.8~1/RELEASE_ARM64_T6000 arm64

❯ sw_vers -productVersion
13.6
```

Quarto check output

quarto check

```bash
❯ quarto check
Quarto 1.4.551
[✓] Checking versions of quarto binary dependencies...
      Pandoc version 3.1.11: OK
      Dart Sass version 1.69.5: OK
      Deno version 1.37.2: OK
[✓] Checking versions of quarto dependencies......OK
[✓] Checking Quarto installation......OK
      Version: 1.4.551
      Path: /Applications/quarto/bin

[✓] Checking tools....................OK
      TinyTeX: (not installed)
      Chromium: (not installed)

[✓] Checking LaTeX....................OK
      Using: Installation From Path
      Path: /Library/TeX/texbin
      Version: 2023

[✓] Checking basic markdown render....OK

[✓] Checking Python 3 installation....OK
      Version: 3.10.13
      Path: /xxxx-py3.10/bin/python3
      Jupyter: 5.3.0
      Kernels: ir, julia-1.9, bash, maxima, python3

[✓] Checking Jupyter engine render....OK
```

cscheid commented 3 months ago

(Copying my comment from the other issue:)

The interactive case

The problem seems to arise from the way the VS Code Jupyter extension communicates the global module to the subprocess. Quarto's VS Code extension uses the Jupyter extension for interactive cell execution. In this situation, the Jupyter extension somehow tells spawn.py that the main module is the .qmd file that contains the cells. I don't know 1) why this is necessary or 2) how the Jupyter extension manages to do the right thing when it is used directly, by opening a .ipynb file in VS Code and executing the cells interactively.
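One way to compare the two situations is to look at what `__main__` exposes in each session. The sketch below reports the two attributes that matter in the tracebacks above: `__spec__`, and `__file__`, which in CPython 3.10's spawn.py is the source of the `init_main_from_path` value when no spec name is available. `describe_main_module` is a hypothetical helper, and the output will differ between a plain interpreter, a Jupyter kernel, and the VS Code interactive session.

```python
import sys


def describe_main_module() -> dict:
    """Report the __main__ attributes that multiprocessing's spawn code inspects."""
    main = sys.modules["__main__"]
    spec = getattr(main, "__spec__", None)
    return {
        # False reproduces the condition behind the AttributeError in the CLI case.
        "has_spec_attr": hasattr(main, "__spec__"),
        # A non-None name means the child would re-import __main__ by module name.
        "spec_name": getattr(spec, "name", None),
        # A path ending in .qmd here would match the interactive traceback.
        "file": getattr(main, "__file__", None),
    }


print(describe_main_module())
```

Running this in a cell of the .qmd file and again in a cell of an equivalent .ipynb file should show whether the two sessions set `__file__` differently.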