quarto-dev / quarto-cli

Open-source scientific and technical publishing system built on Pandoc.
https://quarto.org
Other
3.59k stars 294 forks source link

rendering attempts to write within virtual environment #8215

Open aronatkins opened 6 months ago

aronatkins commented 6 months ago

Bug description

The quarto render attempts to render Markdown files contained within a virtual environment directory structure.

Steps to reproduce

Given a Dockerfile:

ARG QUARTO_VERSION=1.4.537
FROM ghcr.io/quarto-dev/quarto:$QUARTO_VERSION

RUN apt-get update -qq && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y \
    apt-transport-https \
    curl \
    && \
    rm -rf /var/lib/apt/lists/*

ARG PYTHON_VERSION=3.11.3
ARG OS_IDENTIFIER=ubuntu-2204

RUN curl -fsSL -O https://cdn.rstudio.com/python/${OS_IDENTIFIER}/pkgs/python-${PYTHON_VERSION}_1_amd64.deb && \
    apt-get update -qq && \
    DEBIAN_FRONTEND=noninteractive apt-get install -f -y ./python-${PYTHON_VERSION}_1_amd64.deb && \
    rm python-${PYTHON_VERSION}_1_amd64.deb && \
    rm -rf /var/lib/apt/lists/*

RUN /opt/python/3.11.3/bin/python -m venv /venv && \
    PIP_ROOT_USER_ACTION=ignore /venv/bin/pip install -U pip && \
    PIP_ROOT_USER_ACTION=ignore /venv/bin/pip install jupyter

RUN useradd -m person

COPY index.qmd _quarto.yml /content/
RUN chown -R person:person /content/

# libsqlite3 is needed by IPython/core/history
RUN apt-get update -qq && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y \
    sqlite3

USER person

WORKDIR /content

RUN mkdir /content/python && ln -s /venv python/env
RUN PATH=/content/python/env/bin:/opt/python/${PYTHON_VERSION}/bin:$PATH QUARTO_PYTHON=/content/python/env/bin/python JUPYTER_PATH=/content/python/env/share/jupyter quarto render

an index.qmd:

---
title: using python
---

```{python}
1+1

and a `_quarto.yml`:

````yaml
project:
  output-dir: output

Expected behavior

Only the index.qmd should be rendered.

Actual behavior

Files within the python/env virtual environment are identified and rendered. The python/env virtual environment is referenced by a symbolic link to the actual location of the virtual environment and that location is (intentionally) not writable to the running user.

Building an image:

docker build .

produces the error:

[2/6] python/env/lib/python3.11/site-packages/pyzmq-25.1.2.dist-info/AUTHORS.md
24.90 ERROR: PermissionDenied: Permission denied (os error 13): mkdir '/content/python/env/lib/python3.11/site-packages/pyzmq-25.1.2.dist-info/AUTHORS_files/mediabag'
24.90
24.90 Stack trace:
24.90     at Object.mkdirSync (ext:deno_fs/30_fs.js:132:7)
24.90     at ensureDirSync (file:///opt/quarto/bin/quarto.js:9286:14)
24.90     at renderPandoc (file:///opt/quarto/bin/quarto.js:69965:5)
24.90     at Object.onRender (file:///opt/quarto/bin/quarto.js:77197:42)
24.90     at renderFileInternal (file:///opt/quarto/bin/quarto.js:77174:38)
24.90     at eventLoopTick (ext:core/01_core.js:183:11)
24.90     at async renderFiles (file:///opt/quarto/bin/quarto.js:76998:17)
24.90     at async renderProject (file:///opt/quarto/bin/quarto.js:77325:25)
24.90     at async Command.fn (file:///opt/quarto/bin/quarto.js:81890:32)
24.90     at async Command.execute (file:///opt/quarto/bin/quarto.js:8102:13)

Your environment

No response

Quarto check output

No response

aronatkins commented 6 months ago

This issue happens with 1.4.537 but not with 1.4.533.

cderv commented 6 months ago

Playing with the dockerfile, i can say that 1.4.535 is the pre-release breaking the behavior.

My guess would be that this change is the source: https://github.com/quarto-dev/quarto-cli/commit/8579ac76e2fe400f5ef3bf9690c7760401cfcc33

cc @dragonstyle

I'll try to confirm this with a git bisect

cderv commented 6 months ago

Git bisect confirms this is the problem as guessed. @dragonstyle I'll try to see why the change we did do not behave that same. it seems folder may not be ignored as expected

cscheid commented 6 months ago

Since this is the second failed attempt at adding the feature, and it's a pretty obscure one, I suggest we just revert the change and move it to 1.5.

cderv commented 6 months ago

You mean going back to render target exclusion requiring a *.qmd as documented ? https://quarto.org/docs/projects/quarto-projects.html#render-targets

It feels safer.

Though I am still trying to find what is causing this

dragonstyle commented 6 months ago

Since this is the second failed attempt at adding the feature, and it's a pretty obscure one, I suggest we just revert the change and move it to 1.5.

+10000

cderv commented 6 months ago

Some information about understanding the issue with solution from 8579ac7

This is for documentation purposes in order to do the right thing when we'll fix. I dig a bit deep so I need this to remember when we'll tackle it

What is happening with latest fix

So python/env is a symlink. When we walk on dir content https://github.com/quarto-dev/quarto-cli/blob/8579ac76e2fe400f5ef3bf9690c7760401cfcc33/src/project/project-context.ts#L635-L660

we do not follow symlink, and so python/env will be the pathRelative relative tested.

And our regex does not catch this env name because not trailing slash - we expect env/ as it is supposed to be walking on filepath inside the directory.

/^(?:[^/]*(?:\/|$)+)*env\/+(?:[^/]*(?:\/|$)+)*$/.test("python/env")
false
/^(?:[^/]*(?:\/|$)+)*env\/+(?:[^/]*(?:\/|$)+)*$/.test("python/env/")
true

So python/env is added in files. It seems includeDirs: false, as option did not prevent this (symlinked) folder to be in the walked list.

When we later do some filtering at https://github.com/quarto-dev/quarto-cli/blob/8579ac76e2fe400f5ef3bf9690c7760401cfcc33/src/project/project-context.ts#L690-L701

file will be python/env and this time Deno.statSync(file).isDirectory will be true. Surprisingly Deno.statSync(file).isSymlink is false 🤔 So when it passes through https://github.com/quarto-dev/quarto-cli/blob/8579ac76e2fe400f5ef3bf9690c7760401cfcc33/src/project/project-context.ts#L696-L697

The whole directory python/env is scanned and any file matching our engine detection is kept

One solution maybe is to insure that symlink are not included with includeSymlinks: false ? (https://deno.land/std@0.203.0/fs/mod.ts?s=WalkOptions)

This definitely not include python/env in the first walk of project directory. Seems like by not following symlink we probably not expect a symlink folder / file in the list ?

Why it worke in 1.4.534 and earlier version ?

Regarding why it did not happen with previous fix behavior:

python/env is still detect as potential file while walking dir in addDir(). However, within the addFile() function, no engine is detected as this is indeed a directory https://github.com/quarto-dev/quarto-cli/blob/f70bb074f9476cd1c2827d97bc1070a59f5e0a4b/src/project/project-context.ts#L618-L619

Why it worked in 1.3 before the new feature introduction ?

Regarding 1.3, addDir() was not catching python/env when walking the project dir... So no issue at all here.

What is the status with new fix in #8231 ?

So I checked with #8231 revert commits.

for (const walk of walkSync(
        dir,
        {
          includeDirs: false,
          // this was done b/c some directories e.g. renv/packrat and potentially python
          // virtualenvs include symblinks to R or Python libraries that are in turn
          // circular. much safer to not follow symlinks!
          followSymlinks: false,
          skip: [kSkipHidden].concat(
            engineIgnoreDirs().map((ignore) =>
              globToRegExp(join(dir, ignore) + SEP)
            ),
          ),
        },
      )
    ) { console.log(walk)}

This is with 1.3.450

{path: '/home/quarto/content/index.qmd', name: 'index.qmd', isFile: true, isDirectory: false, isSymlink: false}
repl:16
{path: '/home/quarto/content/_quarto.yml', name: '_quarto.yml', isFile: true, isDirectory: false, isSymlink: false}
repl:16
{path: '/home/quarto/content/index.ipynb', name: 'index.ipynb', isFile: true, isDirectory: false, isSymlink: false}

This is with #8231 so with 1.4

{path: '/home/quarto/content/index.qmd', name: 'index.qmd', isFile: true, isDirectory: false, isSymlink: false}
repl:16
{path: '/home/quarto/content/python/env', name: 'env', isFile: false, isDirectory: false, isSymlink: true}
repl:16
{path: '/home/quarto/content/_quarto.yml', name: '_quarto.yml', isFile: true, isDirectory: false, isSymlink: false}
repl:16
{path: '/home/quarto/content/index.ipynb', name: 'index.ipynb', isFile: true, isDirectory: false, isSymlink: false}

Not the {path: '/home/quarto/content/python/env', name: 'env', isFile: false, isDirectory: false, isSymlink: true}

There is something that I believed changed, because our excluding set in the walk did not change I think

So either in Deno, or some other function involved.