quarto-dev / quarto-cli

Open-source scientific and technical publishing system built on Pandoc.
https://quarto.org
Other
3.96k stars 327 forks source link

rendering attempts to write within virtual environment #8215

Open aronatkins opened 10 months ago

aronatkins commented 10 months ago

Bug description

The quarto render attempts to render Markdown files contained within a virtual environment directory structure.

Steps to reproduce

Given a Dockerfile:

ARG QUARTO_VERSION=1.4.537
FROM ghcr.io/quarto-dev/quarto:$QUARTO_VERSION

RUN apt-get update -qq && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y \
    apt-transport-https \
    curl \
    && \
    rm -rf /var/lib/apt/lists/*

ARG PYTHON_VERSION=3.11.3
ARG OS_IDENTIFIER=ubuntu-2204

RUN curl -fsSL -O https://cdn.rstudio.com/python/${OS_IDENTIFIER}/pkgs/python-${PYTHON_VERSION}_1_amd64.deb && \
    apt-get update -qq && \
    DEBIAN_FRONTEND=noninteractive apt-get install -f -y ./python-${PYTHON_VERSION}_1_amd64.deb && \
    rm python-${PYTHON_VERSION}_1_amd64.deb && \
    rm -rf /var/lib/apt/lists/*

RUN /opt/python/3.11.3/bin/python -m venv /venv && \
    PIP_ROOT_USER_ACTION=ignore /venv/bin/pip install -U pip && \
    PIP_ROOT_USER_ACTION=ignore /venv/bin/pip install jupyter

RUN useradd -m person

COPY index.qmd _quarto.yml /content/
RUN chown -R person:person /content/

# libsqlite3 is needed by IPython/core/history
RUN apt-get update -qq && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y \
    sqlite3

USER person

WORKDIR /content

RUN mkdir /content/python && ln -s /venv python/env
RUN PATH=/content/python/env/bin:/opt/python/${PYTHON_VERSION}/bin:$PATH QUARTO_PYTHON=/content/python/env/bin/python JUPYTER_PATH=/content/python/env/share/jupyter quarto render

an index.qmd:

---
title: using python
---

```{python}
1+1

and a `_quarto.yml`:

````yaml
project:
  output-dir: output

Expected behavior

Only the index.qmd should be rendered.

Actual behavior

Files within the python/env virtual environment are identified and rendered. The python/env virtual environment is referenced by a symbolic link to the actual location of the virtual environment and that location is (intentionally) not writable to the running user.

Building an image:

docker build .

produces the error:

[2/6] python/env/lib/python3.11/site-packages/pyzmq-25.1.2.dist-info/AUTHORS.md
24.90 ERROR: PermissionDenied: Permission denied (os error 13): mkdir '/content/python/env/lib/python3.11/site-packages/pyzmq-25.1.2.dist-info/AUTHORS_files/mediabag'
24.90
24.90 Stack trace:
24.90     at Object.mkdirSync (ext:deno_fs/30_fs.js:132:7)
24.90     at ensureDirSync (file:///opt/quarto/bin/quarto.js:9286:14)
24.90     at renderPandoc (file:///opt/quarto/bin/quarto.js:69965:5)
24.90     at Object.onRender (file:///opt/quarto/bin/quarto.js:77197:42)
24.90     at renderFileInternal (file:///opt/quarto/bin/quarto.js:77174:38)
24.90     at eventLoopTick (ext:core/01_core.js:183:11)
24.90     at async renderFiles (file:///opt/quarto/bin/quarto.js:76998:17)
24.90     at async renderProject (file:///opt/quarto/bin/quarto.js:77325:25)
24.90     at async Command.fn (file:///opt/quarto/bin/quarto.js:81890:32)
24.90     at async Command.execute (file:///opt/quarto/bin/quarto.js:8102:13)

Your environment

No response

Quarto check output

No response

aronatkins commented 10 months ago

This issue happens with 1.4.537 but not with 1.4.533.

cderv commented 10 months ago

Playing with the dockerfile, i can say that 1.4.535 is the pre-release breaking the behavior.

My guess would be that this change is the source: https://github.com/quarto-dev/quarto-cli/commit/8579ac76e2fe400f5ef3bf9690c7760401cfcc33

cc @dragonstyle

I'll try to confirm this with a git bisect

cderv commented 10 months ago

Git bisect confirms this is the problem as guessed. @dragonstyle I'll try to see why the change we did do not behave that same. it seems folder may not be ignored as expected

cscheid commented 10 months ago

Since this is the second failed attempt at adding the feature, and it's a pretty obscure one, I suggest we just revert the change and move it to 1.5.

cderv commented 10 months ago

You mean going back to render target exclusion requiring a *.qmd as documented ? https://quarto.org/docs/projects/quarto-projects.html#render-targets

It feels safer.

Though I am still trying to find what is causing this

dragonstyle commented 10 months ago

Since this is the second failed attempt at adding the feature, and it's a pretty obscure one, I suggest we just revert the change and move it to 1.5.

+10000

cderv commented 10 months ago

Some information about understanding the issue with solution from 8579ac7

This is for documentation purposes in order to do the right thing when we'll fix. I dig a bit deep so I need this to remember when we'll tackle it

What is happening with latest fix

So python/env is a symlink. When we walk on dir content https://github.com/quarto-dev/quarto-cli/blob/8579ac76e2fe400f5ef3bf9690c7760401cfcc33/src/project/project-context.ts#L635-L660

we do not follow symlink, and so python/env will be the pathRelative relative tested.

And our regex does not catch this env name because not trailing slash - we expect env/ as it is supposed to be walking on filepath inside the directory.

/^(?:[^/]*(?:\/|$)+)*env\/+(?:[^/]*(?:\/|$)+)*$/.test("python/env")
false
/^(?:[^/]*(?:\/|$)+)*env\/+(?:[^/]*(?:\/|$)+)*$/.test("python/env/")
true

So python/env is added in files. It seems includeDirs: false, as option did not prevent this (symlinked) folder to be in the walked list.

When we later do some filtering at https://github.com/quarto-dev/quarto-cli/blob/8579ac76e2fe400f5ef3bf9690c7760401cfcc33/src/project/project-context.ts#L690-L701

file will be python/env and this time Deno.statSync(file).isDirectory will be true. Surprisingly Deno.statSync(file).isSymlink is false 🤔 So when it passes through https://github.com/quarto-dev/quarto-cli/blob/8579ac76e2fe400f5ef3bf9690c7760401cfcc33/src/project/project-context.ts#L696-L697

The whole directory python/env is scanned and any file matching our engine detection is kept

One solution maybe is to insure that symlink are not included with includeSymlinks: false ? (https://deno.land/std@0.203.0/fs/mod.ts?s=WalkOptions)

This definitely not include python/env in the first walk of project directory. Seems like by not following symlink we probably not expect a symlink folder / file in the list ?

Why it worke in 1.4.534 and earlier version ?

Regarding why it did not happen with previous fix behavior:

python/env is still detect as potential file while walking dir in addDir(). However, within the addFile() function, no engine is detected as this is indeed a directory https://github.com/quarto-dev/quarto-cli/blob/f70bb074f9476cd1c2827d97bc1070a59f5e0a4b/src/project/project-context.ts#L618-L619

Why it worked in 1.3 before the new feature introduction ?

Regarding 1.3, addDir() was not catching python/env when walking the project dir... So no issue at all here.

What is the status with new fix in #8231 ?

So I checked with #8231 revert commits.

for (const walk of walkSync(
        dir,
        {
          includeDirs: false,
          // this was done b/c some directories e.g. renv/packrat and potentially python
          // virtualenvs include symblinks to R or Python libraries that are in turn
          // circular. much safer to not follow symlinks!
          followSymlinks: false,
          skip: [kSkipHidden].concat(
            engineIgnoreDirs().map((ignore) =>
              globToRegExp(join(dir, ignore) + SEP)
            ),
          ),
        },
      )
    ) { console.log(walk)}

This is with 1.3.450

{path: '/home/quarto/content/index.qmd', name: 'index.qmd', isFile: true, isDirectory: false, isSymlink: false}
repl:16
{path: '/home/quarto/content/_quarto.yml', name: '_quarto.yml', isFile: true, isDirectory: false, isSymlink: false}
repl:16
{path: '/home/quarto/content/index.ipynb', name: 'index.ipynb', isFile: true, isDirectory: false, isSymlink: false}

This is with #8231 so with 1.4

{path: '/home/quarto/content/index.qmd', name: 'index.qmd', isFile: true, isDirectory: false, isSymlink: false}
repl:16
{path: '/home/quarto/content/python/env', name: 'env', isFile: false, isDirectory: false, isSymlink: true}
repl:16
{path: '/home/quarto/content/_quarto.yml', name: '_quarto.yml', isFile: true, isDirectory: false, isSymlink: false}
repl:16
{path: '/home/quarto/content/index.ipynb', name: 'index.ipynb', isFile: true, isDirectory: false, isSymlink: false}

Not the {path: '/home/quarto/content/python/env', name: 'env', isFile: false, isDirectory: false, isSymlink: true}

There is something that I believed changed, because our excluding set in the walk did not change I think

So either in Deno, or some other function involved.