Discussion: Using multiple evaluators on the same region

adamtheturtle commented 1 week ago

My use case:

I have a single Python code-block in a rST document
I wish to do multiple things with this code block as part of my test suite
- Run it (I currently do this with Sybil's PythonCodeBlockParser
- Run mypy / ruff / other against it

My understanding of the situation, and what I tried:

What I'm looking for is to run multiple evaluators on the same region
If I try to add two parsers which act on the same region, I get a ValueError from Document.raise_overlap
If I try to create a SybilCollection of an iterable of Sybils which act on the same region, and then use sybil_collection.pytest(), the first Sybil's evaluators are run. For example, in the following code, because sybil_no_op is given first to the SybilCollection.__init__, no exception is raised, but switching the order means that an exception is raised. I was surprised by this.

sybil_no_op = Sybil(
    parsers=[CodeBlockParser(language="python", evaluator=lambda _: None)],
    patterns=["*.rst", "*.py"],
)

def _raise_exception_evaluator(example: Example) -> None:
    raise Exception

sybil_raise = Sybil(
    parsers=[CodeBlockParser(language="python", evaluator=_raise_exception_evaluator)],
    patterns=["*.rst", "*.py"],
)

sybils = SybilCollection([sybil_no_op, sybil_raise])
pytest_collect_file = sybils.pytest()

I solved my problem by creating the following:

class MultiEvaluator:
    def __init__(self, evaluators: Sequence[Evaluator]) -> None:
        """Run multiple evaluators."""
        self._evaluators = evaluators

    def __call__(self, example: Example) -> None:
        """Run all evaluators."""
        for evaluator in self._evaluators:
            evaluator(example)

pytest_collect_file = Sybil(
    parsers=[
        CodeBlockParser(
            language="python",
            evaluator=MultiEvaluator(
                [_raise_exception_evaluator, lambda _: None, PythonEvaluator()]
            ),
        ),
    ],
    patterns=["*.rst", "*.py"],
).pytest()

I expect or hope that some follow-ups / responses might include a subset of:

Advice on how I can better use the library as-is
Make SybilCollection.pytest() in the above example error because there is an overlap
Make SybilCollection.pytest() in the above example work as I expected (raise Exception whichever order the Sybils are given to SybilCollection.__init__)
Allow parsers to take multiple evaluators without MultiEvaluator. If you think that this is the best idea, I'm happy to contribute to it, but I'd like to discuss the interface / backwards compatibility first.

cjw296 commented 1 week ago

I'm not keen to have MultiEvaluator in the core library or have multiple evaluators for a parsed region. Conceptually, a parsed region is tightly coupled to its evaluator; for Python regions the parsed form used to be the compiled object! There's also the issue of how failures would be reported: your implementation above only gives the failure from the first evaluator to fail on each run, which ends up with that nasty experience of fixing one set of problems only to have a bunch more reported on the next run, potentially flip flopping between different types of failure.

Obviously, if it works for you, no reason to stop using it! :-)

The intention for SybilCollection is to cater for different configurations being required for different files rather than for processing a file more than once. However, it's a relatively small change to have it support processing files more than once for different configurations. I think this would give a more robust, simple, solution to the problem you're facing.

You could argue that the prevention of overlapping regions should just be dropped, but this has saved me enough times when parsing has gone wrong that I'm keen to keep it as it currently is, certainly for a single Sybil.

In any case, I'd be interested to see how you wire mypy / ruff / etc into the checking of examples. Am I right in thinking this can't be done for doctest/repl style tests?

adamtheturtle commented 1 week ago

Thank you @cjw296 for the considered response.

The intention for SybilCollection is to cater for different configurations being required for different files rather than for processing a file more than once. However, it's a relatively small change to have it support processing files more than once for different configurations. I think this would give a more robust, simple, solution to the problem you're facing.

I would be happy with any solution, and for now I'm happy with my own MultiEvaluator. I roughly plan to put it on PyPI to share between my projects, along with whatever I get to for mypy / ruff.

I'd be interested to see how you wire mypy / ruff / etc into the checking of examples.

I'm still in the experimentation phase for that, but here is a dump of some of what I have. While I could do a mypy-specific evaluator which uses the mypy Python API, most of the tools I want to use don't have a Python API. In my projects, I'm already using pre-commit to manage linters etc. and so my goal is to get relevant pre-commit checks working against code blocks. Some pre-commit hooks change (e.g. format) code, and so I have that somewhat working, too.

Am I right in thinking this can't be done for doctest/repl style tests?

I haven't tried this, but apparently ruff works on doctest examples: https://docs.astral.sh/ruff/formatter/#docstring-formatting.

Messy sample code:

class ShellCommandEvaluator:
    """Run a shell command on the example file."""

    def __init__(
        self,
        command: Sequence[str | Path],
        env: Mapping[str, str] | None = None,
        tempfile_suffix: str = "",
        *,
        # For some commands, padding is good: e.g. we want to see the error reported on the correct line for `mypy`.
        # For others, padding is bad: e.g. `ruff format` expects the file to be formatted without a bunch of newlines at the start.
        pad_file: bool = True,
    ) -> None:
        """Initialize the evaluator with the shell command, environment variables, tempfile suffix, and padding option."""
        self.command = command
        self.env = env
        self.tempfile_suffix = tempfile_suffix
        self.pad_file = pad_file

    def __call__(self, example: Example) -> None:
        """Run the shell command on the example file."""
        if self.pad_file:
            source = pad(
                example.parsed, example.line + example.parsed.line_offset
            )
        else:
            source = example.parsed

        prefix = Path(example.path).name.replace(".", "_") + "_"
        with tempfile.NamedTemporaryFile(
            prefix=prefix,
            dir=Path(__file__).parent,
            mode="w+",
            delete=True,
            suffix=self.tempfile_suffix,
        ) as f:
            f.write(source)
            f.flush()
            temp_file_path = Path(f.name)

            result = subprocess.run(
                args=[*self.command, temp_file_path],
                capture_output=True,
                check=False,
                text=True,
                env=self.env,
            )

            # TODO: Add an option of whether the command should be able to write to the file.
            # This would give us the ability to e.g. run `ruff format` to change code within
            # docs, like https://pypi.org/project/blacken-docs/.
            write = True
            if write:
                temp_file_content = temp_file_path.read_text(encoding="utf-8")
                existing_file_path = Path(example.path)
                existing_file_content = existing_file_path.read_text(
                    encoding="utf-8"
                )
                modified_content = (
                    existing_file_content[: example.start]
                    + temp_file_content
                    + existing_file_content[example.end :]
                )
                existing_file_path.write_text(
                    modified_content, encoding="utf-8"
                )

        if result.returncode != 0:
            msg = (
                f"Shell command failed:\n"
                f"Command: {self.command}\n"
                f"Output: {result.stdout}\n"
                f"Error: {result.stderr}"
            )

            raise AssertionError(msg)

pytest_collect_file = Sybil(
    parsers=[
        ClearNamespaceParser(),
        DocTestParser(optionflags=ELLIPSIS),
        CodeBlockParser(
            language="python",
            evaluator=MultiEvaluator(
                evaluators=[
                    ShellCommandEvaluator(
                        command=[sys.executable, "-m", "mypy"], # Probably don't do this in the end - just stick to pre-commit
                    ),
                    ShellCommandEvaluator(
                        command=[
                            sys.executable,
                            "-m",
                            "pre_commit",
                            "run",
                            "--files",
                        ],
                        env={**os.environ, "SKIP": "ruff-format-check"}, # Skip ruff formatting as we are padding the file - this doesn't work with ruff format. Therefore, we run ruff format separately with an un-padded file.
                        tempfile_suffix=".py",
                    ),
                    ShellCommandEvaluator(
                        command=[
                            sys.executable,
                            "-m",
                            "ruff",
                            "format",
                            "--diff",
                        ],
                        tempfile_suffix=".py",
                        pad_file=False,
                    ),
                    ModifyExampleEvaluator(
                        evaluator=ShellCommandEvaluator(
                            command=[
                                sys.executable,
                                "-m",
                                "ruff",
                                "format",
                            ],
                            tempfile_suffix=".py",
                            pad_file=False,
                        ),
                    ),
                ]
            ),
        ),
    ],
    patterns=["*.rst", "*.py"],
    fixtures=["make_image_file", "mock_vws"],
).pytest()

cjw296 commented 1 week ago

Some observations from the above:

Just to check you're aware that the CodeBlockParser above isn't actually executing any of the python you're linting, only the doctest/repr examples will actually be executed by the looks of it.
Conversely, the doctest blocks are not going to have any linting done.
Pragmatically, your ShellCommandEvaluator reads as if it should be RunPythonBinary, and fold the sys.executable and template_suffix bits into that class.
MultiEvaluator reads fine above, but it appears very dependent on all the evaluators being tailored to that parser; for example, that DocTestParser produces regions with DocTestExample parsed objects, rather than the text that CodeBlockParser emits. As an important implementation detail: it should run all of the evaluators, even if earlier ones fail, feeding back the group of evaluator failures is tricky.

I'm going to sort out the SybilCollection stuff for the next release, since I think that's probably a simpler more robust thing.

adamtheturtle commented 1 week ago

Just to check you're aware that the CodeBlockParser above isn't actually executing any of the python you're linting, only the doctest/repr examples will actually be executed by the looks of it.

Yes, I can use the existing sybil.evaluator.python.PythonEvaluator for that.

Conversely, the doctest blocks are not going to have any linting done.

That's right - I'm going to look at DocTestStringParser next.

Pragmatically, your ShellCommandEvaluator reads as if it should be RunPythonBinary, and fold the sys.executable and template_suffix bits into that class.

Potentially, but I think I'd like to keep flexibility for other tools, e.g. shellcheck for shell code blocks. RunPythonBinary might be a wrapper around ShellCommandEvaluator.

I'm going to sort out the SybilCollection stuff for the next release, since I think that's probably a simpler more robust thing.

Great - then I won't worry about perfecting MultiEvaluator, on the assumption that I can throw that away.

cjw296 commented 1 week ago

Heh, looks like this was just a bug (even documented with a test!) in the pytest integration. Fixed in 35f27501e4ebf14a5a3ee1e7f643552d596ade4e.

adamtheturtle commented 1 week ago

Nice work, thank you!

adamtheturtle commented 1 week ago

If you wish to follow-up, documentation I would find useful (and certainly would have found very useful before investigation):

A brief description of the concepts in Sybil (Parser, Lexer, Document, Region, Evaluator) and how they fit together
Why overlapping regions are bad

cjw296 commented 5 days ago

Okay, in 185b792671b76ca993aa9ab8f9ab448199d46773, I've simplified the Usage documentation by splitting out Test Runners and Developing Parsers I've also provided a new Concepts glossary. How does that look?

adamtheturtle commented 5 days ago

That's excellent. Thank you!

I feel that this issue is ready to be closed. One thing that I had to spend thinking/time on is going from a Document to Examples. I now know that I do:

sybil = Sybil(parsers=[rest_parser, markdown_parser])
document = sybil.parse(path=file_path)
for example in document:
    ...

What contributed to this:

It wasn't obvious - I guess I expected Document.examples or similar
__iter__ is not in the API documentation (dunder methods are not automatically documented)
It isn't in the description of Document

cjw296 commented 4 days ago

One thing that I had to spend thinking/time on is going from a Document to Examples.

I'm curious where you encountered the need to do this. This is a pretty internal detail only needed if you're writing a new test runner integration or writing tests for a parser, but then there are tools to help with this that mean you still shouldn't need to know this...

Anyway, added Document.examples() in 629f0a386a60a2cc765460677821c993b66c1cbb as I agree it's clearer.

adamtheturtle commented 4 days ago

Anyway, added Document.examples() in https://github.com/simplistix/sybil/commit/629f0a386a60a2cc765460677821c993b66c1cbb as I agree it's clearer.

Nice!

I'm curious where you encountered the need to do this

Very possible I'm using Sybil beyond its remit. Following on from the ShellCommandEvaluator, I have been working on a language-agnostic CLI tool, doccmd.

It currently has the following WIP interface:

# Run mypy against the Python code blocks in README.md and CHANGELOG.rst
$ doccmd --language=python --command="mypy" README.md CHANGELOG.rst

# Run gofmt against the Go code blocks in README.md
# This will modify the README.md file in place
$ doccmd --language=go --command="gofmt -w" README.md

This means that I'm not using one of the existing test runner integrations. The code in question is at https://github.com/adamtheturtle/doccmd/blob/cdd27bf0ceffbdf4058d2b6b2fc1c8c80adedaf3/src/doccmd/__init__.py#L73.

adamtheturtle commented 4 days ago

Also, I have another issue which is related to using multiple evaluators on the same region.

At https://github.com/VWS-Python/vws-test-fixtures I have a project which ships pytest test fixtures.

The README includes this:

I want to check that the code in the README is valid.

The way I do that is I use the capture parser to save the test function, and then an .. invisible-code-block: python to run the test.

The source looks like:

.. Use "code" rather than "code-block" to avoid having this picked up
.. by both the `PythonCodeBlockParser` and the `CaptureParser` from Sybil.
.. Sybil does not recognize `code` as a code block, so it does not pick it up.
.. If they both pick it up, we get an error about overlapping regions.

.. code:: python

   import io

   # A test to be run by pytest
   def test_example(high_quality_image: io.BytesIO) -> None:
       image_file_bytes = high_quality_image.getvalue()
       ...

.. -> test_src

.. invisible-code-block: python

   import pathlib
   import subprocess
   import tempfile

   import pytest

   with tempfile.TemporaryDirectory() as tmp_dir:
       test_file = pathlib.Path(tmp_dir) / 'test_src.py'
       test_file.write_text(test_src)
       subprocess.check_output(["python", "-m", "pytest", test_file, "--basetemp", test_file.parent])

I use PythonCodeBlockParser() and CaptureParser() in the same Sybil. I believe that this is necessary.

As described in the comment in the source, this only happens to work because I figured out that I can use .. code rather than .. code-block. I imagine there might not be a similar opportunity in other markup languages. This also means that I cannot use Sphinx extensions on this code block which apply only to .. code-block.

cjw296 commented 4 days ago

This means that I'm not using one of the existing test runner integrations. Okay, sure, then you're into pretty niché territory 😆

No objection from me Sybil being used in this way, great to see it actually working for that, but yeah, you're well into internals at this point...

cjw296 commented 4 days ago

The README includes this: ...

Missing code?

cjw296 commented 4 days ago

So, if you want to check code such as this is valid:

   # A test to be run by pytest
   def test_example(high_quality_image: io.BytesIO) -> None:
       image_file_bytes = high_quality_image.getvalue()

...you can take a leaf out of Sybil's approach to this, which is best shown in patterns.rst which uses ReST/Sphinx .. literalinclude:: to include examples from real files on disk.

These files are then tested in a very similar way to the one you have above, but this doesn't have to be woven through Sybil:

https://github.com/simplistix/sybil/blob/552233f3a241eec2b425502bc7183bf4d8f25a17/tests/test_docs_examples.py#L47-L67

I think this may work better for you...

adamtheturtle commented 4 days ago

No objection from me Sybil being used in this way, great to see it actually working for that, but yeah, you're well into internals at this point...

It is working great!

Missing code?

🤦 I have updated the comment to include the screenshot I originallyy meant to add.

So, if you want to check code such as this is valid:

Yes, you've correctly understood what I want to do.

...you can take a leaf out of Sybil's approach to this, which is best shown in patterns.rst which uses ReST/Sphinx .. literalinclude:: to include examples from real files on disk.

Unfortunately I can't do that as I'm testing a README which has to render on GitHub and GitHub doesn't work with .. literalinclude.

cjw296 commented 4 days ago

Unfortunately I can't do that as I'm testing a README which has to render on GitHub and GitHub doesn't work with .. literalinclude.

Yes, had the same problem, but I gave up and just don't have that kind of example in the README...

If you want to fight that good fight, more power to you, but I'm afraid you'll end up tying yourself in the quoting knots that you've already encountered, as well as being surprised and saddened the next time GitHub break anything more than trivial ReST, which they do with tedious frequency...

adamtheturtle commented 4 days ago

Fair. How would you feel about contributions from me for the following:

Have sybil.parsers.rest.CodeBlockParser accept .. code:: (tangential, but mentioned in the previous example). If you're not sure, I can create an issue just for that with discussion.
Have an option on a Sybil to allow overlapping regions, and document that with a warning.

?

adamtheturtle commented 4 days ago

Thank you also for engaging with this!

cjw296 commented 4 days ago

Right, just pushed up c27fc3e1b1031149b3a02585b76c5dac38a23901 which provides the pattern of use example for linting and checking the same code. I think that brings everything to a close and ready for a release and close this issue?

cjw296 commented 4 days ago

Have an option on a Sybil to allow overlapping regions, and document that with a warning.

No thanks.

adamtheturtle commented 4 days ago

Right, just pushed up https://github.com/simplistix/sybil/commit/c27fc3e1b1031149b3a02585b76c5dac38a23901 which provides the pattern of use example for linting and checking the same code. I think that brings everything to a close and ready for a release and close this issue?

Sounds great. I agree that this issue has gone far enough. If I hit anything else, I will create another issue / PR.

cjw296 commented 4 days ago

Have sybil.parsers.rest.CodeBlockParser accept .. code:: (tangential, but mentioned in the previous example). If you're not sure, I can create an issue just for that with discussion.

Can you link me to the docs that cover this as being a legit thing? If so, probably just a case of relaxing the existing regex to in the DirectiveLexer in the ReST CodeBlockParser.

(although this sounds like a thing you don't want, as it'll stop your README workaround from working?)

adamtheturtle commented 4 days ago

Can you link me to the docs that cover this as being a legit thing?

https://docutils.sourceforge.io/docs/ref/rst/directives.html#code

Is this Docutils documentation sufficient?

adamtheturtle commented 3 days ago

I'll make an issue to track this.

(although this sounds like a thing you don't want, as it'll stop your README workaround from working?)

That's right, but I can always create a lexer which doesn't include .. code, if I need one.

I'm going to think on how I can best solve the problem, but it shouldn't depend on a hack.

cjw296 commented 3 days ago

That's right, but I can always create a lexer which doesn't include .. code, if I need one.

Probably a clone of sybil.parsers.rest.CodeBlockParser that uses a narrower lexer and the PythonEvaluator, if you take this approach.

cjw296 commented 3 days ago

https://pypi.org/project/sybil/7.0.0/ now out

adamtheturtle commented 3 days ago

Thank you for all the hard work across multiple issues @cjw296 .

simplistix / sybil

Discussion: Using multiple evaluators on the same region #120