Closed adamtheturtle closed 3 days ago
I'm not keen to have MultiEvaluator
in the core library or have multiple evaluators for a parsed region. Conceptually, a parsed region is tightly coupled to its evaluator; for Python regions the parsed form used to be the compiled object!
There's also the issue of how failures would be reported: your implementation above only gives the failure from the first evaluator to fail on each run, which ends up with that nasty experience of fixing one set of problems only to have a bunch more reported on the next run, potentially flip flopping between different types of failure.
Obviously, if it works for you, no reason to stop using it! :-)
The intention for SybilCollection
is to cater for different configurations being required for different files rather than for processing a file more than once. However, it's a relatively small change to have it support processing files more than once for different configurations. I think this would give a more robust, simple, solution to the problem you're facing.
You could argue that the prevention of overlapping regions should just be dropped, but this has saved me enough times when parsing has gone wrong that I'm keen to keep it as it currently is, certainly for a single Sybil.
In any case, I'd be interested to see how you wire mypy / ruff / etc into the checking of examples. Am I right in thinking this can't be done for doctest/repl style tests?
Thank you @cjw296 for the considered response.
The intention for SybilCollection is to cater for different configurations being required for different files rather than for processing a file more than once. However, it's a relatively small change to have it support processing files more than once for different configurations. I think this would give a more robust, simple, solution to the problem you're facing.
I would be happy with any solution, and for now I'm happy with my own MultiEvaluator
. I roughly plan to put it on PyPI to share between my projects, along with whatever I get to for mypy
/ ruff
.
I'd be interested to see how you wire mypy / ruff / etc into the checking of examples.
I'm still in the experimentation phase for that, but here is a dump of some of what I have. While I could do a mypy
-specific evaluator which uses the mypy
Python API, most of the tools I want to use don't have a Python API. In my projects, I'm already using pre-commit
to manage linters etc. and so my goal is to get relevant pre-commit
checks working against code blocks. Some pre-commit hooks change (e.g. format) code, and so I have that somewhat working, too.
Am I right in thinking this can't be done for doctest/repl style tests?
I haven't tried this, but apparently ruff
works on doctest examples: https://docs.astral.sh/ruff/formatter/#docstring-formatting.
Messy sample code:
class ShellCommandEvaluator:
"""Run a shell command on the example file."""
def __init__(
self,
command: Sequence[str | Path],
env: Mapping[str, str] | None = None,
tempfile_suffix: str = "",
*,
# For some commands, padding is good: e.g. we want to see the error reported on the correct line for `mypy`.
# For others, padding is bad: e.g. `ruff format` expects the file to be formatted without a bunch of newlines at the start.
pad_file: bool = True,
) -> None:
"""Initialize the evaluator with the shell command, environment variables, tempfile suffix, and padding option."""
self.command = command
self.env = env
self.tempfile_suffix = tempfile_suffix
self.pad_file = pad_file
def __call__(self, example: Example) -> None:
"""Run the shell command on the example file."""
if self.pad_file:
source = pad(
example.parsed, example.line + example.parsed.line_offset
)
else:
source = example.parsed
prefix = Path(example.path).name.replace(".", "_") + "_"
with tempfile.NamedTemporaryFile(
prefix=prefix,
dir=Path(__file__).parent,
mode="w+",
delete=True,
suffix=self.tempfile_suffix,
) as f:
f.write(source)
f.flush()
temp_file_path = Path(f.name)
result = subprocess.run(
args=[*self.command, temp_file_path],
capture_output=True,
check=False,
text=True,
env=self.env,
)
# TODO: Add an option of whether the command should be able to write to the file.
# This would give us the ability to e.g. run `ruff format` to change code within
# docs, like https://pypi.org/project/blacken-docs/.
write = True
if write:
temp_file_content = temp_file_path.read_text(encoding="utf-8")
existing_file_path = Path(example.path)
existing_file_content = existing_file_path.read_text(
encoding="utf-8"
)
modified_content = (
existing_file_content[: example.start]
+ temp_file_content
+ existing_file_content[example.end :]
)
existing_file_path.write_text(
modified_content, encoding="utf-8"
)
if result.returncode != 0:
msg = (
f"Shell command failed:\n"
f"Command: {self.command}\n"
f"Output: {result.stdout}\n"
f"Error: {result.stderr}"
)
raise AssertionError(msg)
pytest_collect_file = Sybil(
parsers=[
ClearNamespaceParser(),
DocTestParser(optionflags=ELLIPSIS),
CodeBlockParser(
language="python",
evaluator=MultiEvaluator(
evaluators=[
ShellCommandEvaluator(
command=[sys.executable, "-m", "mypy"], # Probably don't do this in the end - just stick to pre-commit
),
ShellCommandEvaluator(
command=[
sys.executable,
"-m",
"pre_commit",
"run",
"--files",
],
env={**os.environ, "SKIP": "ruff-format-check"}, # Skip ruff formatting as we are padding the file - this doesn't work with ruff format. Therefore, we run ruff format separately with an un-padded file.
tempfile_suffix=".py",
),
ShellCommandEvaluator(
command=[
sys.executable,
"-m",
"ruff",
"format",
"--diff",
],
tempfile_suffix=".py",
pad_file=False,
),
ModifyExampleEvaluator(
evaluator=ShellCommandEvaluator(
command=[
sys.executable,
"-m",
"ruff",
"format",
],
tempfile_suffix=".py",
pad_file=False,
),
),
]
),
),
],
patterns=["*.rst", "*.py"],
fixtures=["make_image_file", "mock_vws"],
).pytest()
Some observations from the above:
Just to check you're aware that the CodeBlockParser
above isn't actually executing any of the python you're linting, only the doctest/repr examples will actually be executed by the looks of it.
Conversely, the doctest blocks are not going to have any linting done.
Pragmatically, your ShellCommandEvaluator
reads as if it should be RunPythonBinary
, and fold the sys.executable
and template_suffix
bits into that class.
MultiEvaluator
reads fine above, but it appears very dependent on all the evaluators being tailored to that parser; for example, that DocTestParser
produces regions with DocTestExample
parsed objects, rather than the text that CodeBlockParser
emits. As an important implementation detail: it should run all of the evaluators, even if earlier ones fail, feeding back the group of evaluator failures is tricky.
I'm going to sort out the SybilCollection
stuff for the next release, since I think that's probably a simpler more robust thing.
Just to check you're aware that the CodeBlockParser above isn't actually executing any of the python you're linting, only the doctest/repr examples will actually be executed by the looks of it.
Yes, I can use the existing sybil.evaluator.python.PythonEvaluator
for that.
Conversely, the doctest blocks are not going to have any linting done.
That's right - I'm going to look at DocTestStringParser
next.
Pragmatically, your ShellCommandEvaluator reads as if it should be RunPythonBinary, and fold the sys.executable and template_suffix bits into that class.
Potentially, but I think I'd like to keep flexibility for other tools, e.g. shellcheck
for shell code blocks. RunPythonBinary
might be a wrapper around ShellCommandEvaluator
.
I'm going to sort out the SybilCollection stuff for the next release, since I think that's probably a simpler more robust thing.
Great - then I won't worry about perfecting MultiEvaluator
, on the assumption that I can throw that away.
Heh, looks like this was just a bug (even documented with a test!) in the pytest integration. Fixed in 35f27501e4ebf14a5a3ee1e7f643552d596ade4e.
Nice work, thank you!
If you wish to follow-up, documentation I would find useful (and certainly would have found very useful before investigation):
Okay, in 185b792671b76ca993aa9ab8f9ab448199d46773, I've simplified the Usage documentation by splitting out Test Runners and Developing Parsers I've also provided a new Concepts glossary. How does that look?
That's excellent. Thank you!
I feel that this issue is ready to be closed.
One thing that I had to spend thinking/time on is going from a Document
to Example
s. I now know that I do:
sybil = Sybil(parsers=[rest_parser, markdown_parser])
document = sybil.parse(path=file_path)
for example in document:
...
What contributed to this:
Document.examples
or similar__iter__
is not in the API documentation (dunder methods are not automatically documented)Document
One thing that I had to spend thinking/time on is going from a Document to Examples.
I'm curious where you encountered the need to do this. This is a pretty internal detail only needed if you're writing a new test runner integration or writing tests for a parser, but then there are tools to help with this that mean you still shouldn't need to know this...
Anyway, added Document.examples()
in 629f0a386a60a2cc765460677821c993b66c1cbb as I agree it's clearer.
Anyway, added Document.examples() in https://github.com/simplistix/sybil/commit/629f0a386a60a2cc765460677821c993b66c1cbb as I agree it's clearer.
Nice!
I'm curious where you encountered the need to do this
Very possible I'm using Sybil beyond its remit.
Following on from the ShellCommandEvaluator
, I have been working on a language-agnostic CLI tool, doccmd
.
It currently has the following WIP interface:
# Run mypy against the Python code blocks in README.md and CHANGELOG.rst
$ doccmd --language=python --command="mypy" README.md CHANGELOG.rst
# Run gofmt against the Go code blocks in README.md
# This will modify the README.md file in place
$ doccmd --language=go --command="gofmt -w" README.md
This means that I'm not using one of the existing test runner integrations. The code in question is at https://github.com/adamtheturtle/doccmd/blob/cdd27bf0ceffbdf4058d2b6b2fc1c8c80adedaf3/src/doccmd/__init__.py#L73.
Also, I have another issue which is related to using multiple evaluators on the same region.
At https://github.com/VWS-Python/vws-test-fixtures I have a project which ships pytest
test fixtures.
The README includes this:
I want to check that the code in the README is valid.
The way I do that is I use the capture parser to save the test function, and then an .. invisible-code-block: python
to run the test.
The source looks like:
.. Use "code" rather than "code-block" to avoid having this picked up
.. by both the `PythonCodeBlockParser` and the `CaptureParser` from Sybil.
.. Sybil does not recognize `code` as a code block, so it does not pick it up.
.. If they both pick it up, we get an error about overlapping regions.
.. code:: python
import io
# A test to be run by pytest
def test_example(high_quality_image: io.BytesIO) -> None:
image_file_bytes = high_quality_image.getvalue()
...
.. -> test_src
.. invisible-code-block: python
import pathlib
import subprocess
import tempfile
import pytest
with tempfile.TemporaryDirectory() as tmp_dir:
test_file = pathlib.Path(tmp_dir) / 'test_src.py'
test_file.write_text(test_src)
subprocess.check_output(["python", "-m", "pytest", test_file, "--basetemp", test_file.parent])
I use PythonCodeBlockParser()
and CaptureParser()
in the same Sybil. I believe that this is necessary.
As described in the comment in the source, this only happens to work because I figured out that I can use .. code
rather than .. code-block
. I imagine there might not be a similar opportunity in other markup languages. This also means that I cannot use Sphinx extensions on this code block which apply only to .. code-block
.
This means that I'm not using one of the existing test runner integrations. Okay, sure, then you're into pretty niché territory 😆
No objection from me Sybil being used in this way, great to see it actually working for that, but yeah, you're well into internals at this point...
The README includes this: ...
Missing code?
So, if you want to check code such as this is valid:
# A test to be run by pytest
def test_example(high_quality_image: io.BytesIO) -> None:
image_file_bytes = high_quality_image.getvalue()
...you can take a leaf out of Sybil's approach to this, which is best shown in patterns.rst which uses ReST/Sphinx .. literalinclude::
to include examples from real files on disk.
These files are then tested in a very similar way to the one you have above, but this doesn't have to be woven through Sybil:
I think this may work better for you...
No objection from me Sybil being used in this way, great to see it actually working for that, but yeah, you're well into internals at this point...
It is working great!
Missing code?
🤦 I have updated the comment to include the screenshot I originallyy meant to add.
So, if you want to check code such as this is valid:
Yes, you've correctly understood what I want to do.
...you can take a leaf out of Sybil's approach to this, which is best shown in patterns.rst which uses ReST/Sphinx .. literalinclude:: to include examples from real files on disk.
Unfortunately I can't do that as I'm testing a README which has to render on GitHub and GitHub doesn't work with .. literalinclude
.
Unfortunately I can't do that as I'm testing a README which has to render on GitHub and GitHub doesn't work with .. literalinclude.
Yes, had the same problem, but I gave up and just don't have that kind of example in the README...
If you want to fight that good fight, more power to you, but I'm afraid you'll end up tying yourself in the quoting knots that you've already encountered, as well as being surprised and saddened the next time GitHub break anything more than trivial ReST, which they do with tedious frequency...
Fair. How would you feel about contributions from me for the following:
sybil.parsers.rest.CodeBlockParser
accept .. code::
(tangential, but mentioned in the previous example). If you're not sure, I can create an issue just for that with discussion.?
Thank you also for engaging with this!
Right, just pushed up c27fc3e1b1031149b3a02585b76c5dac38a23901 which provides the pattern of use example for linting and checking the same code. I think that brings everything to a close and ready for a release and close this issue?
Have an option on a Sybil to allow overlapping regions, and document that with a warning.
No thanks.
Right, just pushed up https://github.com/simplistix/sybil/commit/c27fc3e1b1031149b3a02585b76c5dac38a23901 which provides the pattern of use example for linting and checking the same code. I think that brings everything to a close and ready for a release and close this issue?
Sounds great. I agree that this issue has gone far enough. If I hit anything else, I will create another issue / PR.
Have sybil.parsers.rest.CodeBlockParser accept .. code:: (tangential, but mentioned in the previous example). If you're not sure, I can create an issue just for that with discussion.
Can you link me to the docs that cover this as being a legit thing? If so, probably just a case of relaxing the existing regex to in the DirectiveLexer
in the ReST CodeBlockParser
.
(although this sounds like a thing you don't want, as it'll stop your README workaround from working?)
Can you link me to the docs that cover this as being a legit thing?
https://docutils.sourceforge.io/docs/ref/rst/directives.html#code
Is this Docutils documentation sufficient?
I'll make an issue to track this.
(although this sounds like a thing you don't want, as it'll stop your README workaround from working?)
That's right, but I can always create a lexer which doesn't include .. code
, if I need one.
I'm going to think on how I can best solve the problem, but it shouldn't depend on a hack.
That's right, but I can always create a lexer which doesn't include .. code, if I need one.
Probably a clone of sybil.parsers.rest.CodeBlockParser
that uses a narrower lexer and the PythonEvaluator
, if you take this approach.
Thank you for all the hard work across multiple issues @cjw296 .
My use case:
code-block
in a rST documentPythonCodeBlockParser
mypy
/ruff
/ other against itMy understanding of the situation, and what I tried:
ValueError
fromDocument.raise_overlap
SybilCollection
of an iterable ofSybil
s which act on the same region, and then usesybil_collection.pytest()
, the firstSybil
's evaluators are run. For example, in the following code, becausesybil_no_op
is given first to theSybilCollection.__init__
, no exception is raised, but switching the order means that an exception is raised. I was surprised by this.I solved my problem by creating the following:
I expect or hope that some follow-ups / responses might include a subset of:
SybilCollection.pytest()
in the above example error because there is an overlapSybilCollection.pytest()
in the above example work as I expected (raise Exception
whichever order the Sybils are given toSybilCollection.__init__
)MultiEvaluator
. If you think that this is the best idea, I'm happy to contribute to it, but I'd like to discuss the interface / backwards compatibility first.