python / cpython

The Python programming language
https://www.python.org/
Other
61.15k stars 29.51k forks source link

`doctest` cannot be used to test markdown files. #116546

Open MusicalNinjaDad opened 4 months ago

MusicalNinjaDad commented 4 months ago

Feature or enhancement

Proposal:

While core python uses reStructuredText, many packages maintain their documentation in markdown.

IDEs such as vscode show docstrings in popups when using a function and render markdown within those popups.

Doctest provides an amazing framework to easily validate the quality of examples given in docstrings and documentation, it is a best-of-breed approach which encourages documentation as code by allowing documentation-testing. (In the opinion of the author - thank you for creating it!)

Unfortunately doctest is currently incompatible with markdown documentation (and docstrings) as it considers the end of a code block, signified by triple-ticks (```) to be part of the expected output. This creates a barrier to developers to implementing best practices as they are not able to test their documentation or docstrings without being tied to a specific form of markup (rst).

Proposal

I have raised a draft pull request implementing one possible, and very simple, approach to providing compatibility with markdown codeblocks.

If you believe that this is a change needing wider discussion, then I will happily open a thread under discuss.python.org. Given the small nature of the change (1 line of library code, 100 lines of test case, 10 lines of documentation), I am awaiting feedback on this issue & draft PR, in case it is considered a simple, non-breaking, adjustment which does not require significant, wider discussion.

Alternatives considered

  1. Creating a dedicated parser.
    • Pros: Could be provided as a module outside the standard library; able to identify start and end of code blocks and use this information in structuring the results.
    • Cons: doctest.testmod() would need to be adjusted to allow for an optional parser keyword argument analog doctest.testfile(); higher barrier to entry for users, need to install a separate package and specifically select a different parser.
  2. Test framework plugins (e.g. pytest-doctest-mkdocstrings).
    • Pros: Can be provided as a module outside the standard library
    • Cons: Requires monkeypatching internals of doctest - prone to future failures; only works when tests are run via a separate test framework
  3. Providing additional markdown parsing within doctest.DocTestParser
    • Pros: Additional structure to doctest results.
    • Cons: Significant additional complexity, goes against the current simplicity and agnostic nature of doctest.

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response

Linked PRs

sobolevn commented 4 months ago

I would not say that Markdown is not supported, it is. I use Markdown and doctests in multiple projects. Here's just one example: https://github.com/wemake-services/inspect313

But, yes. You have to keep in mind that triple backticks are treated as an output. So, you have to add an empty line before the backticks: https://github.com/wemake-services/inspect313/blob/1ed3f77059199aa17db4be0fe713315bda87d101/README.md#L46-L48

MusicalNinjaDad commented 4 months ago

Yes, you can workaround by adding a blank line at the end but that adds an easy source of error, looks ugly in the code and I've run into cases where the block gets rendered with blank line. Sorry, I didn't include this point in the original description - I felt it was already quite long for such a small change.

oscarbenjamin commented 3 months ago

There are also downstream issues in pytest (https://github.com/pytest-dev/pytest/issues/7374) and pytest-doctestplus (https://github.com/scientific-python/pytest-doctestplus/pull/221). This behaviour is inherited by those libraries as well since they depend on the stdlib doctest module for collecting the doctest fragments. The SymPy in-house doctest runner removes these (https://github.com/sympy/sympy/blob/d91b8ad6d36a59a879cc70e5f4b379da5fdd46ce/sympy/testing/runtests.py#L1827-L1831).

It looks like doctest was not designed for markdown files. It doesn't know anything about markdown backticks and just uses a regex starting with PS1 and ending with either PS1 or a blank line: https://github.com/python/cpython/blob/027fa2eccf39ddccdf7b416d16601277a7112054/Lib/doctest.py#L608-L624

MusicalNinjaDad commented 3 months ago

It looks like this is worthy of a discussion, as it's clearly not something that will "just be OK" but may be of use to the community nonetheless. I'll open one in the morning when I have sufficient time at the keyboard. @oscarbenjamin - what do you think of the proposed change to adjust the regex to:

    _EXAMPLE_RE = re.compile(r'''
        # Source consists of a PS1 line followed by zero or more PS2 lines.
        (?P<source>
            (?:^(?P<indent> [ ]*) >>>    .*)    # PS1 line
            (?:\n           [ ]*  \.\.\. .*)*)  # PS2 lines
        \n?
        # Want consists of any non-blank lines that do not start with PS1.
        (?P<want> (?:(?![ ]*$)    # Not a blank line
                     (?![ ]*```)  # Not end of a code block
                     (?![ ]*>>>)  # Not a line starting with PS1
                     .+$\n?       # But any other line
                  )*)
        ''', re.MULTILINE | re.VERBOSE)
oscarbenjamin commented 3 months ago

what do you think of the proposed change to adjust the regex to:

The obvious failure case would be if the test output includes triple backticks. That doesn't seem like a big deal to me but I might be wrong.

I imagine that this is best handled outside of the stdlib in the first instance though. See https://github.com/scientific-python/pytest-doctestplus/issues/247.