rr- / docstring_parser

Parse Python docstrings in various flavors.
MIT License

Google style docstring and sphinx directives (issue #42)

Status: open. Andry-Bal opened this issue 3 years ago.

Andry-Bal commented 3 years ago

The following code snippet causes everything after Example: to be parsed as long_description. As a consequence, params and raises are lost.

```python
from docstring_parser import parse

if __name__ == '__main__':
    docstring = parse(
        """
        Short description

        Long description spanning multiple lines
        - First line
        - Second line
        - Third line

        Example:

        .. testcode::

            foo = bar
            bar = foo

        Args:
            name: description 1
            priority: description 2
            sender: description 3

        Raises:
            IOError: some error
        """)
    print(docstring.short_description)
    # Reported behavior: everything after "Example:" ends up here...
    print(docstring.long_description)
    # ...so these two come back empty.
    print(docstring.params)
    print(docstring.raises)
```

rr- commented 3 years ago

Where can I read more about this syntax? It gets parsed as long_description because the Google parser crashes: it thinks Example: is an empty section and .. testcode:: is a separate, unknown section, so docstring_parser falls back to the next available parser.
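
The fallback described above can be modeled with a toy sketch. It is not docstring_parser's actual implementation. Parsed, ParseError, toy_google, toy_plain and parse_with_fallback are hypothetical names. toy_google only mimics the reported failure, a top-level header whose first non-blank body line is not indented. It does not really parse Args: or Raises:.

```python
import inspect
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


class ParseError(Exception):
    """Raised by a (hypothetical) style parser that cannot handle the text."""


@dataclass
class Parsed:
    # Hypothetical result type; the field names echo docstring_parser's.
    short_description: str
    long_description: str = ""
    params: List[str] = field(default_factory=list)


def toy_google(text: str) -> Parsed:
    """Toy Google parser: rejects a header whose body is not indented."""
    lines = inspect.cleandoc(text).splitlines()
    for i, line in enumerate(lines):
        if line.endswith(":") and not line.startswith(" "):
            body = [ln for ln in lines[i + 1:] if ln.strip()]
            if not body or not body[0].startswith(" "):
                raise ParseError(f"empty section {line!r}")
    # A real parser would fill params/raises here; omitted in this sketch.
    return Parsed(lines[0])


def toy_plain(text: str) -> Parsed:
    """Fallback: first line is the summary, everything else is free text."""
    lines = inspect.cleandoc(text).splitlines()
    return Parsed(lines[0], "\n".join(lines[1:]).strip())


def parse_with_fallback(
    text: str, parsers: Sequence[Callable[[str], Parsed]]
) -> Parsed:
    """Return the result of the first parser that does not raise."""
    errors = []
    for parser in parsers:
        try:
            return parser(text)
        except ParseError as exc:
            errors.append(exc)
    raise ParseError(f"no parser succeeded: {errors}")


doc = """Short description

Example:

.. testcode::

    foo = bar

Args:
    name: description 1
"""
result = parse_with_fallback(doc, [toy_google, toy_plain])
print(result.params)                         # []
print("Args:" in result.long_description)    # True
```

The Google attempt raises on Example:, so the plain fallback wins. The Args: block then ends up inside long_description, which matches what the issue reports.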

rr- commented 3 years ago

The crash is fixed in 0.9.1, although the example still won't parse properly.

mauvilsa commented 3 years ago

@rr- you can read about this syntax at https://www.sphinx-doc.org/en/master/usage/extensions/doctest.html. @Andry-Bal, a question about the example: shouldn't the .. testcode:: block be indented one more level so that it is clearly part of the Example: section? I have never done this, so I don't really know how it should go. Just asking.

Andry-Bal commented 3 years ago

@mauvilsa Frankly, I have not used it either, so I am not sure what the proper indentation is. However, it is used this way, e.g., here, so I assume it is valid usage, but I could be wrong.

mauvilsa commented 3 years ago

Looking at https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings, which I guess is the definition of the Google style, the only defined sections are Args:, Returns:, Yields: and Raises:. By that definition, the example in the code snippet above is just part of the long description, not a separate section, so the indentation would be correct. @rr- can you please clarify which sections are supported?
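
Here is a minimal sketch of that reading, where only an explicit list of names counts as a section header. It assumes a hypothetical split_sections helper and a GUIDE_SECTIONS set holding the four names quoted above. It is neither docstring_parser's API nor a complete reading of the style guide.

```python
import inspect
from typing import Dict, Iterable, List, Optional, Tuple

# The four section names quoted above from the Google Python Style Guide.
GUIDE_SECTIONS = frozenset({"Args", "Returns", "Yields", "Raises"})


def split_sections(
    text: str, known: Iterable[str] = GUIDE_SECTIONS
) -> Tuple[str, Dict[str, List[str]]]:
    """Split a docstring into free text and recognized sections.

    A line is a header only if it is unindented and reads "<Name>:" for a
    Name in `known`; any other unindented line (such as an unrecognized
    "Example:") ends the current section and is kept as description text.
    """
    known_names = set(known)
    description: List[str] = []
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in inspect.cleandoc(text).splitlines():
        stripped = line.strip()
        top_level = bool(stripped) and not line.startswith(" ")
        if top_level and stripped.endswith(":") and stripped[:-1] in known_names:
            current = stripped[:-1]          # start a recognized section
            sections[current] = []
        elif top_level:
            current = None                   # unindented prose ends a section
            description.append(line)
        elif current is not None:
            if stripped:                     # indented entry of the section
                sections[current].append(stripped)
        else:
            description.append(line)         # indented or blank free text
    return "\n".join(description).strip(), sections
```

With only the four guide names, Example: and the testcode block stay in the description while Args: and Raises: are still recognized. Passing `GUIDE_SECTIONS | {"Example"}` as `known` turns Example: into an empty section, because the unindented `.. testcode::` line ends it immediately.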

rr- commented 3 years ago

I would say it should be indented by four spaces, like the content under Args:, Returns:, etc. However, if it's a popular practice, then we should support it in this library by extending the parser to look for text that starts with "..".
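
The proposed extension might look roughly like the sketch below, reading ".." as the reStructuredText explicit-markup prefix that begins directives such as .. testcode::. directive_body is a hypothetical helper, not part of docstring_parser. A parser would call it only when a header such as Example: has no indented body, and would use the captured block as that section's content.

```python
from typing import List, Optional, Tuple


def directive_body(lines: List[str], start: int) -> Tuple[Optional[str], int]:
    """Capture an unindented reST directive that starts at or after `start`.

    Hypothetical helper: skips blank lines; if the next line starts with
    ".. ", returns (block text, index after the block), where the block is
    that line plus the blank or indented lines under it. Otherwise returns
    (None, start) so the caller can proceed as before.
    """
    i = start
    while i < len(lines) and not lines[i].strip():
        i += 1
    if i == len(lines) or not lines[i].startswith(".. "):
        return None, start
    j = i + 1
    while j < len(lines) and (not lines[j].strip() or lines[j].startswith(" ")):
        j += 1
    end = j
    while end > i and not lines[end - 1].strip():
        end -= 1  # trim trailing blank lines from the captured block
    return "\n".join(lines[i:end]), j


lines = ["Example:", "", ".. testcode::", "", "    foo = bar",
         "    bar = foo", "", "Args:", "    name: description 1"]
body, nxt = directive_body(lines, 1)
print(body)          # the testcode directive and its indented code
print(lines[nxt])    # Args:
```

This only rescues content that starts with a directive. Other unindented text after an empty header would still need a separate rule.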

mauvilsa commented 3 years ago

Looking at the Sphinx napoleon extension (which is what handles Google-style docstrings), I think Example: is treated as an independent section; see napoleon.html#docstring-sections. There is also an example in which the content is indented relative to the section title; see example_google.html#example-google. I also tested generating Sphinx documentation from an example with an indented code block, and it renders correctly. Based on this, I would say the correct way is to indent it.

On another note, I think docstring-parser should behave the same way as napoleon. With the wrong indentation, napoleon does not lose the example content. I'm not sure how it works internally, but in the rendered HTML it looks like Example: is not treated as a section (since it would be empty) and simply becomes part of the long description, code snippet included. I could be wrong. I think the source code is at https://github.com/sphinx-contrib/napoleon/blob/master/sphinxcontrib/napoleon/docstring.py. In any case, I think the parser should preserve all content somewhere, not only when it starts with something special like "..".
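
The two points above are that a section's body is its indented content and that an empty section should fall back to plain text rather than crash. Both can be sketched together. split_preserving and HEADERS are hypothetical names. The fallback approximates the napoleon output mauvilsa observed and is not napoleon's code.

```python
import inspect
import textwrap
from typing import List, Tuple

# Hypothetical header set; it includes Example, as napoleon's does.
HEADERS = {"Args", "Example", "Raises", "Returns", "Yields"}


def split_preserving(text: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Return (description, [(section name, body), ...]) without losing text.

    A header's body is the run of blank or indented lines after it. If that
    run has no content, the header is kept as ordinary description text
    instead of raising an error.
    """
    lines = inspect.cleandoc(text).splitlines()
    description: List[str] = []
    sections: List[Tuple[str, str]] = []
    i = 0
    while i < len(lines):
        line = lines[i]
        is_header = (
            not line.startswith(" ")
            and line.endswith(":")
            and line[:-1] in HEADERS
        )
        if not is_header:
            description.append(line)
            i += 1
            continue
        # Collect the blank or indented lines that follow the header.
        j = i + 1
        while j < len(lines) and (not lines[j].strip() or lines[j].startswith(" ")):
            j += 1
        body = textwrap.dedent("\n".join(lines[i + 1:j])).strip("\n")
        if body:
            sections.append((line[:-1], body))
        else:
            description.extend(lines[i:j])   # empty section: keep it as text
        i = j
    return "\n".join(description).strip(), sections
```

With the flat layout from the original report, Example: and the testcode block land in the description and Args: is still parsed. With the testcode block indented under Example:, the directive becomes that section's body.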

pawamoy commented 1 year ago

In Griffe we had similar false positives in Google-style docstrings: text was matched as a section when it wasn't one. I fixed that by setting stricter rules for section matching in the parser. A section is only a section if:

See https://mkdocstrings.github.io/griffe/docstrings/#google-style.

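For example, Ruff implemented rules D411 (missing blank line before a section) and D412 (no blank lines allowed between a section header and its content), which come from pydocstyle itself.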

This is a tricky situation, though: sometimes the user did want to write a section but used incorrect spacing, and sometimes the user did not want a section at all, in which case the text should be parsed as regular markup. In Griffe we don't warn or error out on incorrect section syntax; we only log a debug message saying "if you wanted a section, here's what's wrong with your syntax".
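
Strict header matching with a debug-level hint can be sketched as follows. The rules are illustrative approximations inspired by D411/D412. They are not Griffe's exact rules, which are documented at the link above. header_at, KNOWN and the logger name are hypothetical.

```python
import logging
from typing import List, Optional

logger = logging.getLogger("toy_sections")  # hypothetical logger name

KNOWN = {"Args", "Example", "Raises", "Returns", "Yields"}


def _indent(line: str) -> int:
    return len(line) - len(line.lstrip())


def header_at(lines: List[str], i: int) -> Optional[str]:
    """Return the section name if lines[i] is a well-formed header, else None.

    Illustrative rules only (not Griffe's exact ones):
      * the line reads "<Name>:" for a known Name,
      * it is the first line or follows a blank line (cf. pydocstyle D411),
      * the next line is not blank (cf. D412) and is indented deeper.
    """
    line = lines[i]
    stripped = line.strip()
    if not (stripped.endswith(":") and stripped[:-1] in KNOWN):
        return None
    problems = []
    if i > 0 and lines[i - 1].strip():
        problems.append("add a blank line before it")
    nxt = lines[i + 1] if i + 1 < len(lines) else ""
    if not nxt.strip():
        problems.append("put its content directly below it")
    elif _indent(nxt) <= _indent(line):
        problems.append("indent its content")
    if problems:
        # Not an error: the text may be ordinary prose that ends in a colon.
        logger.debug(
            "if you wanted a %r section: %s", stripped[:-1], "; ".join(problems)
        )
        return None
    return stripped[:-1]
```

Under these rules, the Example: in the original report is rejected because a blank line follows it, so it stays ordinary text. With `logging.basicConfig(level=logging.DEBUG)` the reason is printed instead of being silently dropped.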