jsh9 / pydoclint

A Python docstring linter that checks arguments, returns, yields, and raises sections
https://pypi.org/project/pydoclint/
MIT License

Add error code for docstring styles that deviate from selected style #169

Open alpastor4 opened 1 month ago

alpastor4 commented 1 month ago

If a user is linting Sphinx-style docstrings, pydoclint should be able to detect docstrings that deviate from the selected style (for example, Google- or NumPy-style ones) and raise a violation.

jsh9 commented 1 month ago

Hi @alpastor4, @kehoecj, @jack-swiney:

If I were to implement this feature, changes would need to be made here: https://github.com/jsh9/pydoclint/blob/27ff76915d548ebc19bcd1e9f802bb7c0386b4a6/pydoclint/utils/doc.py#L26-L35

The logic would be:

  1. Parse the docstring with all 3 parsers (Google, NumPy, and Sphinx)
  2. Write some logic to determine which style the docstring is actually written in
  3. Report a violation if that style differs from the selected one
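
As a rough illustration of Steps 1 to 3, here is a minimal sketch that swaps the three docstring_parser_fork parsers for simple regular-expression markers so that it runs with the standard library only. The marker patterns, the function names detect_styles and check_style, and the violation message are all hypothetical; this is not how pydoclint works today. Returning a set of detected styles means a mixed-style docstring simply shows up as more than one style.

```python
import re
from typing import List, Set

# Hypothetical section markers for each style. This regex heuristic stands in
# for "parse with 3 parsers"; it is NOT pydoclint's actual logic.
STYLE_MARKERS = {
    # Google: a known section name plus a colon on its own line, e.g. "Args:"
    "google": re.compile(
        r"^\s*(Args|Arguments|Returns|Yields|Raises):\s*$", re.MULTILINE
    ),
    # NumPy: a known section name underlined by a row of dashes
    "numpy": re.compile(
        r"^\s*(Parameters|Returns|Yields|Raises)\s*\n\s*-{3,}\s*$", re.MULTILINE
    ),
    # Sphinx/reST: field lists such as ":param x:" or ":raises ValueError:"
    "sphinx": re.compile(
        r"^\s*:(param|type|returns?|rtype|raises?|yields?)\b[^:\n]*:",
        re.MULTILINE,
    ),
}


def detect_styles(docstring: str) -> Set[str]:
    """Step 2 (naive): every style whose markers appear in the docstring.

    More than one element means the docstring mixes styles; an empty set
    means no sections were found, so the style cannot be determined.
    """
    return {name for name, pat in STYLE_MARKERS.items() if pat.search(docstring)}


def check_style(docstring: str, selected: str) -> List[str]:
    """Step 3: one message per detected style that differs from `selected`."""
    return [
        f"Docstring contains {style}-style sections, "
        f"but the selected style is {selected}"
        for style in sorted(detect_styles(docstring) - {selected})
    ]
```

For example, a docstring containing both an "Args:" section and a ":raises IOError:" field is detected as {"google", "sphinx"}, so checking it against the sphinx style reports the Google-style part. A docstring with no section markers at all (only a summary line) yields no violation, because its style cannot be determined from markers alone.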

The difficulty is in Step 2, because that logic isn't trivial to write, especially when a docstring is written in a mixed style (I don't know how often that happens).

Would you like to try writing this logic yourself? If it is solid, I can consider implementing it.

Here's a script you can use as a starting point for experimenting with the logic:

# Install first: https://github.com/jsh9/docstring_parser_fork
from docstring_parser.google import GoogleParser
from docstring_parser.numpydoc import NumpydocParser
from docstring_parser.rest import parse as parseSphinx

docstring = """Fetches rows from a Smalltable.

    Retrieves rows pertaining to the given keys from the Table instance
    represented by table_handle.  String keys will be UTF-8 encoded.

    Args:
        table_handle: An open smalltable.Table instance.
        keys: A sequence of strings representing the key of each table
          row to fetch.  String keys will be UTF-8 encoded.
        require_all_keys: If True only rows with values set for all keys will be
          returned.

    Returns:
        A dict mapping keys to the corresponding table row data
        fetched. Each row is represented as a tuple of strings. For
        example:

        {b'Serak': ('Rigel VII', 'Preparer'),
         b'Zim': ('Irk', 'Invader'),
         b'Lrrr': ('Omicron Persei 8', 'Emperor')}

        Returned keys are always bytes.  If a key from the keys argument is
        missing from the dictionary, then that row was not found in the
        table (and require_all_keys must have been False).

    Raises:
        IOError: An error occurred accessing the smalltable.
"""

numpy_parser = NumpydocParser()
parsed_numpy = numpy_parser.parse(docstring)

google_parser = GoogleParser()
parsed_google = google_parser.parse(docstring)

parsed_sphinx = parseSphinx(docstring)

print(parsed_numpy)
print('--------------------')
print(parsed_google)
print('--------------------')
print(parsed_sphinx)

On my computer, the script above produced the following output:

{'short_description': 'Fetches rows from a Smalltable.', 'long_description': "Retrieves rows pertaining to the given keys from the Table instance\nrepresented by table_handle.  String keys will be UTF-8 encoded.\n\nArgs:\n    table_handle: An open smalltable.Table instance.\n    keys: A sequence of strings representing the key of each table\n      row to fetch.  String keys will be UTF-8 encoded.\n    require_all_keys: If True only rows with values set for all keys will be\n      returned.\n\nReturns:\n    A dict mapping keys to the corresponding table row data\n    fetched. Each row is represented as a tuple of strings. For\n    example:\n\n    {b'Serak': ('Rigel VII', 'Preparer'),\n     b'Zim': ('Irk', 'Invader'),\n     b'Lrrr': ('Omicron Persei 8', 'Emperor')}\n\n    Returned keys are always bytes.  If a key from the keys argument is\n    missing from the dictionary, then that row was not found in the\n    table (and require_all_keys must have been False).\n\nRaises:\n    IOError: An error occurred accessing the smalltable.", 'blank_after_short_description': True, 'blank_after_long_description': False, 'meta': [], 'style': <DocstringStyle.NUMPYDOC: 3>}
--------------------
{'short_description': 'Fetches rows from a Smalltable.', 'long_description': 'Retrieves rows pertaining to the given keys from the Table instance\nrepresented by table_handle.  String keys will be UTF-8 encoded.', 'blank_after_short_description': True, 'blank_after_long_description': True, 'meta': [{'args': ['param', 'table_handle'], 'description': 'An open smalltable.Table instance.', 'arg_name': 'table_handle', 'type_name': None, 'is_optional': None, 'default': None}, {'args': ['param', 'keys'], 'description': 'A sequence of strings representing the key of each table\nrow to fetch.  String keys will be UTF-8 encoded.', 'arg_name': 'keys', 'type_name': None, 'is_optional': None, 'default': None}, {'args': ['param', 'require_all_keys'], 'description': 'If True only rows with values set for all keys will be\nreturned.', 'arg_name': 'require_all_keys', 'type_name': None, 'is_optional': None, 'default': None}, {'args': ['returns', 'A dict mapping keys to the corresponding table row data\nfetched. Each row is represented as a tuple of strings. For\nexample'], 'description': "{b'Serak': ('Rigel VII', 'Preparer'),\n b'Zim': ('Irk', 'Invader'),\n b'Lrrr': ('Omicron Persei 8', 'Emperor')}\n\nReturned keys are always bytes.  If a key from the keys argument is\nmissing from the dictionary, then that row was not found in the\ntable (and require_all_keys must have been False).", 'type_name': 'A dict mapping keys to the corresponding table row data\nfetched. Each row is represented as a tuple of strings. For\nexample', 'is_generator': False, 'return_name': None}, {'args': ['raises', 'IOError'], 'description': 'An error occurred accessing the smalltable.', 'type_name': 'IOError'}], 'style': <DocstringStyle.GOOGLE: 2>}
--------------------
{'short_description': 'Fetches rows from a Smalltable.', 'long_description': "Retrieves rows pertaining to the given keys from the Table instance\nrepresented by table_handle.  String keys will be UTF-8 encoded.\n\nArgs:\n    table_handle: An open smalltable.Table instance.\n    keys: A sequence of strings representing the key of each table\n      row to fetch.  String keys will be UTF-8 encoded.\n    require_all_keys: If True only rows with values set for all keys will be\n      returned.\n\nReturns:\n    A dict mapping keys to the corresponding table row data\n    fetched. Each row is represented as a tuple of strings. For\n    example:\n\n    {b'Serak': ('Rigel VII', 'Preparer'),\n     b'Zim': ('Irk', 'Invader'),\n     b'Lrrr': ('Omicron Persei 8', 'Emperor')}\n\n    Returned keys are always bytes.  If a key from the keys argument is\n    missing from the dictionary, then that row was not found in the\n    table (and require_all_keys must have been False).\n\nRaises:\n    IOError: An error occurred accessing the smalltable.", 'blank_after_short_description': True, 'blank_after_long_description': False, 'meta': [], 'style': <DocstringStyle.REST: 1>}
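
The output above suggests one simple signal for Step 2: only the Google parser populated meta (five entries: three parameters, one return, one raise), while the NumPy and Sphinx parsers returned an empty meta and left every section inside long_description. A naive Step 2 could therefore pick the parser that extracted the most entries. The helper infer_style below is hypothetical, and it assumes the counts come from len(parsed.meta) for each result in the script; a tie is treated as ambiguous rather than guessed.

```python
from typing import Dict, Optional


def infer_style(meta_counts: Dict[str, int]) -> Optional[str]:
    """Hypothetical Step 2: pick the parser that extracted the most sections.

    meta_counts maps a style name to the number of meta entries its parser
    produced. Returns None when nothing was extracted or when the top count
    is tied, since the style is then ambiguous (possibly mixed).
    """
    if not meta_counts:
        return None
    best = max(meta_counts.values())
    if best == 0:
        # No parser recognized any section, e.g. a summary-only docstring.
        return None
    winners = [style for style, n in meta_counts.items() if n == best]
    return winners[0] if len(winners) == 1 else None
```

With the script above, this would be called as infer_style({"numpy": len(parsed_numpy.meta), "google": len(parsed_google.meta), "sphinx": len(parsed_sphinx.meta)}), which for the Smalltable docstring gives "google" (counts 0, 5 and 0). Counting entries alone can be fooled by mixed-style docstrings where two parsers each pick up a different subset of sections.
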
Gabriel-p commented 1 month ago

How about just showing a warning or error when something isn't found? For example: function xxx.py: WARNING: No long description found in docstring

And if the parser finds nothing at all (an empty or badly malformed docstring), raise something like: function xxx.py: Could not parse docstring
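
A minimal sketch of this suggestion, assuming the diagnostics are computed from the three fields visible in the parser output above (short_description, long_description, meta). ParsedDocstring is a hypothetical stand-in for the parser's result, and the exact message format and the extra "no sections" warning are illustrative assumptions loosely based on the examples in the comment.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ParsedDocstring:
    """Hypothetical stand-in for the parsed result shown in the output above."""

    short_description: Optional[str] = None
    long_description: Optional[str] = None
    meta: List[object] = field(default_factory=list)


def diagnose(path: str, func_name: str, doc: ParsedDocstring) -> List[str]:
    """Return warnings/errors in a (hypothetical) linter message format."""
    prefix = f"{path}: function {func_name}"
    # Nothing recovered at all: report the docstring as unparseable.
    if not (doc.short_description or doc.long_description or doc.meta):
        return [f"{prefix}: ERROR: Could not parse docstring"]
    messages = []
    if not doc.long_description:
        messages.append(f"{prefix}: WARNING: No long description found in docstring")
    if not doc.meta:
        # Assumed extra check: no Args/Returns/Raises entries were recognized.
        messages.append(f"{prefix}: WARNING: No sections found in docstring")
    return messages
```

Under this idea, running the NumPy or Sphinx parser on the Smalltable docstring would trigger the "No sections" warning, since their meta came back empty, which is an indirect hint that the docstring is not in the selected style.
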