pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

DOC: Fix formatting errors in docstrings #27977

Open datapythonista opened 5 years ago

datapythonista commented 5 years ago

Historically, there was no validation of how docstrings were written. Some conventions were usually followed, but as the project grew, it became more difficult to ensure that all the API documentation pages were consistent and free of mistakes.

For the last two years, we've been implementing all sorts of validations to make sure every class, method, function and attribute is correctly documented.

The full list of validations can be found in the script that runs them: https://github.com/pandas-dev/pandas/blob/master/scripts/validate_docstrings.py#L77

Many of them have already been fixed across all pages and added to the CI, so they are not reintroduced. The list of errors currently validated can be seen in the CI script: https://github.com/pandas-dev/pandas/blob/master/ci/code_checks.sh#L267

The list of pending errors (validated by the script but not yet checked in the CI) is:

{'ES01': 'No extended summary found',
 'EX01': 'No examples section found',
 'EX02': 'Examples do not pass tests:\n{doctest_log}',
 'EX03': 'flake8 error: {error_code} {error_message}{times_happening}',
 'GL01': 'Docstring text (summary) should start in the line immediately after '
         'the opening quotes (not in the same line, or leaving a blank line in '
         'between)',
 'GL02': 'Closing quotes should be placed in the line after the last text in '
         'the docstring (do not close the quotes in the same line as the text, '
         'or leave a blank line between the last text and the quotes)',
 'GL08': 'The object does not have a docstring',
 'PR01': 'Parameters {missing_params} not documented',
 'PR02': 'Unknown parameters {unknown_params}',
 'PR06': 'Parameter "{param_name}" type should use "{right_type}" instead of '
         '"{wrong_type}"',
 'PR07': 'Parameter "{param_name}" has no description',
 'PR08': 'Parameter "{param_name}" description should start with a capital '
         'letter',
 'PR09': 'Parameter "{param_name}" description should finish with "."',
 'RT02': 'The first line of the Returns section should contain only the type, '
         'unless multiple values are being returned',
 'RT03': 'Return value has no description',
 'SA01': 'See Also section not found',
 'SA02': 'Missing period at end of description for See Also "{reference_name}" '
         'reference',
 'SA03': 'Description should be capitalized for See Also "{reference_name}" '
         'reference',
 'SA04': 'Missing description for See Also "{reference_name}" reference',
 'SS01': 'No summary found (a short summary in a single line should be present '
         'at the beginning of the docstring)',
 'SS02': 'Summary does not start with a capital letter',
 'SS03': 'Summary does not end with a period',
 'SS06': 'Summary should fit in a single line',
 'YD01': 'No Yields section found'}
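
To make the formatting-only codes more concrete, here is a minimal sketch of simplified GL01, GL02, SS01, SS02 and SS03 checks. They are approximations written only from the messages above and are NOT the logic used by scripts/validate_docstrings.py; `check_formatting`, `good` and `bad` are hypothetical names for illustration.

```python
# Simplified approximations of a few formatting checks, written from the
# error messages only; NOT the logic used by scripts/validate_docstrings.py.


def check_formatting(raw):
    """
    Return the simplified GL01/GL02/SS01/SS02/SS03 codes found in a docstring.
    """
    errors = []
    lines = raw.split("\n")
    # GL01: the opening-quotes line must be empty and the summary must start
    # on the very next line (no text after the quotes, no blank line).
    if lines[0].strip() or len(lines) < 2 or not lines[1].strip():
        errors.append("GL01")
    # GL02: the closing-quotes line must be empty and the line before it must
    # hold the last text (no text before the quotes, no blank line).
    if lines[-1].strip() or len(lines) < 2 or not lines[-2].strip():
        errors.append("GL02")
    text_lines = [line.strip() for line in lines if line.strip()]
    if not text_lines:
        errors.append("SS01")  # no summary at all
        return errors
    summary = text_lines[0]
    if not summary[0].isupper():
        errors.append("SS02")  # summary must start with a capital letter
    if not summary.endswith("."):
        errors.append("SS03")  # summary must end with a period
    return errors


def good():
    """
    Compute something.

    More detail here.
    """


def bad():
    """compute something

    More detail here."""


print(check_formatting(good.__doc__))  # []
print(check_formatting(bad.__doc__))   # ['GL01', 'GL02', 'SS02', 'SS03']
```

The `strip()` calls make the checks indifferent to indentation, so they behave the same whether or not the interpreter dedents `__doc__`.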

Some of them make more sense to address while fixing the content of an object (like adding a missing description, or documenting objects that have no docstring at all).

But others are just formatting errors, and those are the ones I'd start with.

To find the errors for one of these codes, you can run:

./scripts/validate_docstrings.py --errors=EX02

Or, for errors that make sense to address together:

./scripts/validate_docstrings.py --errors=GL01,GL02
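
The `--errors` option takes a comma-separated list of codes. As a rough illustration of that selection only, the sketch below parses such a value with `argparse` and keeps the matching results. The `(object name, code)` pairs, the `example.*` names and the helpers `parse_args`/`select` are hypothetical; the real script's internals and output format may differ.

```python
import argparse


def parse_args(argv):
    parser = argparse.ArgumentParser()
    # "--errors=GL01,GL02" is converted to ["GL01", "GL02"]
    parser.add_argument("--errors", type=lambda value: value.split(","))
    return parser.parse_args(argv)


def select(results, codes):
    """
    Keep only the results whose error code was requested.
    """
    wanted = set(codes)
    return [result for result in results if result[1] in wanted]


# Hypothetical results: (object name, error code)
results = [
    ("example.func_a", "GL01"),
    ("example.func_b", "SS03"),
    ("example.func_c", "GL02"),
]

args = parse_args(["--errors=GL01,GL02"])
for name, code in select(results, args.errors):
    print(f"{name}: {code}")
```

Grouping related codes such as GL01 and GL02 in one run means a single pass over a docstring can fix both the opening and the closing quotes.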

This should give you the list of errors to fix. We have a list of steps to follow when fixing a docstring, which you may find useful, at: https://python-sprints.github.io/pandas/dashboard.html

VERY IMPORTANT: The main challenge will be avoiding duplicating the work of other sprinters, which is very frustrating and happened massively at every sprint. My recommendation is, BEFORE doing any work, to create an issue for the error code you plan to work on (first check that one hasn't already been created). In the issue, write the list of errors that validate_docstrings.py returns. Then, in a comment, take 10 of them and say that you're going to fix them. Other people can work on a different 10. When opening a PR, reference the issue.
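
Splitting the script's output into batches of 10 for claiming is mechanical. A minimal sketch, assuming you already have the affected object names as a list of strings (the `example.func_*` names and the `batches`/`claim_comment` helpers are hypothetical); GitHub renders `- [ ]` lines as a task list, which makes it easy to tick items off in the comment.

```python
def batches(items, size=10):
    """
    Split items into consecutive chunks of at most ``size`` elements.
    """
    return [items[i:i + size] for i in range(0, len(items), size)]


def claim_comment(batch):
    """
    Format one batch as a Markdown task list for an issue comment.
    """
    lines = ["I'm going to fix these:"]
    lines += [f"- [ ] {name}" for name in batch]
    return "\n".join(lines)


# Hypothetical object names standing in for the validate_docstrings.py output
names = [f"example.func_{i}" for i in range(25)]
chunks = batches(names)
print([len(chunk) for chunk in chunks])  # [10, 10, 5]
print(claim_comment(chunks[0]))
```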

I created an issue for reference: #27976

Good luck!

steveayers124 commented 5 years ago

@datapythonista, thanks so much for your advice. We'd been attempting to eliminate duplication of effort, but needed a better method.

goodship1 commented 4 years ago

Is this still open?

TomAugspurger commented 4 years ago

@goodship1, running ./scripts/validate_docstrings.py --errors=EX02 should tell you whether there are any remaining.

HughKelley commented 4 years ago

I saw this in validate_docstrings.py and thought it was worth sharing:

    The error codes are defined as:
    - First two characters: Section where the error happens:
       * GL: Global (no section, like section ordering errors)
       * SS: Short summary
       * ES: Extended summary
       * PR: Parameters
       * RT: Returns
       * YD: Yields
       * RS: Raises
       * WN: Warns
       * SA: See Also
       * NT: Notes
       * RF: References
       * EX: Examples
    - Last two characters: Numeric error code inside the section
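
The two-part scheme above can be applied mechanically, for example to group the pending codes by section before choosing which ones to work on. A minimal sketch, assuming only the prefix list quoted above; `split_code` and `group_by_section` are hypothetical helpers, not functions from validate_docstrings.py.

```python
# Section prefixes, copied from the list above.
SECTIONS = {
    "GL": "Global",
    "SS": "Short summary",
    "ES": "Extended summary",
    "PR": "Parameters",
    "RT": "Returns",
    "YD": "Yields",
    "RS": "Raises",
    "WN": "Warns",
    "SA": "See Also",
    "NT": "Notes",
    "RF": "References",
    "EX": "Examples",
}


def split_code(code):
    """
    Split an error code such as "PR09" into (section name, number).
    """
    prefix, number = code[:2], code[2:]
    if prefix not in SECTIONS or len(number) != 2 or not number.isdigit():
        raise ValueError(f"not a valid error code: {code!r}")
    return SECTIONS[prefix], int(number)


def group_by_section(codes):
    """
    Group error codes by the section their prefix refers to.
    """
    groups = {}
    for code in sorted(codes):
        section, _ = split_code(code)
        groups.setdefault(section, []).append(code)
    return groups


print(split_code("PR09"))  # ('Parameters', 9)
print(group_by_section(["SS03", "GL02", "GL01"]))
# {'Global': ['GL01', 'GL02'], 'Short summary': ['SS03']}
```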

ericmariasis commented 4 years ago

take

willpeppo commented 4 years ago

Is there still work to be done on this issue? Can I take it if there is?

ericmariasis commented 4 years ago

Sure, take it!


willpeppo commented 4 years ago

take

maty714 commented 2 years ago

Is there still work that needs to be done on this?