wemake-services / wemake-python-styleguide

The strictest and most opinionated python linter ever!
https://wemake-python-styleguide.rtfd.io
MIT License
2.51k stars 378 forks source link

Detect redundant comments #1938

Closed orsinium closed 3 years ago

orsinium commented 3 years ago

Rule request

Thesis

A redundant comment is a comment that says exactly the same as the code. Since comments are written in natural language, only humans can subjectively reason good enough about how informative it is. However, we can do objective heuristics that can find some of the cases. The heuristic I have in mind: all the words in the comment (or all except one, I find it a good tolerance for handling articles and adjectives) are represented in the line above, ignoring the order (bag of words). A few examples:

# create user
create_user()

# create a user
create_user()

# create user
create_user(force=True)

# create the user
User.create()

# init data storage
DataStorage.init()

Reasoning

Redundant comments don't bring any value and only pollute the source code. Even more, any comment can get outdated and so even deceive the reader. So, reducing comments is a good thing, and any comment should have a good justification to be represented in the code.

Implementation

UPD: see the comments below. I modified the original PoC to skip multiline comments.

sobolevn commented 3 years ago

@orsinium can you please run your POC over source code of some our projects? What will it detect?

orsinium commented 3 years ago

Good idea. I'll wrap it into a CLI and try on different projects and stdlib.

orsinium commented 3 years ago

Code:

import re
import sys
import tokenize
import typing
from pathlib import Path

REX_WORD = re.compile(r'[A-Za-z]+')
# https://stackoverflow.com/a/1176023/8704691
REX1 = re.compile(r'(.)([A-Z][a-z]+)')
REX2 = re.compile(r'([a-z0-9])([A-Z])')

def get_words(text: str) -> typing.Set[str]:
    text = REX1.sub(r'\1 \2', text)
    text = REX2.sub(r'\1 \2', text).lower()
    text = text.replace('_', ' ')
    return set(text.split())

def get_redundant_comments(tokens: typing.Iterable[tokenize.TokenInfo]):
    comment = None
    code_words: typing.Set[str] = set()
    for token in tokens:
        if token.type == tokenize.NL:
            continue

        if token.type == tokenize.COMMENT:
            if comment is None:
                comment = token
            else:
                comment = None
            continue
        if comment is None:
            continue

        if token.start[0] == comment.start[0] + 1:
            code_words.update(get_words(token.string))
            continue

        comment_words = set(REX_WORD.findall(comment.string))
        if comment_words and not comment_words - code_words:
            yield comment
        code_words = set()
        comment = None

def process_file(path: Path):
    with path.open('rb') as stream:
        tokens = tokenize.tokenize(stream.readline)
        for comment in get_redundant_comments(tokens):
            print(f'{path}:{comment.start[0]} {comment.string.strip()}')

def get_paths(path: Path):
    if path.is_dir():
        for p in path.iterdir():
            yield from get_paths(p)
    elif path.suffix == '.py':
        yield path

if __name__ == '__main__':
    for path in sys.argv[1:]:
        for p in get_paths(Path(path)):
            # process file
            process_file(p)

A few results:

../pip/docs/html/conf.py:44 # intersphinx
../pip/tests/unit/test_wheel.py:316 # dist-info
../pip/tests/functional/test_uninstall_user.py:59 # install
../pip/src/pip/_vendor/html5lib/_inputstream.py:478 # "likely" encoding
../pip/src/pip/_internal/vcs/git.py:307 #: update submodules
../pip/src/pip/_internal/req/req_uninstall.py:545 # develop egg
../pip/src/pip/_internal/cli/autocompletion.py:36 # subcommand
../pip/src/pip/_internal/cli/main_parser.py:69 # --version

../dephell/tests/test_commands/test_inspect_project.py:5 # project
../dephell/dephell/converters/setuppy.py:119 # entrypoints
../dephell/dephell/converters/setuppy.py:169 # links
../dephell/dephell/converters/setuppy.py:173 # authors
../dephell/dephell/converters/setuppy.py:221 # extras
../dephell/dephell/converters/flit.py:57 # entrypoints
../dephell/dephell/converters/flit.py:71 # authors
../dephell/dephell/converters/flit.py:195 # readme
../dephell/dephell/converters/poetry.py:210 # extras
../dephell/dephell/converters/egginfo.py:282 # links
../dephell/dephell/converters/egginfo.py:287 # authors
../dephell/dephell/converters/__init__.py:49 # pip
../dephell/dephell/converters/__init__.py:57 # poetry
../dephell/dephell/repositories/_conda/_git.py:258 # os=os,
../dephell/dephell/cli.py:40 # get command
../dephell/dephell/commands/project_test.py:44 # attach
../dephell/dephell/commands/jail_try.py:59 # install
../dephell/dephell/commands/deps_check.py:48 # not installed
../dephell/dephell/commands/project_build.py:45 # attach
../dephell/dephell/commands/deps_install.py:52 # install
../dephell/dephell/commands/jail_install.py:48 # install
../dephell/dephell/commands/jail_install.py:58 # get entrypoints
../dephell/dephell/commands/deps_convert.py:49 # attach
../dephell/dephell/models/dependency.py:104 # propagate repo
../dephell/dephell/config/defaults.py:34 # venv
../dephell/dephell/config/defaults.py:38 # docker
../dephell/dephell/config/__init__.py:1 # app
../dephell/dephell/controllers/_docker.py:120 # create network

../cpython/Lib/argparse.py:1339 # groups
../cpython/Lib/argparse.py:2514 # usage
../cpython/Lib/argparse.py:2518 # description
../cpython/Lib/argparse.py:2528 # epilog
../cpython/Lib/distutils/command/upload.py:136 # file content digests
../cpython/Lib/distutils/tests/test_install.py:143 # none
../cpython/Lib/hashlib.py:236 # rkey = rkey ^ prev
../cpython/Lib/idlelib/configdialog.py:546 # frame_font.
../cpython/Lib/idlelib/configdialog.py:564 # frame_sample.
../cpython/Lib/idlelib/configdialog.py:584 # frame_font.
../cpython/Lib/idlelib/configdialog.py:593 # frame_sample.
../cpython/Lib/idlelib/configdialog.py:912 # frame_theme.
../cpython/Lib/idlelib/configdialog.py:1453 # frame_custom.
../cpython/Lib/idlelib/configdialog.py:1469 # frame_key_sets.
../cpython/Lib/idlelib/configdialog.py:1497 # frame_target.
../cpython/Lib/idlelib/configdialog.py:2002 # frame_help.
../cpython/Lib/idlelib/configdialog.py:2028 # frame_run.
../cpython/Lib/idlelib/configdialog.py:2033 # frame_win_size.
../cpython/Lib/idlelib/configdialog.py:2040 # frame_cursor_blink.
../cpython/Lib/idlelib/configdialog.py:2044 # frame_autocomplete.
../cpython/Lib/idlelib/configdialog.py:2057 # frame_save.
../cpython/Lib/idlelib/configdialog.py:2062 # frame_format.
../cpython/Lib/idlelib/configdialog.py:2066 # frame_line_numbers_default.
../cpython/Lib/idlelib/configdialog.py:2070 # frame_context.
../cpython/Lib/idlelib/configdialog.py:2075 # frame_auto_squeeze_min_lines
../cpython/Lib/idlelib/parenmatch.py:101 # self.create_tag(indices)
../cpython/Lib/idlelib/textview.py:54 # vertical scrollbar
../cpython/Lib/importlib/_bootstrap.py:511 # __loader__
../cpython/Lib/importlib/_bootstrap.py:539 # __package__
../cpython/Lib/importlib/_bootstrap.py:550 # __path__
../cpython/Lib/inspect.py:2106 # *args
../cpython/Lib/inspect.py:2677 # *args
../cpython/Lib/inspect.py:2707 # **kwargs
../cpython/Lib/lib2to3/tests/data/py2_test_grammar.py:419 # 'pass'
../cpython/Lib/lib2to3/tests/data/py2_test_grammar.py:426 # 'break'
../cpython/Lib/lib2to3/tests/data/py3_test_grammar.py:413 # 'pass'
../cpython/Lib/lib2to3/tests/data/py3_test_grammar.py:420 # 'break'
../cpython/Lib/sre_compile.py:587 # generate overlap table
../cpython/Lib/sre_parse.py:804 # flags
../cpython/Lib/tkinter/test/test_ttk/test_widgets.py:1640 # item tags
../cpython/Lib/unittest/async_case.py:129 # cancel all tasks
../cpython/Lib/unittest/async_case.py:149 # shutdown asyncgens
../cpython/Lib/xml/etree/ElementPath.py:271 # [tag]
orsinium commented 3 years ago

Nothing for poetry, nothing for WPS :)

orsinium commented 3 years ago
py/_io/terminalwriter.py:99 # green
py/_io/terminalwriter.py:101 # blue
py/_xmlgen.py:133 #self.write(obj)
astroid/brain/brain_builtin_inference.py:315 # dict()
mypy/fastparse2.py:512 # **kwarg
mypy/fastparse.py:685 # **kwarg
setuptools/_distutils/command/upload.py:136 # file content digests
setuptools/archive_util.py:115 # directory
deal/_solver/_eval_contracts.py:47 # eval contract
deal/linter/_extractors/markers.py:79 # import
deal/_imports.py:7 # external
pygments/formatters/rtf.py:77 # escape text
pygments/lexers/javascript.py:981 # function name
pygments/lexers/scdoc.py:37 # comment
pygments/lexers/freefem.py:55 # preprocessor
pygments/lexers/freefem.py:854 # deprecated
pygments/lexers/nix.py:64 # operators
pygments/lexers/lisp.py:2551 # keywords
pygments/lexers/matlab.py:111 # operators:
pygments/lexers/matlab.py:120 # punctuation:
pygments/lexers/matlab.py:619 # punctuation:
pygments/lexers/matlab.py:691 # punctuation:
pygments/lexers/r.py:109 # hex number
pygments/lexers/configs.py:821 # comment
pygments/lexers/grammar_notation.py:91 # comment
pygments/lexers/grammar_notation.py:126 # punctuation
pygments/lexers/chapel.py:83 # -- hex
pygments/lexers/dsls.py:772 # punctuation
pygments/lexers/ml.py:497 # keywords
isort/deprecated/finders.py:133 # virtual env
isort/deprecated/finders.py:150 # conda
pip/_vendor/html5lib/_inputstream.py:478 # "likely" encoding
pip/_internal/vcs/git.py:303 #: update submodules
pip/_internal/req/req_uninstall.py:548 # develop egg
pip/_internal/cli/autocompletion.py:36 # subcommand
pip/_internal/cli/main_parser.py:69 # --version
mypyc/primitives/list_ops.py:102 # list.pop()
mypyc/primitives/str_ops.py:54 # str.split(...)
mypyc/primitives/dict_ops.py:46 # dict1.update(dict2)
hypothesis/vendor/pretty.py:374 # deferred printer

py/_io/terminalwriter.py:99 # green
py/_io/terminalwriter.py:101 # blue
py/_xmlgen.py:133 #self.write(obj)
smmap/test/test_util.py:51 # extend left
git/objects/submodule/base.py:412 # _clone_repo(cls, repo, url, path, name, **kwargs):
astroid/brain/brain_builtin_inference.py:315 # dict()
mypy/fastparse2.py:512 # **kwarg
mypy/fastparse.py:690 # **kwarg
pylint/checkers/refactoring.py:573 # "if <compare node>"
pylint/checkers/similar.py:338 # reports
pylint/checkers/classes.py:1551 # metaclass
setuptools/_distutils/command/upload.py:136 # file content digests
setuptools/archive_util.py:115 # directory
docutils/writers/latex2e/__init__.py:3130 # title:
docutils/utils/math/latex2mathml.py:301 ##            self.children[1:3] = self.children[2:0:-1]
pygments/formatters/rtf.py:77 # escape text
pygments/lexers/javascript.py:981 # function name
pygments/lexers/scdoc.py:37 # comment
pygments/lexers/freefem.py:55 # preprocessor
pygments/lexers/freefem.py:854 # deprecated
pygments/lexers/nix.py:64 # operators
pygments/lexers/lisp.py:2551 # keywords
pygments/lexers/matlab.py:111 # operators:
pygments/lexers/matlab.py:120 # punctuation:
pygments/lexers/matlab.py:619 # punctuation:
pygments/lexers/matlab.py:691 # punctuation:
pygments/lexers/r.py:109 # hex number
pygments/lexers/configs.py:821 # comment
pygments/lexers/grammar_notation.py:91 # comment
pygments/lexers/grammar_notation.py:126 # punctuation
pygments/lexers/chapel.py:83 # -- hex
pygments/lexers/dsls.py:772 # punctuation
pygments/lexers/ml.py:497 # keywords
isort/deprecated/finders.py:133 # virtual env
isort/deprecated/finders.py:150 # conda
yaml/composer.py:134 #node.value[item_key] = item_value
pip/_vendor/html5lib/_inputstream.py:478 # "likely" encoding
pip/_internal/vcs/git.py:307 #: update submodules
pip/_internal/req/req_uninstall.py:545 # develop egg
pip/_internal/cli/autocompletion.py:36 # subcommand
pip/_internal/cli/main_parser.py:69 # --version
mypyc/primitives/list_ops.py:84 # list.pop()
mypyc/primitives/str_ops.py:73 # str.split(...)
mypyc/primitives/dict_ops.py:46 # dict1.update(dict2)
gitdb/pack.py:964 # object header
orsinium commented 3 years ago

I made a separate plugin to give the experiment a life: https://github.com/orsinium-labs/flake8-comments I'll see how it performs and how many there are false positives. I'd be more than happy if eventually, it will land into WPS.

orsinium commented 3 years ago

BTW, a few results for WPS:

../wemake-python-styleguide/wemake_python_styleguide/visitors/ast/statements.py:134:5: CM001: Redundant comment found
    # Useless nodes:
    ^
../wemake-python-styleguide/tests/test_visitors/test_ast/test_operators/test_walrus.py:7:1: CM001: Redundant comment found
# Correct:
^
../wemake-python-styleguide/tests/test_visitors/test_ast/test_operators/test_walrus.py:23:1: CM001: Redundant comment found
# Wrong:
^
../wemake-python-styleguide/tests/test_visitors/test_ast/test_functions/test_positional_only.py:11:1: CM001: Redundant comment found
# Correct:
^
../wemake-python-styleguide/tests/test_visitors/test_ast/test_functions/test_positional_only.py:24:1: CM001: Redundant comment found
# Wrong:
^
sobolevn commented 3 years ago

Pretty good result! In tests we use # Correct comment to define correct templates. So, this is fine. The only potential problem is in wemake-python-styleguide/wemake_python_styleguide/visitors/ast/statements.py:134:5

orsinium commented 3 years ago

Sure, no judgment 🙃 I suppose, there will be a lot of false positives. I'm still happy with how it performs, though. We'll see.