Closed orsinium closed 3 years ago
@orsinium can you please run your POC over source code of some our projects? What will it detect?
Good idea. I'll wrap it into a CLI and try on different projects and stdlib.
Code:
import re
import sys
import tokenize
import typing
from pathlib import Path
REX_WORD = re.compile(r'[A-Za-z]+')
# https://stackoverflow.com/a/1176023/8704691
REX1 = re.compile(r'(.)([A-Z][a-z]+)')
REX2 = re.compile(r'([a-z0-9])([A-Z])')
def get_words(text: str) -> typing.Set[str]:
text = REX1.sub(r'\1 \2', text)
text = REX2.sub(r'\1 \2', text).lower()
text = text.replace('_', ' ')
return set(text.split())
def get_redundant_comments(tokens: typing.Iterable[tokenize.TokenInfo]):
comment = None
code_words: typing.Set[str] = set()
for token in tokens:
if token.type == tokenize.NL:
continue
if token.type == tokenize.COMMENT:
if comment is None:
comment = token
else:
comment = None
continue
if comment is None:
continue
if token.start[0] == comment.start[0] + 1:
code_words.update(get_words(token.string))
continue
comment_words = set(REX_WORD.findall(comment.string))
if comment_words and not comment_words - code_words:
yield comment
code_words = set()
comment = None
def process_file(path: Path):
with path.open('rb') as stream:
tokens = tokenize.tokenize(stream.readline)
for comment in get_redundant_comments(tokens):
print(f'{path}:{comment.start[0]} {comment.string.strip()}')
def get_paths(path: Path):
if path.is_dir():
for p in path.iterdir():
yield from get_paths(p)
elif path.suffix == '.py':
yield path
if __name__ == '__main__':
for path in sys.argv[1:]:
for p in get_paths(Path(path)):
# process file
process_file(p)
A few results:
../pip/docs/html/conf.py:44 # intersphinx
../pip/tests/unit/test_wheel.py:316 # dist-info
../pip/tests/functional/test_uninstall_user.py:59 # install
../pip/src/pip/_vendor/html5lib/_inputstream.py:478 # "likely" encoding
../pip/src/pip/_internal/vcs/git.py:307 #: update submodules
../pip/src/pip/_internal/req/req_uninstall.py:545 # develop egg
../pip/src/pip/_internal/cli/autocompletion.py:36 # subcommand
../pip/src/pip/_internal/cli/main_parser.py:69 # --version
../dephell/tests/test_commands/test_inspect_project.py:5 # project
../dephell/dephell/converters/setuppy.py:119 # entrypoints
../dephell/dephell/converters/setuppy.py:169 # links
../dephell/dephell/converters/setuppy.py:173 # authors
../dephell/dephell/converters/setuppy.py:221 # extras
../dephell/dephell/converters/flit.py:57 # entrypoints
../dephell/dephell/converters/flit.py:71 # authors
../dephell/dephell/converters/flit.py:195 # readme
../dephell/dephell/converters/poetry.py:210 # extras
../dephell/dephell/converters/egginfo.py:282 # links
../dephell/dephell/converters/egginfo.py:287 # authors
../dephell/dephell/converters/__init__.py:49 # pip
../dephell/dephell/converters/__init__.py:57 # poetry
../dephell/dephell/repositories/_conda/_git.py:258 # os=os,
../dephell/dephell/cli.py:40 # get command
../dephell/dephell/commands/project_test.py:44 # attach
../dephell/dephell/commands/jail_try.py:59 # install
../dephell/dephell/commands/deps_check.py:48 # not installed
../dephell/dephell/commands/project_build.py:45 # attach
../dephell/dephell/commands/deps_install.py:52 # install
../dephell/dephell/commands/jail_install.py:48 # install
../dephell/dephell/commands/jail_install.py:58 # get entrypoints
../dephell/dephell/commands/deps_convert.py:49 # attach
../dephell/dephell/models/dependency.py:104 # propagate repo
../dephell/dephell/config/defaults.py:34 # venv
../dephell/dephell/config/defaults.py:38 # docker
../dephell/dephell/config/__init__.py:1 # app
../dephell/dephell/controllers/_docker.py:120 # create network
../cpython/Lib/argparse.py:1339 # groups
../cpython/Lib/argparse.py:2514 # usage
../cpython/Lib/argparse.py:2518 # description
../cpython/Lib/argparse.py:2528 # epilog
../cpython/Lib/distutils/command/upload.py:136 # file content digests
../cpython/Lib/distutils/tests/test_install.py:143 # none
../cpython/Lib/hashlib.py:236 # rkey = rkey ^ prev
../cpython/Lib/idlelib/configdialog.py:546 # frame_font.
../cpython/Lib/idlelib/configdialog.py:564 # frame_sample.
../cpython/Lib/idlelib/configdialog.py:584 # frame_font.
../cpython/Lib/idlelib/configdialog.py:593 # frame_sample.
../cpython/Lib/idlelib/configdialog.py:912 # frame_theme.
../cpython/Lib/idlelib/configdialog.py:1453 # frame_custom.
../cpython/Lib/idlelib/configdialog.py:1469 # frame_key_sets.
../cpython/Lib/idlelib/configdialog.py:1497 # frame_target.
../cpython/Lib/idlelib/configdialog.py:2002 # frame_help.
../cpython/Lib/idlelib/configdialog.py:2028 # frame_run.
../cpython/Lib/idlelib/configdialog.py:2033 # frame_win_size.
../cpython/Lib/idlelib/configdialog.py:2040 # frame_cursor_blink.
../cpython/Lib/idlelib/configdialog.py:2044 # frame_autocomplete.
../cpython/Lib/idlelib/configdialog.py:2057 # frame_save.
../cpython/Lib/idlelib/configdialog.py:2062 # frame_format.
../cpython/Lib/idlelib/configdialog.py:2066 # frame_line_numbers_default.
../cpython/Lib/idlelib/configdialog.py:2070 # frame_context.
../cpython/Lib/idlelib/configdialog.py:2075 # frame_auto_squeeze_min_lines
../cpython/Lib/idlelib/parenmatch.py:101 # self.create_tag(indices)
../cpython/Lib/idlelib/textview.py:54 # vertical scrollbar
../cpython/Lib/importlib/_bootstrap.py:511 # __loader__
../cpython/Lib/importlib/_bootstrap.py:539 # __package__
../cpython/Lib/importlib/_bootstrap.py:550 # __path__
../cpython/Lib/inspect.py:2106 # *args
../cpython/Lib/inspect.py:2677 # *args
../cpython/Lib/inspect.py:2707 # **kwargs
../cpython/Lib/lib2to3/tests/data/py2_test_grammar.py:419 # 'pass'
../cpython/Lib/lib2to3/tests/data/py2_test_grammar.py:426 # 'break'
../cpython/Lib/lib2to3/tests/data/py3_test_grammar.py:413 # 'pass'
../cpython/Lib/lib2to3/tests/data/py3_test_grammar.py:420 # 'break'
../cpython/Lib/sre_compile.py:587 # generate overlap table
../cpython/Lib/sre_parse.py:804 # flags
../cpython/Lib/tkinter/test/test_ttk/test_widgets.py:1640 # item tags
../cpython/Lib/unittest/async_case.py:129 # cancel all tasks
../cpython/Lib/unittest/async_case.py:149 # shutdown asyncgens
../cpython/Lib/xml/etree/ElementPath.py:271 # [tag]
Nothing for poetry, nothing for WPS :)
py/_io/terminalwriter.py:99 # green
py/_io/terminalwriter.py:101 # blue
py/_xmlgen.py:133 #self.write(obj)
astroid/brain/brain_builtin_inference.py:315 # dict()
mypy/fastparse2.py:512 # **kwarg
mypy/fastparse.py:685 # **kwarg
setuptools/_distutils/command/upload.py:136 # file content digests
setuptools/archive_util.py:115 # directory
deal/_solver/_eval_contracts.py:47 # eval contract
deal/linter/_extractors/markers.py:79 # import
deal/_imports.py:7 # external
pygments/formatters/rtf.py:77 # escape text
pygments/lexers/javascript.py:981 # function name
pygments/lexers/scdoc.py:37 # comment
pygments/lexers/freefem.py:55 # preprocessor
pygments/lexers/freefem.py:854 # deprecated
pygments/lexers/nix.py:64 # operators
pygments/lexers/lisp.py:2551 # keywords
pygments/lexers/matlab.py:111 # operators:
pygments/lexers/matlab.py:120 # punctuation:
pygments/lexers/matlab.py:619 # punctuation:
pygments/lexers/matlab.py:691 # punctuation:
pygments/lexers/r.py:109 # hex number
pygments/lexers/configs.py:821 # comment
pygments/lexers/grammar_notation.py:91 # comment
pygments/lexers/grammar_notation.py:126 # punctuation
pygments/lexers/chapel.py:83 # -- hex
pygments/lexers/dsls.py:772 # punctuation
pygments/lexers/ml.py:497 # keywords
isort/deprecated/finders.py:133 # virtual env
isort/deprecated/finders.py:150 # conda
pip/_vendor/html5lib/_inputstream.py:478 # "likely" encoding
pip/_internal/vcs/git.py:303 #: update submodules
pip/_internal/req/req_uninstall.py:548 # develop egg
pip/_internal/cli/autocompletion.py:36 # subcommand
pip/_internal/cli/main_parser.py:69 # --version
mypyc/primitives/list_ops.py:102 # list.pop()
mypyc/primitives/str_ops.py:54 # str.split(...)
mypyc/primitives/dict_ops.py:46 # dict1.update(dict2)
hypothesis/vendor/pretty.py:374 # deferred printer
py/_io/terminalwriter.py:99 # green
py/_io/terminalwriter.py:101 # blue
py/_xmlgen.py:133 #self.write(obj)
smmap/test/test_util.py:51 # extend left
git/objects/submodule/base.py:412 # _clone_repo(cls, repo, url, path, name, **kwargs):
astroid/brain/brain_builtin_inference.py:315 # dict()
mypy/fastparse2.py:512 # **kwarg
mypy/fastparse.py:690 # **kwarg
pylint/checkers/refactoring.py:573 # "if <compare node>"
pylint/checkers/similar.py:338 # reports
pylint/checkers/classes.py:1551 # metaclass
setuptools/_distutils/command/upload.py:136 # file content digests
setuptools/archive_util.py:115 # directory
docutils/writers/latex2e/__init__.py:3130 # title:
docutils/utils/math/latex2mathml.py:301 ## self.children[1:3] = self.children[2:0:-1]
pygments/formatters/rtf.py:77 # escape text
pygments/lexers/javascript.py:981 # function name
pygments/lexers/scdoc.py:37 # comment
pygments/lexers/freefem.py:55 # preprocessor
pygments/lexers/freefem.py:854 # deprecated
pygments/lexers/nix.py:64 # operators
pygments/lexers/lisp.py:2551 # keywords
pygments/lexers/matlab.py:111 # operators:
pygments/lexers/matlab.py:120 # punctuation:
pygments/lexers/matlab.py:619 # punctuation:
pygments/lexers/matlab.py:691 # punctuation:
pygments/lexers/r.py:109 # hex number
pygments/lexers/configs.py:821 # comment
pygments/lexers/grammar_notation.py:91 # comment
pygments/lexers/grammar_notation.py:126 # punctuation
pygments/lexers/chapel.py:83 # -- hex
pygments/lexers/dsls.py:772 # punctuation
pygments/lexers/ml.py:497 # keywords
isort/deprecated/finders.py:133 # virtual env
isort/deprecated/finders.py:150 # conda
yaml/composer.py:134 #node.value[item_key] = item_value
pip/_vendor/html5lib/_inputstream.py:478 # "likely" encoding
pip/_internal/vcs/git.py:307 #: update submodules
pip/_internal/req/req_uninstall.py:545 # develop egg
pip/_internal/cli/autocompletion.py:36 # subcommand
pip/_internal/cli/main_parser.py:69 # --version
mypyc/primitives/list_ops.py:84 # list.pop()
mypyc/primitives/str_ops.py:73 # str.split(...)
mypyc/primitives/dict_ops.py:46 # dict1.update(dict2)
gitdb/pack.py:964 # object header
I made a separate plugin to give the experiment a life: https://github.com/orsinium-labs/flake8-comments I'll see how it performs and how many there are false positives. I'd be more than happy if eventually, it will land into WPS.
BTW, a few results for WPS:
../wemake-python-styleguide/wemake_python_styleguide/visitors/ast/statements.py:134:5: CM001: Redundant comment found
# Useless nodes:
^
../wemake-python-styleguide/tests/test_visitors/test_ast/test_operators/test_walrus.py:7:1: CM001: Redundant comment found
# Correct:
^
../wemake-python-styleguide/tests/test_visitors/test_ast/test_operators/test_walrus.py:23:1: CM001: Redundant comment found
# Wrong:
^
../wemake-python-styleguide/tests/test_visitors/test_ast/test_functions/test_positional_only.py:11:1: CM001: Redundant comment found
# Correct:
^
../wemake-python-styleguide/tests/test_visitors/test_ast/test_functions/test_positional_only.py:24:1: CM001: Redundant comment found
# Wrong:
^
Pretty good result! In tests we use # Correct
comment to define correct templates. So, this is fine.
The only potential problem is in wemake-python-styleguide/wemake_python_styleguide/visitors/ast/statements.py:134:5
Sure, no judgment 🙃 I suppose, there will be a lot of false positives. I'm still happy with how it performs, though. We'll see.
Rule request
Thesis
A redundant comment is a comment that says exactly the same as the code. Since comments are written in natural language, only humans can subjectively reason good enough about how informative it is. However, we can do objective heuristics that can find some of the cases. The heuristic I have in mind: all the words in the comment (or all except one, I find it a good tolerance for handling articles and adjectives) are represented in the line above, ignoring the order (bag of words). A few examples:
Reasoning
Redundant comments don't bring any value and only pollute the source code. Even more, any comment can get outdated and so even deceive the reader. So, reducing comments is a good thing, and any comment should have a good justification to be represented in the code.
Implementation
UPD: see the comments below. I modified the original PoC to skip multiline comments.