wemake-services / flake8-eradicate

Flake8 plugin to find commented out or dead code
https://pypi.org/project/flake8-eradicate/
MIT License
310 stars 12 forks source link

fix performance issue due to search in tokens #210

Closed bagerard closed 3 years ago

bagerard commented 3 years ago

Fixes #176

I looked into this and the issue was in the following block

comment_in_line = any(
    token_type == tokenize.COMMENT
    for token_type, _, _, _, _ in self._tokens)

self._tokens is actually containing the tokens from beginning of the file until the physical line yielded by flake8, this means that self._tokens is getting bigger and bigger every time flake8 invokes this plugin on the next lines, thus searching in self._tokens gets more and more expensive for large files as it's inefficient to search in large list by design (the large file is > 12000 lines so that makes a lot of tokens).

I switch the code to process the whole file instead of work line by line, that way it has to search for the comment token only once.

On the long python file attached in #176, this makes the processing time go from 30 sec to just 2 sec :rocket:

bagerard commented 3 years ago

omg... sorry for that. I'm using pip to install it locally, not poetry so I had to clean those files temporarily... These manually triggered pipeline are really killing productivity, let's hope github finds another solution to address the cryptomining issues

bagerard commented 3 years ago

Alright, now it should be good. I've set it up with poetry locally. I added an additional piece of code to skip the # -*- coding: utf-8 -*- line in file (so that it doesn't waste time processing the file if that is the only tokenize.COMMENT found in the file) and I added a test for that

bagerard commented 3 years ago

Hi, please re-run the pipeline when you have a chance ;)

sobolevn commented 3 years ago

Done!

codecov[bot] commented 3 years ago

Codecov Report

Merging #210 (5f6b875) into master (6a5aab0) will not change coverage. The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff            @@
##            master      #210   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            1         1           
  Lines           41        42    +1     
  Branches         5         7    +2     
=========================================
+ Hits            41        42    +1     
Impacted Files Coverage Δ
flake8_eradicate.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 6a5aab0...5f6b875. Read the comment docs.

bagerard commented 3 years ago

Note that consuming the file_tokens from flake8 made it slightly less trivial/efficient to ignore the file encoding '# -*- coding:' comment so I ended up removing that optimization. It was not a major improvement and anyway the '# -*- coding:' thing is something from the past so it may be better to avoid polluting the code with that (e.g pyupgrade removes them by default https://github.com/asottile/pyupgrade#-coding--comment)

bagerard commented 3 years ago

Would you mind running the pipeline on it ?

bagerard commented 3 years ago

Let me know if there is anything else I should do or if it's fine as it is

sobolevn commented 3 years ago

@bagerard I will return to this PR sometime soon! Please, stay tuned 🙂

sobolevn commented 3 years ago

I finally got the time to work on this, sorry for the long wait!

sobolevn commented 3 years ago

@bagerard thanks a lot for your work! 👍