securing / DumpsterDiver

Tool to search secrets in various filetypes.
MIT License

Speed up things #21

Closed: disconnect3d closed this PR 4 years ago.

disconnect3d commented 5 years ago

It's best to review this PR commit by commit.

List of changes:

  1. Compile the regex once instead of passing the pattern string to re.findall on every call. The re module does cache compiled patterns (see https://stackoverflow.com/questions/12514157/how-does-pythons-regex-pattern-caching-work), so the speedup is small, but it does skip the cache lookup.
  2. Replace the simple regex whitespace check re.search(r"\s", string) with a per-character ch.isspace() check.
  3. Don't multiply PASSWORD_COMPLEXITY for each word: this looks like a hot loop, so it's better to calculate the product once.
  4. Delay the password complexity check, as it is probably more expensive than, e.g., an a<=len(x)<=b check.
  5. return all([a, b]) builds an unnecessary list and makes an extra function call (function calls are expensive in Python). It's not a big deal, but return a and b is simply faster.
  6. Avoid building a list in password_search; it appears to work just as well with a generator.
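
The six changes above can be sketched together in one small module. This is not DumpsterDiver's actual code: the threshold values, COMPLEXITY_WEIGHT, the character-class scoring in complexity(), and the helper names is_candidate/has_whitespace are hypothetical stand-ins, and only PASSWORD_COMPLEXITY and password_search come from the PR description.

```python
import re

# Hypothetical settings; DumpsterDiver's real values and logic may differ.
PASSWORD_COMPLEXITY = 2
COMPLEXITY_WEIGHT = 2  # hypothetical factor that PASSWORD_COMPLEXITY is multiplied by
MIN_LEN, MAX_LEN = 8, 32

# 1. Compile the character-class patterns once, at import time.
CLASS_PATTERNS = [
    re.compile(r"[a-z]"),
    re.compile(r"[A-Z]"),
    re.compile(r"[0-9]"),
    re.compile(r"[^a-zA-Z0-9]"),
]

# 3. Compute the threshold once instead of once per word.
COMPLEXITY_THRESHOLD = PASSWORD_COMPLEXITY * COMPLEXITY_WEIGHT


def complexity(word):
    """Hypothetical score: 2 points per character class present in `word`."""
    return 2 * sum(1 for pattern in CLASS_PATTERNS if pattern.search(word))


def has_whitespace(s):
    # 2. Per-character check instead of re.search(r"\s", s).
    return any(ch.isspace() for ch in s)


def is_candidate(word):
    # 4. Cheap length check first; `and` short-circuits, so complexity()
    #    only runs for words of a plausible length.
    # 5. `a and b` avoids the list and the call made by all([a, b]).
    return MIN_LEN <= len(word) <= MAX_LEN and complexity(word) >= COMPLEXITY_THRESHOLD


def password_search(words):
    # 6. Yield matches lazily instead of building and returning a list.
    for word in words:
        if not has_whitespace(word) and is_candidate(word):
            yield word
```

For example, list(password_search(["hunter2", "Tr0ub4dor&3", "correct horse"])) keeps only "Tr0ub4dor&3". Items 4 and 5 interact: all([a, b]) evaluates both operands before the call, so moving the cheap check first only pays off with the short-circuiting `and`. Also, `a and b` returns an operand rather than a strict bool, which is equivalent here because both operands are already booleans.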

xep624 commented 5 years ago

Ooh, nice job @disconnect3d! I'll sit down with this over the upcoming weekend. Much appreciated!

disconnect3d commented 5 years ago

FWIW, the has_whitespace change is actually a slowdown; we should use re.compile here too: https://pastebin.com/qJjr3S4n
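
The linked benchmark isn't reproduced here, but a minimal sketch for comparing the two approaches yourself follows; the function names are hypothetical. A plausible reason the compiled regex wins is that its scan loop runs in C, while the generator calls ch.isspace() once per character at the Python level. Timings will vary by machine and input, so none are claimed here.

```python
import re
import timeit

# Compiled once; .search() stops at the first whitespace character.
WHITESPACE_RE = re.compile(r"\s")


def has_whitespace_regex(s):
    return WHITESPACE_RE.search(s) is not None


def has_whitespace_loop(s):
    # The per-character approach from item 2 of the PR description.
    return any(ch.isspace() for ch in s)


if __name__ == "__main__":
    # Worst case for both: no whitespace, so every character is scanned.
    sample = "x" * 1000
    for fn in (has_whitespace_regex, has_whitespace_loop):
        seconds = timeit.timeit(lambda: fn(sample), number=10_000)
        print(f"{fn.__name__}: {seconds:.3f}s")
```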

Kudos to GwynbleidD and vesim for pointing this out.