microsoft / knack

Knack - A Python command line interface framework
https://pypi.python.org/pypi/knack
MIT License
347 stars 95 forks source link

Inefficient regex in `extract_full_summary_from_signature` #281

Open jiasli opened 5 months ago

jiasli commented 5 months ago

As pointed out by https://gist.github.com/prodigysml/d07cd482214c80bfb6d3240454d2f679, this regex (introduced by https://github.com/microsoft/knack/commit/430c39e657d8a424ef9b631782fe0e62a6bed203) is inefficient:

https://github.com/microsoft/knack/blob/e0c14114aea5e4416c70a77623e403773aba73a8/knack/introspection.py#L18

As shown in https://regex101.com/, a simple :param r requires 1214 steps to fail.

image

This is because \s+, .+? and \s* all match consecutive spaces, thus can trigger many backtrackings.

A better solution is to replace .+? with \w+ to match the parameter name so that backtrackings can be greatly reduced:

\s*(:param)\s+(\w+)\s*:(.*)

image