mattsb42 / pypi-parker

Helper tooling for parking PyPI namespaces to combat typosquatting.
Apache License 2.0

support combinatoric expansion for names #3

Closed mattsb42 closed 2 years ago

mattsb42 commented 7 years ago

Problem

When package names are made up of multiple parts and there are multiple possible typos for each part, it would be nice to be able to programmatically define patterns to expand into all possible combinations.

For example, if I had a package named flake8-super-awesome, rather than defining:

[names]
flake8superawesome:
flak8-super-awesome:
flak8superawesome:
...

I could define something like:

[DEFAULT]
name_patterns:
    (flake8|flak8)(|-)(super|supr|spr)(|-)(awesome|awsome|awsum)
except_names:
    flake8-super-awesome
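To make the intended behavior concrete, here is a minimal sketch of how such a pattern could be expanded. It assumes patterns contain only static text and "or" groups, with no nesting. `expand_pattern` is a hypothetical helper for illustration, not something pypi-parker provides.

```python
import itertools
import re

# Hypothetical helper, not part of pypi-parker.
# Matches one non-nested "(a|b|c)" group.
_GROUP = re.compile(r"\(([^()]*)\)")


def expand_pattern(pattern):
    """Expand static text and "or" groups into every combination."""
    parts = []
    position = 0
    for match in _GROUP.finditer(pattern):
        static = pattern[position:match.start()]
        if static:
            parts.append([static])
        # "(|-)" splits into ["", "-"], so an empty alternative is allowed.
        parts.append(match.group(1).split("|"))
        position = match.end()
    tail = pattern[position:]
    if tail:
        parts.append([tail])
    return ["".join(combo) for combo in itertools.product(*parts)]


pattern = "(flake8|flak8)(|-)(super|supr|spr)(|-)(awesome|awsome|awsum)"
except_names = {"flake8-super-awesome"}

# Drop the real package name so only the typo variants remain.
names = set(expand_pattern(pattern)) - except_names
print(len(names))
```

Using `itertools.product` keeps the expansion to a plain Cartesian product over the alternatives, which is all that static strings and "or" groups require.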

Options

Reverse Regex

One option would be to use a reverse-regex library to accept regular expressions and automatically expand them to all possible matches.

I don't really like this option, as I think it could be too easily abused by malicious actors. One of the difficulties in designing the behavior of this tool has been in not crossing the line from "makes things easier for legitimate use" into "makes it too easy for illegitimate use", and I feel like this option would be crossing that line (yes, someone could do that themselves and just generate a massive park.cfg file...in the words of Rick: "don't think about it").

This would also create potentially dangerous accidental use cases that could cause unintended consequences. For example, if, in the scenario above, I instead defined the below, I would get an explosion of over 12 million results (26^3 + 26^4 + 26^5 = 12,355,928).

[DEFAULT]
name_patterns:
    flake8-[a-z]{3,5}-awesome

Limited Regex

This is more like what I showed in the example scenario. If only a limited subset of regular expression syntax is supported, then the potential negative impact can be contained.

I think starting with just support for static strings and "or" groups would be reasonable.

It also might be good to add a limit to the total number of names that can be parked in a single config or pattern. The pattern in the example scenario above seems fairly tame, but it would expand to 72 names (2 x 2 x 3 x 2 x 3), 71 of them parked once the real name is excluded. They might all be reasonable, but is this dragnet approach a behavior that should be encouraged?
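One way such a limit could work is to count the combinations before generating any names and refuse patterns that exceed the cap. This is only a sketch: `MAX_NAMES`, `count_expansions`, and `check_limit` are hypothetical names, the value 50 is an arbitrary placeholder, and the pattern syntax is assumed to be limited to static strings and non-nested "or" groups.

```python
import math
import re

# Hypothetical cap; the right value is an open question.
MAX_NAMES = 50

# Matches one non-nested "(a|b|c)" group.
_GROUP = re.compile(r"\(([^()]*)\)")


def count_expansions(pattern):
    """Count combinations without generating them.

    Static text contributes a factor of 1, so only the groups matter.
    """
    return math.prod(
        len(match.group(1).split("|")) for match in _GROUP.finditer(pattern)
    )


def check_limit(pattern, limit=MAX_NAMES):
    """Return the expansion count, or raise if it exceeds the limit."""
    count = count_expansions(pattern)
    if count > limit:
        raise ValueError(
            "pattern expands to {} names, limit is {}".format(count, limit)
        )
    return count


pattern = "(flake8|flak8)(|-)(super|supr|spr)(|-)(awesome|awsome|awsum)"
print(count_expansions(pattern))
```

Counting first means an oversized pattern is rejected before any names are built, so an accidental explosion costs nothing.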