pylint-dev / pylint

It's not just a linter that annoys you!
https://pylint.readthedocs.io/en/latest/
GNU General Public License v2.0
5.26k stars 1.12k forks source link

Allow ``\`` to be both a regex escape character and a windows directory delimiter in ``ignore-paths`` option #5398

Open zeloras opened 2 years ago

zeloras commented 2 years ago

Bug description

Good day/morning/evening!

Today i've update pylint to version 2.12.1 and for now that version ignoring ignore-paths=./migrations/..py, in my .pylintrc config. But in version 2.11.1 everything is working correct

Configuration

[MASTER]
ignore-paths=.*/migrations/.*\.py,.*/urls.py
disable=R0903,C0103,E0110,W0511,R0801,E1101
max-line-length=120
max-module-lines=800

[DESIGN]
max-attributes=30
max-parents=30
max-public-methods=50

Command used

pylint --rcfile=.pylintrc apps

Pylint output

************* Module apps.accounts.migrations.0001_initial
apps/accounts/migrations/0001_initial.py:27:0: C0301: Line too long (196/120) (line-too-long)
apps/accounts/migrations/0001_initial.py:28:0: C0301: Line too long (329/120) (line-too-long)
apps/accounts/migrations/0001_initial.py:29:0: C0301: Line too long (165/120) (line-too-long)
apps/accounts/migrations/0001_initial.py:30:0: C0301: Line too long (203/120) (line-too-long)
apps/accounts/migrations/0001_initial.py:38:0: C0301: Line too long (255/120) (line-too-long)
apps/accounts/migrations/0001_initial.py:45:0: C0301: Line too long (266/120) (line-too-long)
apps/accounts/migrations/0001_initial.py:46:0: C0301: Line too long (229/120) (line-too-long)
apps/accounts/migrations/0001_initial.py:1:0: C0114: Missing module docstring (missing-module-docstring)
apps/accounts/migrations/0001_initial.py:12:0: C0115: Missing class docstring (missing-class-docstring)
apps/accounts/migrations/0001_initial.py:9:0: C0411: standard import "import uuid" should be placed before "from django.conf import settings" (wrong-import-order)
************* Module apps.accounts.migrations.0002_alter_pacmanuser_last_login
apps/accounts/migrations/0002_alter_pacmanuser_last_login.py:1:0: C0114: Missing module docstring (missing-module-docstring)
apps/accounts/migrations/0002_alter_pacmanuser_last_login.py:6:0: C0115: Missing class docstring (missing-class-docstring)
************* Module apps.accounts.migrations.0003_alter_pacmanuser_username
apps/accounts/migrations/0003_alter_pacmanuser_username.py:1:0: C0114: Missing module docstring (missing-module-docstring)
apps/accounts/migrations/0003_alter_pacmanuser_username.py:6:0: C0115: Missing class docstring (missing-class-docstring)
************* Module apps.accounts.migrations.0004_auto_20211124_1539
apps/accounts/migrations/0004_auto_20211124_1539.py:20:0: C0301: Line too long (246/120) (line-too-long)
apps/accounts/migrations/0004_auto_20211124_1539.py:1:0: C0114: Missing module docstring (missing-module-docstring)
apps/accounts/migrations/0004_auto_20211124_1539.py:6:0: C0115: Missing class docstring (missing-class-docstring)
************* Module apps.accounts.migrations.0005_changeuseremailconfirmationcode
apps/accounts/migrations/0005_changeuseremailconfirmationcode.py:1:0: C0114: Missing module docstring (missing-module-docstring)
apps/accounts/migrations/0005_changeuseremailconfirmationcode.py:9:0: C0115: Missing class docstring (missing-class-docstring)
apps/accounts/migrations/0005_changeuseremailconfirmationcode.py:6:0: C0411: standard import "import uuid" should be placed before "from django.conf import settings" (wrong-import-order)
************* Module apps.notifications.migrations.0001_initial
apps/notifications/migrations/0001_initial.py:24:0: C0301: Line too long (385/120) (line-too-long)
apps/notifications/migrations/0001_initial.py:33:0: C0301: Line too long (130/120) (line-too-long)
apps/notifications/migrations/0001_initial.py:1:0: C0114: Missing module docstring (missing-module-docstring)
apps/notifications/migrations/0001_initial.py:10:0: C0115: Missing class docstring (missing-class-docstring)
apps/notifications/migrations/0001_initial.py:7:0: C0411: standard import "import uuid" should be placed before "from django.conf import settings" (wrong-import-order)
************* Module apps.notifications.migrations.0002_alter_notifications_notification_type
apps/notifications/migrations/0002_alter_notifications_notification_type.py:16:0: C0301: Line too long (518/120) (line-too-long)
apps/notifications/migrations/0002_alter_notifications_notification_type.py:1:0: C0114: Missing module docstring (missing-module-docstring)
apps/notifications/migrations/0002_alter_notifications_notification_type.py:6:0: C0115: Missing class docstring (missing-class-docstring)
************* Module apps.notifications.migrations.0003_alter_notifications_notification_type
apps/notifications/migrations/0003_alter_notifications_notification_type.py:16:0: C0301: Line too long (648/120) (line-too-long)
apps/notifications/migrations/0003_alter_notifications_notification_type.py:1:0: C0114: Missing module docstring (missing-module-docstring)
apps/notifications/migrations/0003_alter_notifications_notification_type.py:6:0: C0115: Missing class docstring (missing-class-docstring)

Expected behavior


Your code has been rated at 10.00/10

Pylint version

pylint-2.12.1

OS / Environment

Linux d507ea5a5499 5.14.17-301.fc35.x86_64 #1 SMP Mon Nov 8 13:57:43 UTC 2021 x86_64 GNU/Linux

Additional dependencies

No response

DanielNoord commented 2 years ago

@Pierre-Sassoulas This is actually a tricky bug to resolve. https://github.com/PyCQA/pylint/pull/5201 normalises \ to become OS-agnostic. However, \ is also the character used to escape characters in regex. So, .*/migrations/.*.py,.*/urls.py should work but I think users rightly expect \ to be the escape character. Do you know any regex trickery to allow both swapping \ to / for Windows but also allowing escaping characters?

Pierre-Sassoulas commented 2 years ago

What if we "normalize" the regex only if the current OS is actually windows ? I don't like making OS specific code but this feel justified here.

DanielNoord commented 2 years ago

Hmm, then users on windows can't use the \ to escape. They might copy the pylintrc from somebody who wrote it on Linux and then get a different result from the same pylintrc I don't think that works.

Could we add a line to the description of the setting and say that escaping is not allowed since \ is considered a directory break?

Pierre-Sassoulas commented 2 years ago

Could we use glob instead of regex ? I'm not sure it's really equivalent. glob could handle the hard part and we could point users to the glob documentation.

DanielNoord commented 2 years ago

The problem is actually in the validator: https://github.com/PyCQA/pylint/blob/e33bc6377bbbf6135888cdadecdfc3baa8497a24/pylint/config/option.py#L30-L40

pathlib.PureWindowsPath('.*\\ignore\\.*').as_posix()
'.*/ignore/.*'
pathlib.PureWindowsPath('.*\ignore\.*').as_posix()
'.*/ignore/.*'

For some reason there is no difference here. Even \\\\ returns the same. I don't think changing to glob is the right move as then we would invalidate all current configs using ignore-paths. That seems like a large breaking change. What do you think of updating the description and saying that the escape character can't be used as it represent the dir-delimiter on Windows? It is also not really a nice solution, but I don't know a better way forward..

Pierre-Sassoulas commented 2 years ago

I think we should revert the windows specific part until we find a non breaking change way of handling this properly. Most of our user are on Linux according to pypi stats

pylint_os_proportion

DanielNoord commented 2 years ago

Wouldn't it be better to disallow the use of the escape character? How often are you actually using the escape character in a regex for paths anyway? In the regex used in the OP the escape character is unnecessary as .*/migrations/.*.py,.*/urls.py would work as well. The only thing I can think of is using $ in a directory or file name, but I don't think that is that uncommon. Personally I think supporting windows serves a large user base than the user base that actually needs the escape character.

Pierre-Sassoulas commented 2 years ago

supporting windows serves a large user base than the user base that actually needs the escape character

You're right, and on top of that we can now raise a proper configuration parsing error if the regex contain a forbidden character so it's going to be relatively easy to warn users. The real fix still eludes me though. At least if we don't want to do any breaking changes.

DanielNoord commented 2 years ago

You're right, and on top of that we can now raise a proper configuration parsing error if the regex contain a forbidden character so it's going to be relatively easy to warn users.

Not sure that is so easy. Warning on \ doesn't make sense, so we could only warn on $ and ^ I guess? Perhaps we should make the description of the setting very clear and hope people don't run into this too much 😅

The real fix still eludes me though. At least if we don't want to do any breaking changes.

I don't have a good solution as well sadly..

@zeloras With regards to your initial question: we have reached the conclusion that we are going to disallow \ as escape character for the ignore-paths setting as the \ is needed as delimiter in standard windows paths. I'm going to update the description of the setting accordingly. I know this might not fully satisfy your needs, but we need to make a compromise here (you can read the preceding comments to see what the problem is for us).

In your case updating the settings file to include ignore-paths=.*/migrations/.*.py,.*/urls.py should make the setting work like you expect it to.

DanielNoord commented 2 years ago

For future reference:

The setting description has been updated with #5415. This issue is kept to think of a better and permanent solution to allow both \ as escaping character and normalise paths to POSIX and Windows systems as introduced in #5201.

DetachHead commented 2 years ago

a workaround is to use pyproject.toml instead of .pylintrc, and use raw strings (' instead of "):

[tool.pylint.MASTER]
ignore-patterns = '.*\.pyi'
Zeckie commented 1 year ago

Could we add a line to the description of the setting and say that escaping is not allowed since \ is considered a directory break?

I think any option that modifies or limits regex patterns (like replacing all \s) is a bad idea, as that will then confuse or frustrate people who are familiar with regular expressions, and used to being able to use all regex features. \ is useful in this case both for escaping chars that are allowed in filenames bug have special meaning in regex (like ., (, +, and [), but also for character classes (like \d (digit), \w (alphanumeric plus underscore), \S any char except whitespace) and unicode chars (like \N{EM DASH}). There are more features using \, and plenty of examples in https://docs.python.org/3/library/re.html

From just looking at this issue, before reading the documentation, I though that --ignore-paths was intended to be paths (not regex patterns), and --ignore-patterns was the same but using patterns. Even looking at the documentation, it is difficult to find anything about setting the same configuration using command line vs pyproject.toml and .pylintrc. I would have expected that to be under configuration, similar to how it is in the mypy documentation.

I also think these names would be much clearer: --ignore-paths = paths only, not regex patterns --ignore-path-regex = regex patterns matching paths --ignore-filenames = file names only, not paths or patterns --ignore-filename-regex = regex patterns matching filenames

When the string is not a regex pattern, it should be fine to normalize \ to /, but it is not ok to do that for regular expressions.

To work with windows paths, --ignore-path-regex should use the regex exactly as supplied, and test the same regex on both the windows style and the posix style paths. So the file C:\Projects\foo\bar\baz-1.py would be tested against the regex as python strings "C:\\Projects\\foo\\bar\\baz-1.py" (would match pattern like \\[\w-]+\.py) and "C:/Projects/foo/bar/baz-1.py" (would match pattern like /[\w-]+\.py).

Pattern: foo\d{4}-bar.py would match foo0000-bar.py and foo1234-barxpy but not foo\dddd-bar.py. To best match something like that, use foo\\d{4}-bar\.py

It would also be useful to have more examples in the documentation, to avoid ambiguities such as whether basename includes the extension (https://docs.python.org/3/library/os.path.html#os.path.basename) or not (https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/FilenameUtils.html#getBaseName-java.lang.String- ), and whether , is part of the filename or separating 2 filenames.

In the regex used in the OP the escape character is unnecessary as .*/migrations/.*.py,.*/urls.py would work as well.

But that would also match apps/notifications/migrations/foopy (even though it contains no .)

DanielNoord commented 1 year ago

If we are going to rename configuration options we might as well also consider using glob instead of regex..

The code that tests this is fairly straightforward and I'm very open to reviewing a PR that "fixes" this or at least improves the current situation. I won't have much time to work on this myself sadly.

jiasli commented 1 year ago

I agree with @Zeckie that limiting regex's escaping functionality is not a good idea.

In your case updating the settings file to include ignore-paths=.*/migrations/.*.py,.*/urls.py should make the setting work like you expect it to.

.*/urls.py is not the same as .*/urls\.py - .*/urls.py matches foo/urls-py too.