Allow five digit message id's (and possibly disallow four digit id's)

pylint-dev / pylint

It's not just a linter that annoys you!

https://pylint.readthedocs.io/en/latest/

GNU General Public License v2.0

5.33k stars 1.14k forks source link

Allow five digit message id's (and possibly disallow four digit id's) #5275

Open DanielNoord opened 3 years ago

DanielNoord commented 3 years ago

Current problem

pylint only allows message id's up to 9999 since we are limiting ourselves to four digit id's. If we are going to limit each 2 digit prefix to an individual checker and keep the 51-99 prefixes for external plugins we might run out of prefixes. Therefore, we should probably increase the max length to be 5 digits.

For reference:

Another thing, if each checker gets its own prefix, we are essential limited to 100. That might work for us, but what about plugins? Should we consider 5 digits? For existing checks, we could add a leading 0. We need to keep in mind though that is might also be a breaking change. So I suggest 3.0 for it, too.

Originally posted by @cdce8p in https://github.com/PyCQA/pylint/issues/5255#issuecomment-960897436

Desired solution

We will need to find a good way to resolve the current id's to follow the new scheme.

I think we can still consider the last two digits to be specific to each message while the first three digits can be considered the prefix. If a checker handles 99+ messages it might be time to separate it into a new class for easier maintainability, so limiting each checker to 99 messages should be fine.

Then we need a way to resolve the messages which currently have prefixes 1-50 and 51-99. There are two options:

keep accepting four digit id's and append a leading zero when we encounter them or
force extension maintainers (and ourselves) to add the leading zero and only accept five digit id's.

Option 2 obviously has the benefit that it will be much easier to maintain and I think would only require minimal work from maintainers, they can just do find all + replace. However, for users that are disabling based on id's this might become more problematic. However, I wonder how large the group of users is that is upgrading to semversion breaking changes and is using id's for disables. The docs are quite clear about preferring message names for disables.

A second thing to discuss is which prefixes we will reserve for external plugins. Again, I see two options: A. 1-500 for pylint and 501-999 for plugins B. 1-50 for pylint and 51-99 for plugins, 100-150 for pylint and 151-199 for plugins, etc. Here I think the option A might be best, as it is more in line with the current division.

Additional context

No response

cdce8p commented 3 years ago

A few notes.

Should we deprecate / remove the option to disable checks by id and only except the message name?
As for the prefixes, I would propose this
- 001 - 099 for builtin checkers
- 100 - 199 for builtin extensions
- 200 - 299 reserve?
- rest - plugins

DanielNoord commented 3 years ago

Should we deprecate / remove the option to disable checks by id and only except the message name?

Would probably make our lives easier in the future. That would be an even larger breaking change though. If we provide easy documentation on how to change to message names we can certainly think about doing this.

As for the prefixes, I would propose this

001 - 099 for builtin checkers

100 - 199 for builtin extensions

200 - 299 reserve?

rest - plugins

Fine with me. Adding a leading zero or a leading three requires a similar amount of effort of extensions maintainers.

cdce8p commented 3 years ago

Should we deprecate / remove the option to disable checks by id and only except the message name?

Thought about this point some more. Maybe it's better to leave it as is: recommend to disable by name but except ids, too. Saw this here yesterday in the vscode-python repo: pylint.ts. They have since removed that part, however they did used to disable all and then enable only the errors by id. I think that's a valid enough use case for enable / disable by id not to remove it completely.

Pierre-Sassoulas commented 3 years ago

The msgid also permit to know the type of message that is emitted too (or filter with enable=C). It's the kind of breaking change that will touch our API and can break tools based on pylint (vscode for example) and a lot of scripts. So it's going to create work, let's not rush it before it's obvious we'll need it. I like anticipation but I don't feel we're close to being unable to create a new msgid or checkers either (are we?).

DanielNoord commented 2 years ago

As for the prefixes, I would propose this

001 - 099 for builtin checkers

100 - 199 for builtin extensions

200 - 299 reserve?

rest - plugins

I'd like to prepare for this in 2.15 as this is likely one of the largest breaking changes for 3.0. A lot of the current DeprecationWarnings are for stuff that is reasonably unused anyway, this change will require changes from every plugin maintainer.

Currently we have: 00xx-49xx for builtin checkers and extensions. 50xx-99xx for plugins.

There are two possibilities. Either: 000xx-049xx for builtin checkers and extensions. 050xx-099xx for plugins. 100xx-149xx for builtin checkers and extensions. 150xx-199xx for plugins. 200xx-249xx for builtin checkers and extensions. 250xx-299xx for plugins.

Or: 000xx-299xx for builtin checkers and extensions. 300xx-999xx for plugins.

The benefit of the first approach is that we wouldn't clash with current plugins. We could add a leading zero for every id of 4 digits so that status quo would work without builtin checkers every clashing with plugin checks. Although still requiring changes from plugin maintainers to avoid DeprecationWarnings it would be "more" backwards-compatible. The second approach is obviously a lot cleaner and straightforward for future users.

What do you guys think?

Pierre-Sassoulas commented 2 years ago

Same opinion than the previous comment, do we really need that ? What would this change permits us to do that we can't right now ? Personally I would not touch the message id unless there's really no other way. Because it's really impactful for us, but also for everyone else

As you said this change will require changes from every plugin maintainer.
all tools and scripts that make an assumption about message id format will break
all documentations on stackoverflow or anywhere on the web will be outdated
all users will need to know about this change one way or the other

I'd rather relax the arbitrary requirement of having the same prefix for each message inside a checker first. It would free a lot of message id with no downside. But I don't think we even need to do that yet, do we ?

By comparison with other high impact issues that doing this would divert us from:

doing #3512 would be breaking changes that users can get behind and would helps with pylint "not being a linter that annoy you".
getting closer to being able to do https://github.com/PyCQA/pylint/issues/5933 would have an enormous impact each time pylint is launched

DanielNoord commented 2 years ago

Same opinion than the previous comment, do we really need that ? What would this change permits us to do that we can't right now ?

We're starting to run out of prefixes. I think we already use 30+ so we only have less than 20 left for standard checkers.

Considering this change should have a deprecation period of several months (or a year) we should make the change sooner rather than later.

I'd rather relax the arbitrary requirement of having the same prefix for each message inside a checker first. It would free a lot of message id with no downside. But I don't think we even need to do that yet, do we ?

I'm strongly against this as this breaks the assumption many other linters/tools make as well: prefixes indicate a certain type of message. Either coming from the same plugin, or as in our case, coming from a checker with a specific aim. We shouldn't touch that.

By comparison with other high impact issues that doing this would divert us from:

I'm not sure this is necessarily true. The change itself is rather easy:

Add a leading zero for every 4 digit message we encounter
Validate that every message is now 5 digits

Depending on the pattern we choose we don't even need to use old_names or can just use a simple lookup table to map old 4-digit codes to new 5-digit codes. Compared to argparse and its 100+ PRs this should be doable in +-5 PRs with some follow-up PRs to actually change our own message ids.

Pierre-Sassoulas commented 2 years ago

we only have less than 20 left for standard checkers.

It's not that pressing a matter, right ? I'm pretty sure we can start grouping things together if we try. I already identified some in #6870. Plus we added maybe 4/5 checkers in the last two years and pylint never has been that active since it's on github.

I'd rather relax the arbitrary requirement of having the same prefix for each message inside a checker first. It would free a lot of message id with no downside. But I don't think we even need to do that yet, do we ?

I'm strongly against this as this breaks the assumption many other linters/tools make as well: prefixes indicate a certain type of message. Either coming from the same plugin, or as in our case, coming from a checker with a specific aim. We shouldn't touch that.

I'm not sure "prefixes [after the first letter] indicate a certain type of message." is something most users even know. Even if some user know... what would be the use of that knowledge ? They could deactivate a checker by deactivating its prefix for example C01 instead of deactivating each of its message ? Isn't deactivating each symbol better ? It does not require the configuration reader to know what message are checked by C01... (missing-docstring, bad-docstring-quotes is easier to understand). Doing it by prefix makes pylint less intuitive to use. Beside for extensions it's possible to load a plugin or not directly so you can activate of deactivate an extension, it's only a thing for standard checkers.

In fact I'd say that's bad design and a minor annoyance. I think we're exposing pylint's internals, solidifying message/checker association and causing problem for ourselves with this requirement. I say that as someone who had to do 20+ merge request to be able to handle multiple old name in a message. This is preventing us from doing refactor that we could do easily otherwise without having to rename messages (See https://github.com/PyCQA/pylint/pull/6870). It's also making some thing more complicated (we had to add the shared option in https://github.com/PyCQA/pylint/pull/6693, I'm pretty sure it took 10+ hours of collective work that we would not have to do if we relaxed the requirement). Also what if we need to group some check for performance reason and need to move code and messages around ? I'm pretty sure some checks are done multiple time across checkers but we can't exploit this if we can't move messages around.

Now, if relaxing the constraint also prevent us to have to do a major breaking change...

DanielNoord commented 2 years ago

It's not that pressing a matter, right ? I'm pretty sure we can start grouping things together if we try. I already identified some in #6870. Plus we added maybe 4/5 checkers in the last two years and pylint never has been that active since it's on github.

Either way it is something that should likely go in 3.0 and would increase the deprecation period for 3.0 so we should do it sooner rather than later.

I'm not sure "prefixes [after the first letter] indicate a certain type of message." is something most users even know.

Well, it is explicitly in the documentation, so they probably do.

Even if some user know... what would be the use of that knowledge ?

It is just gives a quick overview where something belongs to. For example, when you load a new extension with 100+ messages and you're not sure if a message comes from pylint or from the extension the prefix allows you to easily check this. It's just a way to signal additional information to the user which they can use for various reasons.

We don't know. There might be integrations/plugins that rely on the prefix to determine the severity of a message? For example, you could make a plugin that does not fail CI on any message coming from the 20xx prefix but do have those messages show in the CI output.

In fact I'd say that's bad design and a minor annoyance.

All of those things can be taken care of, especially with the new shared messages system. If there are ever two checkers responsible for the same message without using shared I think that is a poorer design than the consistency.

Pierre-Sassoulas commented 2 years ago

I'm not sure "prefixes [after the first letter] indicate a certain type of message." is something most users even know.

Well, it is explicitly in the documentation, so they probably do.

I'm not sure everyone is reading the whole documentation before using pylint especially as it's in the developer documentation and as you don't need to read it because it's enforced by pylint when loading plugin.

For example, when you load a new extension with 100+ messages and you're not sure if a message comes from pylint or from the extension the prefix allows you to easily check this

When you're adding a new extension isn't it because you already took care of the existing issues ? If you add multiple extensions at the same time without fixing anything first you probably don't care what message come from where (or you would add the extension one by one).

We also have an (undocumented and unenforceable) rule regarding msgid for external checkers that should be from 5000 to 9999 and pylint's checker msgid from 0001 to 4999. It's a lot less restrictive than "same 2 digit prefix inside a checker" and it would be sufficient to cover the use case "is this an external checker ?".

a plugin that does not fail CI on any message coming from the 20xx prefix but do have those messages show in the CI output.

Yes, anything is possible but does it make a lot of sense to do that when there's disable/enable and load-plugin available ? I'm not convinced this is worth the hassle to maintain and even less so if it means we need to do a breaking change as large and impactful as adding a digit to existing msgid. There's probably many parsers that just assume that a pylint message is a letter and 4 digit and will actually break (including in our own code) not even talking about all existing discussion/question/documentation about pylint ever made.

All of those things can be taken care of, especially with the new shared messages system. If there are ever two checkers responsible for the same message without using shared I think that is a poorer design than the consistency.

Yes we did what needed to be done to handle the complexity created by this constraint. It's working and it's already done, so let's merge the shared change and let's not touch it without a good reason.

we should do it sooner rather than later.

True, unless we can avoid it and unless the impact of the change is small and doable in a minor version without deprecation period. I'd argue that removing the checker prefix is a smaller impact than the removal of the checker priority. For checker priority we removed a "feature". For checker prefix we would just be relaxing a constraint and removing the code enforcing it. It would not break anything. Plugin maintainers are free to keep using consistent prefixes for their checker too. So we could do this any time we want... I.E. when we need it, not before.