Open DanielNoord opened 3 years ago
A few notes.
- Should we deprecate / remove the option to disable checks by id and only except the message name?
Would probably make our lives easier in the future. That would be an even larger breaking change though. If we provide easy documentation on how to change to message names we can certainly think about doing this.
As for the prefixes, I would propose this
- 001 - 099 for builtin checkers
- 100 - 199 for builtin extensions
- 200 - 299 reserve?
- rest - plugins
Fine with me. Adding a leading zero or a leading three requires a similar amount of effort of extensions maintainers.
Should we deprecate / remove the option to disable checks by id and only except the message name?
Thought about this point some more. Maybe it's better to leave it as is: recommend to disable by name but except ids, too. Saw this here yesterday in the vscode-python repo: pylint.ts. They have since removed that part, however they did used to disable all and then enable only the errors by id. I think that's a valid enough use case for enable / disable by id not to remove it completely.
The msgid also permit to know the type of message that is emitted too (or filter with enable=C). It's the kind of breaking change that will touch our API and can break tools based on pylint (vscode for example) and a lot of scripts. So it's going to create work, let's not rush it before it's obvious we'll need it. I like anticipation but I don't feel we're close to being unable to create a new msgid or checkers either (are we?).
As for the prefixes, I would propose this
- 001 - 099 for builtin checkers
- 100 - 199 for builtin extensions
- 200 - 299 reserve?
- rest - plugins
I'd like to prepare for this in 2.15
as this is likely one of the largest breaking changes for 3.0
. A lot of the current DeprecationWarnings
are for stuff that is reasonably unused anyway, this change will require changes from every plugin maintainer.
Currently we have:
00xx-49xx
for builtin checkers and extensions.
50xx-99xx
for plugins.
There are two possibilities. Either:
000xx-049xx
for builtin checkers and extensions.
050xx-099xx
for plugins.
100xx-149xx
for builtin checkers and extensions.
150xx-199xx
for plugins.
200xx-249xx
for builtin checkers and extensions.
250xx-299xx
for plugins.
Or:
000xx-299xx
for builtin checkers and extensions.
300xx-999xx
for plugins.
The benefit of the first approach is that we wouldn't clash with current plugins. We could add a leading zero for every id of 4 digits so that status quo would work without builtin checkers every clashing with plugin checks. Although still requiring changes from plugin maintainers to avoid DeprecationWarnings
it would be "more" backwards-compatible.
The second approach is obviously a lot cleaner and straightforward for future users.
What do you guys think?
Same opinion than the previous comment, do we really need that ? What would this change permits us to do that we can't right now ? Personally I would not touch the message id unless there's really no other way. Because it's really impactful for us, but also for everyone else
this change will require changes from every plugin maintainer.
I'd rather relax the arbitrary requirement of having the same prefix for each message inside a checker first. It would free a lot of message id with no downside. But I don't think we even need to do that yet, do we ?
By comparison with other high impact issues that doing this would divert us from:
Same opinion than the previous comment, do we really need that ? What would this change permits us to do that we can't right now ?
We're starting to run out of prefixes. I think we already use 30+ so we only have less than 20 left for standard checkers.
Considering this change should have a deprecation period of several months (or a year) we should make the change sooner rather than later.
I'd rather relax the arbitrary requirement of having the same prefix for each message inside a checker first. It would free a lot of message id with no downside. But I don't think we even need to do that yet, do we ?
I'm strongly against this as this breaks the assumption many other linters/tools make as well: prefixes indicate a certain type of message. Either coming from the same plugin, or as in our case, coming from a checker with a specific aim. We shouldn't touch that.
By comparison with other high impact issues that doing this would divert us from:
I'm not sure this is necessarily true. The change itself is rather easy:
Depending on the pattern we choose we don't even need to use old_names
or can just use a simple lookup table to map old 4-digit codes to new 5-digit codes.
Compared to argparse and its 100+ PRs this should be doable in +-5 PRs with some follow-up PRs to actually change our own message ids.
we only have less than 20 left for standard checkers.
It's not that pressing a matter, right ? I'm pretty sure we can start grouping things together if we try. I already identified some in #6870. Plus we added maybe 4/5 checkers in the last two years and pylint never has been that active since it's on github.
I'd rather relax the arbitrary requirement of having the same prefix for each message inside a checker first. It would free a lot of message id with no downside. But I don't think we even need to do that yet, do we ?
I'm strongly against this as this breaks the assumption many other linters/tools make as well: prefixes indicate a certain type of message. Either coming from the same plugin, or as in our case, coming from a checker with a specific aim. We shouldn't touch that.
I'm not sure "prefixes [after the first letter] indicate a certain type of message." is something most users even know. Even if some user know... what would be the use of that knowledge ? They could deactivate a checker by deactivating its prefix for example C01
instead of deactivating each of its message ? Isn't deactivating each symbol better ? It does not require the configuration reader to know what message are checked by C01...
(missing-docstring
, bad-docstring-quotes
is easier to understand). Doing it by prefix makes pylint less intuitive to use. Beside for extensions it's possible to load a plugin or not directly so you can activate of deactivate an extension, it's only a thing for standard checkers.
In fact I'd say that's bad design and a minor annoyance. I think we're exposing pylint's internals, solidifying message/checker association and causing problem for ourselves with this requirement. I say that as someone who had to do 20+ merge request to be able to handle multiple old name in a message. This is preventing us from doing refactor that we could do easily otherwise without having to rename messages (See https://github.com/PyCQA/pylint/pull/6870). It's also making some thing more complicated (we had to add the shared
option in https://github.com/PyCQA/pylint/pull/6693, I'm pretty sure it took 10+ hours of collective work that we would not have to do if we relaxed the requirement). Also what if we need to group some check for performance reason and need to move code and messages around ? I'm pretty sure some checks are done multiple time across checkers but we can't exploit this if we can't move messages around.
Now, if relaxing the constraint also prevent us to have to do a major breaking change...
It's not that pressing a matter, right ? I'm pretty sure we can start grouping things together if we try. I already identified some in #6870. Plus we added maybe 4/5 checkers in the last two years and pylint never has been that active since it's on github.
Either way it is something that should likely go in 3.0
and would increase the deprecation period for 3.0
so we should do it sooner rather than later.
I'm not sure "prefixes [after the first letter] indicate a certain type of message." is something most users even know.
Well, it is explicitly in the documentation, so they probably do.
Even if some user know... what would be the use of that knowledge ?
It is just gives a quick overview where something belongs to. For example, when you load a new extension with 100+ messages and you're not sure if a message comes from pylint
or from the extension the prefix allows you to easily check this. It's just a way to signal additional information to the user which they can use for various reasons.
We don't know. There might be integrations/plugins that rely on the prefix to determine the severity of a message? For example, you could make a plugin that does not fail CI on any message coming from the 20xx prefix but do have those messages show in the CI output.
In fact I'd say that's bad design and a minor annoyance.
All of those things can be taken care of, especially with the new shared
messages system. If there are ever two checkers responsible for the same message without using shared
I think that is a poorer design than the consistency.
I'm not sure "prefixes [after the first letter] indicate a certain type of message." is something most users even know.
Well, it is explicitly in the documentation, so they probably do.
I'm not sure everyone is reading the whole documentation before using pylint especially as it's in the developer documentation and as you don't need to read it because it's enforced by pylint when loading plugin.
For example, when you load a new extension with 100+ messages and you're not sure if a message comes from pylint or from the extension the prefix allows you to easily check this
When you're adding a new extension isn't it because you already took care of the existing issues ? If you add multiple extensions at the same time without fixing anything first you probably don't care what message come from where (or you would add the extension one by one).
We also have an (undocumented and unenforceable) rule regarding msgid for external checkers that should be from 5000 to 9999 and pylint's checker msgid from 0001 to 4999. It's a lot less restrictive than "same 2 digit prefix inside a checker" and it would be sufficient to cover the use case "is this an external checker ?".
a plugin that does not fail CI on any message coming from the 20xx prefix but do have those messages show in the CI output.
Yes, anything is possible but does it make a lot of sense to do that when there's disable/enable and load-plugin available ? I'm not convinced this is worth the hassle to maintain and even less so if it means we need to do a breaking change as large and impactful as adding a digit to existing msgid. There's probably many parsers that just assume that a pylint message is a letter and 4 digit and will actually break (including in our own code) not even talking about all existing discussion/question/documentation about pylint ever made.
All of those things can be taken care of, especially with the new shared messages system. If there are ever two checkers responsible for the same message without using shared I think that is a poorer design than the consistency.
Yes we did what needed to be done to handle the complexity created by this constraint. It's working and it's already done, so let's merge the shared
change and let's not touch it without a good reason.
we should do it sooner rather than later.
True, unless we can avoid it and unless the impact of the change is small and doable in a minor version without deprecation period. I'd argue that removing the checker prefix is a smaller impact than the removal of the checker priority. For checker priority we removed a "feature". For checker prefix we would just be relaxing a constraint and removing the code enforcing it. It would not break anything. Plugin maintainers are free to keep using consistent prefixes for their checker too. So we could do this any time we want... I.E. when we need it, not before.
Current problem
pylint
only allows messageid
's up to9999
since we are limiting ourselves to four digitid
's. If we are going to limit each 2 digit prefix to an individual checker and keep the 51-99 prefixes for external plugins we might run out of prefixes. Therefore, we should probably increase the max length to be 5 digits.For reference:
Desired solution
We will need to find a good way to resolve the current id's to follow the new scheme.
I think we can still consider the last two digits to be specific to each message while the first three digits can be considered the prefix. If a checker handles 99+ messages it might be time to separate it into a new class for easier maintainability, so limiting each checker to 99 messages should be fine.
Then we need a way to resolve the messages which currently have prefixes 1-50 and 51-99. There are two options:
id
's and append a leading zero when we encounter them orid
's.Option 2 obviously has the benefit that it will be much easier to maintain and I think would only require minimal work from maintainers, they can just do
find all + replace
. However, for users that are disabling based onid
's this might become more problematic. However, I wonder how large the group of users is that is upgrading to semversion breaking changes and is usingid
's for disables. Thedocs
are quite clear about preferring message names for disables.A second thing to discuss is which prefixes we will reserve for external plugins. Again, I see two options: A. 1-500 for
pylint
and 501-999 for plugins B. 1-50 forpylint
and 51-99 for plugins, 100-150 forpylint
and 151-199 for plugins, etc. Here I think the option A might be best, as it is more in line with the current division.Additional context
No response