Add different configuration for different sub directories

pylint-bot commented 9 years ago

Originally reported by: Fergal Hainey (BitBucket: FerHai)

I'm using pytest and redefined-outer-name is a false negative in these cases as it's part of pytest's design. I'd like to run pylint over the whole project to flag things automatically before code reviews. I'd like to catch cases of redefined-outer-name outside the tests folder.

I could add comments to the modules but 1. that's a lot of comments and 2. changes will require changing as many comments.

What could be nice is looking up through basedirs of the file being linted for pylintrc files and taking config in closer pylintrcs as higher priority. Or maybe something in the project wide pylintrc to denote some config applies to a directory.

Would this be considered a good idea? I'm open to making a patch if it's considered worthwhile.

Bitbucket: https://bitbucket.org/logilab/pylint/issue/618

pylint-bot commented 9 years ago

Original comment by Florian Bruhin (BitBucket: The-Compiler, GitHub: @The-Compiler?):

Personally, I'd gladly use this feature when #352 is fixed as well :smile:

As I don't use __init__.py files in my tests, I currently need a custom script to run pylint over it, and set some custom options there:

files = []
for dirpath, _dirnames, filenames in os.walk('tests'):
    for fn in filenames:
        if os.path.splitext(fn)[1] == '.py':
            files.append(os.path.join(dirpath, fn))
disabled = [
    'attribute-defined-outside-init',
    'redefined-outer-name',
    'unused-argument',
    'missing-docstring',
    # #511/
    'undefined-variable',
]
no_docstring_rgx = ['^__.*__$', '^setup$']
args = (['--disable={}'.format(','.join(disabled)),
         '--no-docstring-rgx=({})'.format('|'.join(no_docstring_rgx))] +
        sys.argv[1:] + files)
subprocess.call(['pylint'] + args)

pylint-bot commented 9 years ago

Original comment by Claudiu Popa (BitBucket: PCManticore, GitHub: @PCManticore):

Having specific configuration files per subdirectories would be nice. I'll probably use it myself, for instance, for ignoring mock related failures in tests.

The problem is that it probably won't be trivial to do it, since the discovery of the configuration is done before getting the files for analysis and it's baked into the main pylint driver class through an additional dependency (logilab.common): https://bitbucket.org/logilab/pylint/src/736b19e661d3f794a30797fd877bb49569736d57/pylint/lint.py?at=default#lint.py-457 Afterwards, each checker uses this configuration object.

Anyway, if you could spent some time on a patch and make this work, that would be really awesome.

pylint-bot commented 9 years ago

Original comment by Claudiu Popa (BitBucket: PCManticore, GitHub: @PCManticore):

By the way, Florian, someone worked on issue #352 at the sprints, but a PR was never made. :(

pylint-bot commented 9 years ago

Original comment by Fergal Hainey (BitBucket: FerHai):

Sounds like adding directories to the config and making the checkers aware of it would be easier than changing how config is loaded. Unless this is a feature the logilab.common dependency would enjoy too?

I'm not yet familiar with pylint and dependencies internals, but will take a look.

pylint-bot commented 9 years ago

Original comment by Florian Bruhin (BitBucket: The-Compiler, GitHub: @The-Compiler?):

I started some work on ripping out logilab.common from pylint but got stuck on the configuration part. I still think it'd be sensible to rip out that overengineered... thing completely and replace it from scratch. Then this kind of thing could be a lot easier.

pylint-bot commented 9 years ago

Original comment by Claudiu Popa (BitBucket: PCManticore, GitHub: @PCManticore):

Having this feature in logilab.common will not help pylint, since we want to move away from it.

pylint-bot commented 9 years ago

Original comment by Claudiu Popa (BitBucket: PCManticore, GitHub: @PCManticore):

By the way, in the issue #28, Torsten said something about using forests of configuration. That sounds like what we're planning to do here, might be worth pinging him for more details.

eddie-dunn commented 8 years ago

I would like a solution where the project wide pylintrc supports ignoring some Pylint-warnings for certain folders and/or Python modules. Is anybody working on this issue? I could take a crack at it otherwise, but I have no idea where to start.

PCManticore commented 8 years ago

No, no one is working currently on this issue, feel free to take a stab at it. But this issue is about introducing support for configurations per subdirectories, which might not necessarily mean support for disabling messages for particular Python modules (it would though mean that you could disable messages in a subdirectory config). Unfortunately, I have no idea right now how can the messages be disabled for particular modules, I have to think of it (probably we can't use disable for that, so we'd need to add a new option, something on the lines of scoped-disabled=a.py:no-member,b.py:unbalanced-tuple-unpacking or something of sorts).

eddie-dunn commented 8 years ago

You are right that this issue is not necessarily (but possibly) related to what I want to achieve. I created a separate issue for folder-specific disables here: https://github.com/PyCQA/pylint/issues/871

AWhetter commented 8 years ago

I'm in need of this feature so I'm looking to implement it but I need to nail down a few details first. Mainly where should I look for configuration files and in what order? My current thinking is to look for and load the files in the opposite order that they are looked for at the moment. So load /etc/pylintrc first if it exists, then the PYLINTRC environment variable/home, then the current directory/the parent directory/it's parent/etc. So in this example we'd load /etc/pylintrc, then any settings that are in ~/.pylintrc would override those in /etc/pylintrc.

In terms of the command line I'm thinking we allow passing in --rcfile multiple times and it'll read the files in order. So settings in later rc files would override settings in the earlier ones.

In terms of implementation in code, my current plan is to allow a list of config file locations (or a single path) wherever we currently use config_file (eg config.OptionsManagerMixIn and config.find_pylintrc). I know that the config module is up for refactoring but I'm not sure if I'm the right person to do that. I could be convinced though as long as I was given enough guidance on what the end goal would be :wink:

PCManticore commented 8 years ago

Hi @AWhetter

There are two major issues with implementing this feature, as far as I can tell:

deciding how the config files are loaded, where they leave and the exact semantic (overriding, completely new files)
how each configuration object gets into a checker or another and how the checker decides what configuration to use.

In the first case, I think that passing multiple --rcfile won't help that much, since each one of them would probably be tailored for one folder or another, while this feature could be left out to the part that discovers configuration files on disk. Regarding the overriding / replacing semantics: I think that each subsequent file should be a complete pylintrc file altogether. The alternative would be for config files from lower levels to modify the state of the configuration files from upper level, but this means it won't be obvious what is changed, now the user would have to know the same logic that pylint does and apply it in order to figure out what is happening. I think it's cleaner to just use completely new configuration files, that have modified one thing or two instead of "patching" things up. Let me know if you find a convincing case about the alternative, if you care about this behaviour. The lookup of the config files would probably need to be separated a bit after having the files for analysis. This would mean a workflow as this:

if one of the location we need to analyze is a folder
- try to get the pylintrc at the folder's level, if any
- descend into the folder and for each inner location, store a corresponding pylintrc if found or the global one In the end, it will result a tree of config files, as in {'location': 'cfg', 'location.inner': 'cfg1', 'location.inner.inner...': 'cfgX'}

Now this gets into the second part of the issue, which is deciding how to pass each inner object into a checker. So what we know so far is that a checker has a underlying config object and it's not necessarily the duty of the checker to know which config object it should use. What, I think, would be useful to have is this: * when entering into a new checker, we need a new dispatch method, in the same vein as open/close, for picking up the config file. open/close can't be reused, because they are called in different parts of lint scaffolding, so it has to be something new, called before processing the token and the AST checkers (somewhere in lint.py/check_astroid_module). It should be called with the node we're intending to process and in that method, we should decide, by the .root().file of the node, which config file to use. * the new config object, picked at the last step, is set and the checkers can use it without caring how it got there.

As far as I can think of it, this seems to be it. There are some rough edges in this proposal, but hopefully we could flesh them out. Let me know what is your opinion about it.

AWhetter commented 8 years ago

I agree that multiple --rcfile flags isn't a great option so I'm happy to drop that.

I don't like the idea of always needing complete rcfiles. My main motivation for implementing this issue was that say you have a team that lints multiple projects, then the team could have a global config file for linting against the team's standard. But then there may be projects that already have their own conventions and the team would want to be able to override certain aspects of the global configuration for that project. If pylint is in the situation where each project (and potentially subdirectories in those projects) has it's own pylintrc file then a change in the team's standards (say a reduction in line length from 80 to 79 characters) would mean changing many rcfiles across many projects. With per-project configuration files that override the global configuration, a change in the team's linting standards would mean a change in only the global rcfile. I do agree however that changing state of the global configuration on every directory level would get confusing. So I'd propose only looking for the first pylintrc file in a directory path instead of on every level. So each parent directory in the path would only be searched for an rcfile if it's child does not contain an rcfile. That way a global overrides apply only to the current directory and subdirectories below.

As for lookups the tree idea would work for complete pylintrc files but seems less desirable for override configs. We'd have to parse the global config every time we come across an override file, and then do the same again once we leave the overridden directory. I need to actually look at whether or not this is plausible in the pylint code, but I wonder whether the config manager itself could handle passing overrides. It could store a stack of configuration options and push/pop from it as directories are entered and left. For example say a directory x overrides max-line-length to 79 characters and single-line-if-stmt to yes. When we enter the directory we'd do push max-line-length=100, single-line-if-stmt=yes, and the stack would store the current values of these configuration variables. Upon leaving the directory these values would be popped back off of the stack and restored. If a directory is entered with no overrides then a null entry is added to the stack. When popped the null entry does nothing.

PCManticore commented 8 years ago

So you're saying that we could support the following protocol for a configuration:

global configuration somewhere
per project configuration that can patch the settings of the global configuration.
per subdirectory configuration that can patch the settings of the project configuration (which is basically a version of 2.)

So, if I'll call pylint project to the following structure, this will happen:

    /project/.pylintrc applies to
              /a.py
              /b.py

            /subfolder/.pylintrc applies to
                      /c.py
                      /d/
                        /x.py
                        another_subfolder/.pylintrc applies to nothing
                                               /e.py

Inner subdirectories, other than this layout, could potentially have a pylintrc of their own that is included from the project's .pylintrc through an include directive, but that is a separate feature that we can address.

I could live with this functionality, as long as we limiting to the closest subdirectories for the root of the project, otherwise, the understanding of what's happening could get fuzzy and I don't want to make this more complicate than it is. Also, I'm thinking that currently this might introduce some unwanted behaviour for users that are having a .pylintrc in their project and, for some reason, they also have a global .pylintrc, which means the per-project .pylintrc of theirs will suddenly be a patch of the global one. We should enforce this behaviour through a flag, that users can disable if they want to have the older feedback of not patching anything, just overriding the configuration.

Regarding the implementation, I think it would be probably cleaner to have multiple config objects, corresponding to each directory's .pylintrc. One issue with your stack-based approach is that the configuration is processed before calling the checkers, which means that the processing will get entangled in the checking process, which would make things more fuzzy as well (plus you don't know exactly which directory you're processing until you're not passing a new Module node to the checkers, ideally, the configurations should be processed so far).

Let me know if there is anything you'd like to discuss, otherwise we can already start working on this and by it, I mean you. ;-)

AWhetter commented 8 years ago

In your example how will we know what is a project level override and what is a directory level override?

I'm actually drifting more towards looking for pylintrc overrides on every level. Yes it could get complicated to figure out which rules in which pylintrc are applied to a file, but I think that this is a decision that can be left up to the user. It gives the user more flexibilty, but we should make it clear in documentation that this comes at the cost of complexity.

PCManticore commented 8 years ago

Okay, I don't mind having it like that in this case. Also, we should take a look to see what other tools are doing, for instance Rubocop for Ruby and ESLint for javascript. At least in the former's case, I know they have some similar as the plan we're trying to flesh out here, would be interesting to see how they approached this problem.

AWhetter commented 8 years ago

So Rubocop has quite an elegant solution. It will find the first configuration file in the directory chain, but within the config file you can choose to include a configuration from elsewhere via either a relative path, absolute path, or even a remote file. It doesn't seem to have a concept of a global config other than the built in default options. See https://github.com/bbatsov/rubocop#inheriting-from-another-configuration-file-in-the-project.

ESLint searches for configuration files on every level, similar to what I was suggesting here. See https://github.com/bbatsov/rubocop#inheriting-from-another-configuration-file-in-the-project. I now prefer Rubocop's solution though.

I think we can achieve the functionality we are looking for with an "include" option in pylint files (or a similarly named option). So we would continue to search for the nearest pylintrc, and a project subdirectory pylintrc file will have to explicitly include a project level pylintrc file to indicate that it wants to overwrite the settings in that file. This should be much easier for a user to use rather than using the pylintrc in every parent directory because the user is more in control of what overrides what. We can then use the "tree of config files" idea that you suggested here (https://github.com/PyCQA/pylint/issues/618#issuecomment-215988030) to build config objects for each subdirectory and apply each one appropriately.

I don't think that it's a good idea for us to implement remote pylintrc files like Rubocop has because we'd be automatically downloading a python file and evaluating it. So we should stick with absolute and relative paths.

The only downside with the includes solution is that, in the case that I described where you might want a team pylintrc file to apply to a range of projects, each project pylintrc file would need to include the path to the team pylintrc file. If the path to the team file changes then many pylintrc files need to be updated. This is maybe a bit of an unusual scenario, but I think it could be fixed with something like a --global-pylintrc file option that describes the global pylintrc, and pylint will still search for near pylintrc files to override that with still. I think this something to consider in another iteration, and the "includes" solution can be implemented on its own.

PCManticore commented 8 years ago

I think this approach sounds very good, so I'm +1 for doing it as is.

kernc commented 8 years ago

Regarding the overriding / replacing semantics: I think that each subsequent file should be a complete pylintrc file altogether. ... Let me know if you find a convincing case about the alternative, if you care about this behaviour.

We have a project-wide lint config we like. But for tests in the tests dir, we would like to disable line-too-long and missing-docstring.

Ftr, introducing an include=/path/to/other/pylintrc directive is in line with https://github.com/PyCQA/pylint/issues/175.

AWhetter commented 8 years ago

I started the implementation of this but I felt like I was really shoehorning the code into what already there. So instead I need to do some refactoring in the lead up resolving the actual issue:

[x] Move the parallel execution code into Runner (See #936. This is that it will be easier to pull config code of PyLinter).
[ ] Pull command line parsing code either into the Runner or preferably into something that creates the runner (See #972).
[ ] Pull config code out of PyLinter so that it can take a config object instead. (Creating a PyLinter needs to be cheap as it's going to need to be done per file at first. There may need to be some work done to figure out how extensions will register their options and defaults onto the config. It may be that a PyLinter looks at a checker when loading it and manipulates it's config object accordingly. Or it could be that the config object loads the checkers and does the manipulation on itself.)
[ ] Pull config code out of checkers. (Creating a checker should also be cheap. It'll be done every time we make a pylinter.).
[ ] Make the Runner find and distribute config objects to pylinters. (This should be done via some config store object that maps paths to config objects via an internal cache. The runner should cache linters in a similar fashion so that we aren't constantly creating linters for directories that use a previously loaded config. Config files should still be complete pylintrc files at this point.). See #973 for part of the solution for this task.
[ ] Add "Include" directive to config files.

It's worth noting that we can skip steps 2, 3, and 4, but we'll see a slight performance hit because of it. So I'll do that and then post the other steps as issues afterwards.

PCManticore commented 8 years ago

Sounds good overall. One small comment regarding Or it could be that the config object loads the checkers and does the manipulation on itself. This doesn't sound a good idea, since the configuration shouldn't have this capability of loading the checkers, for having complete separation of concerns. Apart of this comment, I like this plan. Please create new issues when implementing them, so it would be easier to track than a checkbox.

PCManticore commented 8 years ago

Hey @AWhetter Let me know if there is a blocking decision to be made regarding the sub-issues for this one (I somehow forgot the state we were back then).

AWhetter commented 8 years ago

I'm going to start work on #972. I think this going to require me figuring out how it's working at the moment and then we can plan a better solution. I have some old notes on this:

Going to need to rework the configuration system because the command line processing and the configuration file processing is too tied together. Maybe the command line is only for global options, plus a flag for changing a config option (so the command does no "configuration option" parsing). So the CLI is parsed, plus it produces a config object, that will be merged with any other config objects to override their options. Because we will be doing configuration file discovery on startup, do we want to initialise all plugins on startup as well, rather than risk a plugin failing during run time. Initialising plugins at run time may get tricky for parallel builds as well.

I also have a comment from you about it:

One thing I would love to have for instance, would be exclusive options at the parser level, which argparse does. Cannot have it currently or I just couldn't find. If you can separate the CLI parsing from the actual configuration parsing and then process these two together, that would be amazing!

We also had a discussion about two phase checkers, so that checkers could be made lighter. This led to this proposal (https://public.etherpad-mozilla.org/p/pylint-two-passes). My only reserve about it was that storing the AST for a whole directly could be costly with memory. I've done some tests on a couple of large projects and it seems like keeping all ASTs in memory isn't that costly after all so I think we should give this idea the green light! If I remember rightly we never came to a conclusion on how exactly we would store the "state" in memory. I think we said it would also mean that we wouldn't need to do any funny business in the reporters to make checkers such as the import cycle checker output it's messages in the correct order, rather than at the end.

PCManticore commented 8 years ago

Do you have the benchmarks at your disposal by any chance? I am curious seeing what storing all the ASTs actually incurs.

From what I recally by skimming through the two passes document, the solution we were discussing was postponing the actual emitting of the messages until we analyzed everything we need, which means a not-so instant feedback to the user, although I think it does not matter that much? Regarding the state, if each checker could store it internally and expose it, that should be fine, no?

Anyway, I won't probably have too much bandwidth these days (the whole October actually) to think about it, so please come with a solution that satisfies you, I will take a look and go with what seems best in terms of perf / memory, since having a solution would be better than not having anything at all.

AWhetter commented 7 years ago

Just tagging in #1554 which has a reasonable use case for wanting instant feedback from pylint. This wouldn't be easy/possible with two stage checking.

christensson commented 7 years ago

Submitted pull-request #1671 to fix "Add "Include" directive to config files.". Re-used the rcfile option under the MASTER section. That option seemed to be unused anyway...

wookayin commented 6 years ago

This is really necessary. Any workarounds? #1671 is not merged...

gwax commented 6 years ago

One could do something like what EditorConfig does for their configuration files: specify a root = true flag and otherwise have things inherit upwards.

Consider a tree:

project/
|   sub1/
|   |   b.py
|   sub2/
|   |   sub3/
|   |   |   .pylintrc
|   |   |   d.py
|   |   .pylintrc
|   |   c.py
|   .pylintrc
|   a.py

# project/.pylintrc
root = true
[section]
rule1 = 1
rule2 = 2

# project/sub2/.pylintrc
[section]
rule2 = 3
rule3 = 4

# project/sub2/sub3/.pylintrc
root = true
[section]
rule2 = 5

In this case:

project/.pylintrc applies to project/a.py, project/sub1/b.py
project/.pylintrc + project/sub2/.pylintrc applies to project/sub2/c.py
project/sub2/sub3/.pylintrc applies to project/sub2/sub3/d.py

In other words, the rules are:

project/a.py, project/sub1/b.py

# effective
[section]
rule1 = 1
rule2 = 2

project/sub2/c.py

# effective
[section]
rule1 = 1
rule2 = 3
rule3 = 4

`project/sub2/sub3/d.py
```
# effective
[section]
rule2 = 5
```

PCManticore commented 6 years ago

@gwax This could be a solution, yes, but we already decided on using an include approach, which for me at least sounds more straightforward and easier to understand. Now the only thing left is for @AWhetter to have enough time to contribute this change. There should be plenty of areas to help if you need this landed sooner rather than later.

AWhetter commented 6 years ago

If anyone does want to help out with this, there is now a project set up that shows what tasks need to be done (https://github.com/PyCQA/pylint/projects/1). I've made issues for everything that should be easy to get started with. They are tagged with the help wanted and per_dir_config labels. The work for this is being done on the per_dir_config branch.

cybertk commented 4 years ago

Any update?

earonesty commented 4 years ago

@AWhetter The project organization is amazing. This is an important feature for large projects.

jecaro commented 4 years ago

I'd be very interested in this feature as well. I find the documentation confusing.

It let the user think that .pylintrc are working that way. The documentation should be fixed at least.

dineshtrivedi commented 2 years ago

Hello guys, I just want to check if anyone has found an alternative way to do this at the moment?

Thank you

Pierre-Sassoulas commented 2 years ago

This issue is tracked in project as 'per directory configuration', there's a bunch of issues that are step towards this goal and that everyone can contribute too.

dineshtrivedi commented 2 years ago

Thank you for the response @Pierre-Sassoulas I will check it out and see if I am able to contribute

DanielNoord commented 2 years ago

I'm working on a new approach in: https://github.com/PyCQA/pylint/pull/6789

Instead of doing all the refactoring that the discussion here has led to, I think there is a much simpler approach possible which I have outlined in that PR.

Create a basic Namespace
Iterate over all files_or_modules and see if we can create additional Namespaces for subdirectories
Look up the correct Namespace before checking a file

With #6789 we already check which namespace we need before checking a file.

We only need a good system to iterate over all directories and create Namespaces for them if we encounter a new configuration file. That should be relatively easy now that the parsing of configuration is in one straightforward method. I'd welcome any PRs that tries to implement step 2 😄

simon-liebehenschel commented 2 years ago

I did not find if this already discussed, but if you will take a look at how Mypy allows to configure the same, you will see an easy (for the end user) solution:

[mypy-my_directory.*,my_another_module.*,something_else.*]
rule_name_that_I_want_to_set_for_the_mentioned_dirs = false
another_rule_only_per_these_dirs = true

No multiple configs. No config overlapping, duplicate rules, etc.
Easy settings management.
All in one file.

Can not be it done for pylint in the same way?

Demo pylint config:

# Global rules
disable = something, something_else

# Specify rules per specific directories
[pylint-tests,deploy.tests]
disable = 
    # Do not complain about Pytest fixtures
    redefined-outer-name,

christopherpickering commented 1 year ago

@AIGeneratedUsername I did it through per-file-ignores in a custom plugin. https://github.com/PyCQA/pylint/issues/3767#issuecomment-1319916278

nickurak commented 5 months ago

A complication I just thought of: if the configuration files can load plugins, what does it mean to have a plugin enabled for one subdirectory and not another? If plugins can't be unloaded at runtime (or perhaps disabled temporarily) it would be hard for that concept to work with different plugin configurations in different contexts with a single invocation of pylint.

Edited to add: That could be valuable though. I've been using the pylint-pytest plugin for example, and it would be nice to be super-confident that it's not going to have any impact outside of my test subdirectory.

DanielNoord commented 5 months ago

Yes, this is definitely a concern that we should take into consideration.

imomaliev commented 5 months ago

@nickurak I think at least for the initial version of this feature, we could ignore per folder plugins functionality.

DanielNoord commented 5 months ago

I think @nickurak makes a good point. We can't really ignore this as subdirectories could trigger the loading of plugins which would affect the rest of the run. We would need to explicitly disable plugins if we want to avoid this.

imomaliev commented 5 months ago

We would need to explicitly disable plugins if we want to avoid this.

Yeah, this is what I meant. Also, I am not sure how dangerous executing per folder plugins could be, I feel like this could be a possible attack vector.

nkalpakis21 commented 1 month ago

i see this has been open for awhile. is there trouble getting a dev to fix this issue? what is blocking this for so long

pylint-dev / pylint

Add different configuration for different sub directories #618