pylint treats `package.module` same as `package/module.py` in terms of search

akamat10 commented 2 months ago

Bug description

As per sys.path initialization docs https://docs.python.org/3/library/sys_path_init.html, the sys.path entry depends on whether python is passed an input script named package/module.py or whether it is invoked through -m package.module option. In the former case, package gets added to the start of sys.path and, in the latter case, the current directory gets added. pylint doesn't have show this distinction. The search behavior is the same whether we call python3.12 -m pylint package.module or python3.12 -m pylint package/module.py although the module name and search path will depend on separate factors. Pylint ends up walking up the directory structure and adds the first directory that doesn't have __init__.py

Say I have the following:

.
├── a.py
└── mypackage
    ├── __init__.py
    └── mymodule.py

contents of a.py:

def myfunc():
    print("Hello World")

contents of mypackage/mymodule.py

import a

a.myfunc()

See pylint output below.

I attempted to fix this in #9910 but have run into regression failures as per the current behavior expected by those regressions. I would love to know thoughts on how we should proceed.


### Configuration

```ini
None

Command used

See above and below

Pylint output

pylinttest akamat10$ python3.12 -m pylint -d C0114 mypackage/mymodule.py

--------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

pylinttest akamat10$ python3.12 -m pylint -d C0114 mypackage.mymodule

--------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

Expected behavior

pylinttest akamat10$ python3.12  -m pylint -d C0114 mypackage/mymodule.py 
************* Module mypackage.mymodule
mypackage/mymodule.py:1:0: E0401: Unable to import 'a' (import-error)

------------------------------------------------------------------
Your code has been rated at 0.00/10 (previous run: 0.00/10, +0.00)

Pylint version

pylint 3.2.7
astroid 3.2.4
Python 3.12.3 (v3.12.3:f6650f9ad7, Apr  9 2024, 08:18:47) [Clang 13.0.0 (clang-1300.0.29.30)]

OS / Environment

Darwin 23.4.0 Darwin Kernel Version 23.4.0: Fri Mar 15 00:19:22 PDT 2024; root:xnu-10063.101.17~1/RELEASE_ARM64_T8112 arm64

Additional dependencies

No response

Pierre-Sassoulas commented 2 months ago

Thank you for your work on this issue (PR included).

An unification of the behavior of sys path discovery would be welcome. I've not been able to deeply immerse myself in the specifics of it so it's hard and frightening to review. The general feeling I have with this is that it's hard to test automatically and that we had a lot of problem last time it was modified. Someone need to be able to understand the problem in depth to fix it, and it look like that could be you 😄

I would start by checking the git blame on the failing tests to understand the context of the PR that introduced them (if they are accessible). If the code predates the migration out of Logilab, from bitbucket then to github, let's assume that it made sense between 2002 to 2014 but it might not make sense anymore.

akamat10 commented 2 months ago

Thank you for the feedback. I will dig in over the next few days. I also going to spend some time doing some research at also something like mypy which also discovers and creates qualified module names.

One thing which has not been clear from pylint documentation is what the qualified module name rules are and how import module discovery works. So it is at least worth documenting to begin with. May be I have missed it on the page. I will document that if it doesn't exist.

akamat10 commented 2 months ago

I think you are right that changing behavior is going to be extremely messy and disruptive and break existing code based on whatever behaviors that were observed. I do see test_functional.py was changed from module input to source file input during refactoring in Sep 10, 2019 here. It is hard to understand the intent of the other files and what assumptions the author had in mind. Also, users may have gotten used to existing behaviors over time.

Here is one proposal. How about we keep existing behavior as is and introduce a new flag like -m and have users pass in a module in form a.b.c? The behavior for qualified module names and search path will be done to match how python does it. There is precedence for such an option in mypy although the handling might slightly different. Similarly, for scripts, we can have users pass in files or directories of files via -s command. This behavior will also be similar to how python handles scripts passed to it. We could push users to use this via documentation and discourage passing files/directories/modules over time. We could also roll this out in phases. Support -m first and then -s later. Thoughts?

Pierre-Sassoulas commented 2 months ago

An option to be able to do that sounds reasonable. But why push user away from file/dir ? It's awfully conveniant because of the operating systeme autocomplete.

akamat10 commented 2 months ago

But why push user away from file/dir ?

Because with the -m option for qualified modules, it feels more more natural(intuitive) to mimic python with -m argument.

So if you call your module with python as:

python -m a.b.c

In pylint, it would be the same as:

pylint -m a.b.c

except that we could say the presence of __main__.py is not necessary for pylint.

With the -s option when used with scripts or directories, the autocomplete of the OS will still be available.

By the way, one more place where the current behavior is confusing is how pylint a\b.py get treated in terms of qualified module name and search path. The answer depends on whether directory a and its parents have the __init__.py. The search path can even end up being the ancestors of the current directory. This is very different from the module name and search path I would get if I run python a\b.py. The search path is a and module name is b in this case.

Another case where it confusing pylint dir1. What should the user expect the search path and qualified module names to be. The answer again is it depends on the locations of the __init__.py file.

DanielNoord commented 2 months ago

I think your changes make sense. Making pylint more in line with what python does is almost always a net plus.

What we probably need is a clear guide (that we could include in the changelog) which clearly describes how somebody could currently invoke pylint and what the new invocation should be. Based on such a guide I think we can also more easily discuss whether it is okay to make such breaking changes or whether we should remain inconsistent with python in favour of usability.

Does that sound like a good step forward to you and Pierre?

akamat10 commented 2 months ago

Thank you for the feedback! I will work on the -m for modules for a single module first and send a PR including doc changes and tests. May be a couple of weeks is what it will take. Once I get feedback, I can iterate on further. Also, as a first step, I will sent a quick PR to update the docs describing the current behavior of module discovery so people are not confused with strange issues with module discovery and qualified module names.

pylint-dev / pylint