package `__path__` is an empty list

lesteve commented 5 months ago

I noticed this when working on moving scikit-learn to Meson. I tried with numpy and get the same result so I am guessing this is not something due to our particular setup?

With editable install (pip install -e . --no-build-isolation)

❯ python -c 'import sklearn; print(sklearn.__path__)'
[]

When I use normal install (pip install . --no-build-isolation), I get something like this:

❯ python -c 'import sklearn; print(sklearn.__path__)'
['/home/lesteve/micromamba/envs/skimage-dev/lib/python3.12/site-packages/sklearn']

This is not a huge issue since we were able to use __file__ for our needs, but I thought I would open an issue anyway.

dnicolodi commented 5 months ago

It is my understanding that the __path__ and __file__ attributes of module objects are implementation details of the import mechanism and should not be relied upon to access package resources. In particular, the __path__ attribute may be an empty list. The only supported way to access package resources is via the importlib.resources module.

Editable installs as implemented by meson-python use a customized module loader to load the module code from the source directory and the build directory, as needed. A consequence of this, is that we need to make the __path__ attribute of package modules empty, to have the custom loader be called for all imports.

rgommers commented 5 months ago

Agreed. For the record, I haven't encountered any issues when switching to importlib.resources in NumPy/SciPy as far as I remember (aside from the API removals in importlib for Python 3.12 that require if/else conditional code).

lesteve commented 5 months ago

Thanks for your answer, it seems like a side-effect of this is that pkgutil.walk_packages does not find subpackages.

This is a mild annoyance since we are using pkgutil.walk_packages to collect some of our tests. We call this kind of tests "common tests", because they check some invariant for all the scikit-learn models (e.g. that if you fit a model on 1d array, you will get an error of a certain kind). See https://github.com/scikit-learn/scikit-learn/pull/28040#issuecomment-1889562642 for more details.

dnicolodi commented 5 months ago

Please refer to the documentation of pkgutil.walk_packages(). In particular:

Note: Only works for a finder which defines an iter_modules() method. This interface is non-standard, so the module also provides implementations for importlib.machinery.FileFinder and zipimport.zipimporter.

The fact that the custom loader exposes an empty __path__ has nothing to do with this other issue. We could implement this method for the import machinery used by editable installs, but as the interface is not standardized there is no guarantee that this will keep working for future Python releases. Thus, if anything, it is very low priority.

Looking at this a tiny bit more in depth, the interface of iter_modules() is not even documented. If we want to implement this we would need to base it on what the other module finders implement, rather than on a specification.

lesteve commented 5 months ago

OK thanks for your pointers, I'll try to look into it a bit more. I have to admit I am not familiar at all with this kind of subtle import machinery.

rgommers commented 5 months ago

I'll note that this usage of pkg_util.walk_packages works: https://github.com/scipy/scipy/blob/8e9d43838b7db8d40fc4567f3aafeed68466b464/scipy/_lib/tests/test_public_api.py#L241. That test (pytest scipy/_lib -k test_all_modules_are_expected) passes for me with an editable install.

It'd be better to avoid __path__ and __file__ completely though, because it's indeed not clear when this will or won't work.

lesteve commented 5 months ago

OK weird, just to make 100% sure, can you confirm that scipy.__path__ is not [] then when using meson editable install?

If that's the case, there may be something that is sklearn specific, I need to have a closer look.

rgommers commented 5 months ago

Eh oh, good catch. The test is silently passing, but it is an empty list and hence the test does not work as intended in editable mode:

scipy.__path__:  []
scipy.__file__:  /Users/rgommers/code/bldscipy/scipy/__init__.py
scipy.__name__:  scipy

lesteve commented 5 months ago

Eh oh, good catch. The test is silently passing, but it is an empty list and hence the test does not work as intended in editable mode

OK so at least this is not scikit-learn specific :relieved:

ogrisel commented 5 months ago

Thanks for the feedback @dnicolodi. After a bit of trial and error, here is the only way I could get to write an alternative to pkgutil.walk_packages that never relies on the __path__ attribute of inspected packages.

from pathlib import Path
from importlib.util import find_spec
from pkgutil import iter_modules

# Alternative to sklearn.__path__ that does not rely on sklearn.__file__:

def get_package_path(package):
    spec = find_spec(package)
    if spec is None:
        raise ImportError(f"No module named {package!r}")
    return [Path(spec.origin).parent]

print(f"{get_package_path('sklearn') = }")

# Alternative to pkgutil.walk_packages that does not rely on pkg.__path__:

def walk_packages(path=None, prefix='', onerror=None):
    """Yields ModuleInfo for all modules recursively
    on path, or, if path is None, all accessible modules.

    'path' should be either None or a list of paths to look for
    modules in.

    'prefix' is a string to output on the front of every module name
    on output.

    Note that this function must import all *packages* (NOT all
    modules!) on the given path, in order to access the __path__
    attribute to find submodules.

    'onerror' is a function which gets called with one argument (the
    name of the package which was being imported) if any exception
    occurs while trying to import a package.  If no onerror function is
    supplied, ImportErrors are caught and ignored, while all other
    exceptions are propagated, terminating the search.

    Examples:

    # list all modules python can access
    walk_packages()

    # list all submodules of ctypes
    walk_packages(ctypes.__path__, ctypes.__name__+'.')
    """

    def seen(p, m={}):
        if p in m:
            return True
        m[p] = True

    for info in iter_modules(path, prefix):
        yield info

        if info.ispkg:
            try:
                __import__(info.name)
            except ImportError:
                if onerror is not None:
                    onerror(info.name)
            except Exception:
                if onerror is not None:
                    onerror(info.name)
                else:
                    raise
            else:
                path = get_package_path(info.name)

                # don't traverse path items we've seen before
                path = [p for p in path if not seen(p)]

                yield from walk_packages(path, info.name+'.', onerror)

for info in walk_packages(get_package_path('sklearn'), 'sklearn.'):
    print(info.name)

The walk_packages function is just a copy of the original pkgutil.walk_packages in the standard library (https://github.com/python/cpython/blob/d457345bbc6414db0443819290b04a9a4333313d/Lib/pkgutil.py#L39). The only line that I changed is the definition of the local path variable that relies on my get_package_path helper to use the loader retrieved by find_spec instead of using the value found in __path__.

It seems to fix the problem for scikit-learn when installed in meson-python editable mode, but it's a shame not to be able to use the standard pkg.walk_packages when using the editable mode.

Shall this be considered a bug in pkgutil.walk_packages? I am not familiar enough with the package import machinery to tell whether or not its use of the __path__ attribute is unsafe.

ogrisel commented 5 months ago

By reading more about import machinery in CPython and the source code of mesonpy, I think this is just a missing feature in mesonpy's custom module finder for editable install and the fix is quite straightforward, see #562.

dnicolodi commented 5 months ago

The implementation of get_package_path() above is a complex way to get to os.path.dirname(__file__). The walk_packages() implementation is still based on pkgutil.iter_modules() that does not work for module finders that implement loading of modules from anything that is not a filesystem structure that mirrors the logical organization of the modules. The editable install module finder does just that. I haven't had time yet to investigate closely if there is a way to make pkgutil.iter_modules() work for custom module finders. If there is, it is not documented and it is not part of the interfaces that a standard module finder is supposed to be implemented. This is hinted from the documentation I quoted in a previous comment.

mesonbuild / meson-python

package `path` is an empty list #557

mesonbuild / meson-python

package `__path__` is an empty list #557

package `path` is an empty list #557