python / cpython

The Python programming language
https://www.python.org
Other
63.61k stars 30.47k forks source link

`importlib.abc.Traversable.read_text()` incompatible with `importlib.resources._functional.read_text()` usage (Python 3.13) #127012

Open kurtmckee opened 4 days ago

kurtmckee commented 4 days ago

Bug report

Bug description:

I'm writing a custom importer and discovered that the function signature for importlib.abc.Traversable.read_text() is incompatible with the usage in importlib.resources._functional.read_text(), specifically on Python 3.13.

  1. importlib.abc.Traversable.read_text() is a concrete method; its implementation calls the .open() method, which is an abstract method that must be implemented. The expectation is therefore that implementing .open() in a Traversable subclass is sufficient for .read_text() to work.

    Note below that the .read_text() method is not marked as abstract, and includes only one parameter: encoding:

    https://github.com/python/cpython/blob/30aeb00d367d0cc9e5a7603371636cddea09f1c0/Lib/importlib/resources/abc.py#L84-L90

  2. Application code that attempts to read a package resource, like importlib.resources.read_text(module, "resource.txt") ultimately leads to a call to importlib.resources._functional.read_text(), which attempts to call the .read_text() method of a Traversable subclass, but includes an errors parameter that doesn't exist in Traversable's default concrete method:

    https://github.com/python/cpython/blob/30aeb00d367d0cc9e5a7603371636cddea09f1c0/Lib/importlib/resources/_functional.py#L28-L32

  3. Consequently, it appears to be necessary for all Traversable subclasses to not only re-implement its concrete .read_text() method, but also to override its signature.

I think that the Traversable .read_text() method signature, and the call site in importlib.resources._functional.read_text(), need to align with each other.

I'd like to submit a PR for this! However, I would like confirmation that an errors parameter should be added to the Traversable.read_text() method.

Note that adding an errors parameter was previously discussed in #88368.

Demonstration of TypeError bug

import io
import sys
import typing
import pathlib
import types
import importlib.abc
import importlib.machinery
import importlib.metadata
import importlib.resources.abc

class ExampleFinder(importlib.abc.MetaPathFinder):
    def find_spec(
        self,
        fullname: str,
        path: typing.Sequence[str] | None,
        target: types.ModuleType | None = None,
    ) -> importlib.machinery.ModuleSpec | None:
        if fullname != "demonstrate_error":
            return None

        print(f"ExampleFinder.find_spec('{fullname}')")
        spec = importlib.machinery.ModuleSpec(
            name=fullname,
            loader=ExampleLoader(),
            is_package=True,
        )
        return spec

sys.meta_path.append(ExampleFinder())

class ExampleLoader(importlib.abc.Loader):
    def exec_module(self, module: types.ModuleType) -> None:
        print(f"ExampleLoader.exec_module({module})")
        exec("", module.__dict__)

    def get_resource_reader(self, fullname: str) -> "ExampleTraversableResources":
        print(f"ExampleLoader.get_resource_reader('{fullname}')")
        return ExampleTraversableResources(fullname)

class ExampleTraversableResources(importlib.resources.abc.TraversableResources):
    def __init__(self, fullname: str) -> None:
        self.fullname = fullname

    def files(self) -> "ExampleTraversable":
        print("ExampleTraversableResources.files()")
        return ExampleTraversable(self.fullname)

# ----------------------------------------------------------------------------
# ExampleTraversable implements all five of the Traversable abstract methods.
# Specifically, it is expected that implementing `.open()` will be sufficient,
# but this will not be the case.
#

class ExampleTraversable(importlib.resources.abc.Traversable):
    def __init__(self, path: str):
        self._path = path

    def iterdir(self) -> typing.Iterator["ExampleTraversable"]:
        yield ExampleTraversable("resource.txt")

    def is_dir(self) -> bool:
        return False

    def is_file(self) -> bool:
        return True

    def open(self, mode='r', *args, **kwargs) -> typing.IO[typing.AnyStr]:
        return io.StringIO("Nice! The call to .read_text() succeeded!")

    # Uncomment this `.read_text()` method to make `.read_text()` calls work.
    # It overrides the `Traversable.read_text()` signature.
    #
    # def read_text(self, encoding: str | None, errors: str | None) -> str:
    #     print(f"ExampleTraversable.read_text('{encoding}', '{errors}')")
    #     return str(super().read_text(encoding))

    @property
    def name(self) -> str:
        return pathlib.PurePosixPath(self._path).name

# -------------------------------------------------------------------------------
# Everything above allows us to import this hard-coded module
# and demonstrate a TypeError lurking in the Traversable.read_text() signature.
#

import demonstrate_error

# The next line will raise a TypeError.
# `importlib/resources/_functional.py:read_text()` calls `Traversable.read_text()`
# with an `errors` argument that is not supported by the default concrete method.
print(importlib.resources.read_text(demonstrate_error, "resource.txt"))

CPython versions tested on:

3.13

Operating systems tested on:

Linux

kanishka-coder0809 commented 3 days ago

hey @kurtmckee I can work on this issue ...can you assign me.

kurtmckee commented 2 days ago

@kanishka-coder0809 As I wrote in the text above, I'd like to fix this myself but I'm waiting for confirmation how to proceed, because I think this issue implicates importlib_resources and the standard library.

In any case, I don't have GitHub permissions to assign this to anyone.

williamwen42 commented 15 hours ago

I'm encountering this issue while trying to add 3.13 support to PyTorch.

cc @brettcannon @jaraco @warsaw @FFY00

jaraco commented 40 minutes ago

Thanks Kurt for the detailed report elucidating the issue. I do think I agree that adding the required parameter to the default read_text implementation sounds right.

When reasoning about where to create the fix, I'm fine if it goes here or in importlib_resources. I personally find it easier and faster to develop on importlib_resources first, in part because it will apply to older Python versions (3.9+) and also because the test suite is easier to run (just run tox). But if you choose to develop it here on CPython first, that's fine and I'll backport it to importlib_resources (and thus also get immediate feedback when it's released in short order).

At first, I was thinking that this change would be backward-incompatible, because it's a change to a protocol (on which downstream consumers rely), but since it's a change to a concrete method on the protocol, I'm thinking a backward-compatible change is possible, but we'll want to think carefully about that aspect of the change.

Please feel free to proceed with a proposed fix.