pypa / pip

The Python package installer
https://pip.pypa.io/
MIT License
9.36k stars 2.98k forks source link

Introduce an immutable version of `packaging.requirements.Requirement` #12806

Open ichard26 opened 3 days ago

ichard26 commented 3 days ago

What's the problem this feature will solve?

Unfortunately packaging.Requirements.Requirement is mutable.

>>> from packaging.requirements import Requirement
>>> r = Requirement("black >= 24.1.0")
>>> r.name = "six"
>>> r
<Requirement('six>=24.1.0')>
>>> 

While this is "neat", it is somewhat surprising behaviour and can easily lead to hard to understand/debug logic.[^1] In addition, we are introducing more and more caches across the codebase which contain requirement objects. Currently, these caches are liable to break if a cached requirement object is mutated after the fact (get_requirement("black") could end returning a Requirement("six") object!).

[^1]: I really hope the codebase doesn't depend on mutating requirements to function...

Describe the solution you'd like

To avoid issues with caches and generally pay down our technical debt, introduce an immutable version of packaging.requirements.Requirement.

In general, there are two approaches possible:

The latter has the benefit that is that plain ol' packaging.requirements.Requirement objects are still compatible with code that uses the immutable requirement class internally. Although, this could be considered a downside as then mutable packaging.requirements.Requirement objects could continue to linger in pip's logic.

Essentially, from a type-checking perspective requirements would be always immutable, while code would have to opt into using ImmutableRequirement to be protected at runtime (e.g. the caching get_requirement() utility function).

I included a rough draft of the second approach below.

@runtime_checkable
class Requirement(Protocol):
    """Immutable packaging Requirement interface."""

    @property
    def name(self) -> str: ...
    @property
    def url(self) -> Union[str, None]: ...
    @property
    def extras(self) -> FrozenSet[str]: ...
    @property
    def specifier(self) -> SpecifierSet: ...
    @property
    def marker(self) -> Union[Marker, None]: ...

    def __str__(self) -> str: ...
    def __repr__(self) -> str: ...
    def __hash__(self) -> int: ...
    def __eq__(self, other: Any) -> bool: ...

@dataclass(frozen=True)
class ImmutableRequirement(Requirement):
    """Immutable version of packaging.requirements.Requirement."""

    __slots__ = ("name", "url", "extras", "specifier", "marker", "_parsed_req")

    name: str
    url: Union[str, None]
    extras: FrozenSet[str]
    specifier: SpecifierSet
    marker: Union[Marker, None]

    def __init__(self, requirement: str) -> None:
        self._parsed_req = requirements.Requirement(requirement)
        object.__setattr__(self, "name", self._parsed_req.name)
        object.__setattr__(self, "url", self._parsed_req.url)
        object.__setattr__(self, "extras", frozenset(self._parsed_req.extras))
        object.__setattr__(self, "specifier", self._parsed_req.specifier)
        object.__setattr__(self, "marker", self._parsed_req.marker)

    def __str__(self) -> str:
        return str(self._parsed_req)

    def __repr__(self) -> str:
        return repr(self._parsed_req)

    def __hash__(self) -> int:
        return hash(self._parsed_req)

    def __eq__(self, other: Any) -> bool:
        if isinstance(other, ImmutableRequirement):
            other = other._parsed_req

        return self._parsed_req.__eq__(other)

The amount of work would be likely similar for either approaches, so the deciding factor would be whether packaging.requirements.Requirement objects are created/used outside of pip's direct control or not.

I prefer the first approach as it's simpler and stricter, but I'm not fully aware of how packaging's Requirement object is used across the codebase. Let me know what you think!

Alternative Solutions

Alternatively, a smaller patch would be only enforcing immutability only at runtime. This has the benefit of that if we're fine with lying to mypy and (ab)using __instancecheck__, technically we could just add the immutable class and update get_requirement() to return it, while leaving all other references to packaging.requirements.Requirement alone.

This would be likely a major hack (all while introducing more technical debt) so I don't like this idea, but I thought to propose it just in case.

Additional context

See also https://github.com/pypa/pip/pull/12663 where some early discussion on this problem occured.

Code of Conduct

pradyunsg commented 1 day ago

I would like to make packaging.requirements.Requirement immutable honestly, but this seems like a worthwhile alternative to doing that!