madpah / requirements-parser

A Pip requirements file parser.
https://requirements-parser.readthedocs.io
Apache License 2.0

Dataclass Support + Hashing Support #69

Open kabirkhan opened 2 years ago

kabirkhan commented 2 years ago

Hi, I'm using this project to parse requirements and have a couple of features I'd love to build into the package itself that I think other users would benefit from. Happy to submit a PR if both sound good. Thanks!

1. Dataclass Support

This would be a very small change: adding type annotations for all members. Basically, the type definitions added in https://github.com/madpah/requirements-parser/pull/65 would move out of __init__ and become class-level annotations, and the class would get the dataclass decorator.

import typing as ty
from dataclasses import dataclass

@dataclass
class Requirement:
    line: str
    editable: bool = False
    local_file: bool = False
    specifier: bool = False
    vcs: ty.Optional[str] = None
    name: ty.Optional[str] = None
    subdirectory: ty.Optional[str] = None
    uri: ty.Optional[str] = None
    path: ty.Optional[str] = None
    revision: ty.Optional[str] = None
    hash_name: ty.Optional[str] = None
    hash: ty.Optional[str] = None
    extras: ty.Optional[ty.List[str]] = None
    specs: ty.Optional[ty.List[ty.Tuple[str, str]]] = None

    # An explicitly defined __init__ is kept; @dataclass does not overwrite it.
    def __init__(self, line: str) -> None:
        ...
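
To illustrate what the decorator would provide, here is a minimal, self-contained sketch. `RequirementSketch` is a hypothetical stand-in with only a few of the fields above, not the library's actual class, and it lets the dataclass generate `__init__` instead of parsing a line.

```python
import typing as ty
from dataclasses import asdict, dataclass


@dataclass
class RequirementSketch:
    # Hypothetical stand-in: only a subset of the proposed fields.
    line: str
    name: ty.Optional[str] = None
    editable: bool = False
    extras: ty.Optional[ty.List[str]] = None


a = RequirementSketch(line="requests[security]", name="requests", extras=["security"])
b = RequirementSketch(line="requests[security]", name="requests", extras=["security"])

# The generated __eq__ compares instances field by field.
print(a == b)
# The generated __repr__ lists every field and its value.
print(a)
# asdict() converts the instance to a plain dict, e.g. for serialization.
print(asdict(a))
```

Equality, a readable repr, and `asdict()` come for free once the fields are declared as annotations.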

Benefits

Cons

2. Hashing Support

My use case is building remote virtual environments that users can execute code in. Building venvs can be expensive, so I'd like to add caching support by hashing the environment name plus the set of requirements.
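
As a rough sketch of such a cache key: the helper below, `environment_cache_key`, is a hypothetical name and not part of requirements-parser. It uses `hashlib` rather than the built-in `hash()`, because string hashes from `hash()` are randomized per process by default and so are not suitable for a cache that outlives one run.

```python
import hashlib
import typing as ty


def environment_cache_key(env_name: str, requirement_lines: ty.Iterable[str]) -> str:
    """Return a digest that ignores the order and duplication of requirement lines."""
    # Normalize: strip whitespace, drop blank lines, deduplicate, then sort.
    normalized = sorted({line.strip() for line in requirement_lines if line.strip()})
    payload = "\n".join([env_name, *normalized])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


key_a = environment_cache_key("my-env", ["httpx", "requests"])
key_b = environment_cache_key("my-env", ["requests", "httpx"])
print(key_a == key_b)  # True
```

This works on raw requirement lines, so it only treats two requirements as equal when their text matches exactly.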

Currently, the Requirement type is neither hashable nor sortable. Both can be achieved by adding a __hash__ method that returns the hash of the line member, plus the built-in comparison methods to sort by the Requirement name or line member.

Proposed methods

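import typing as ty
from dataclasses import dataclass

@dataclass
class Requirement:
    ...

    def __hash__(self) -> int:
        # Hash on the raw requirement line.
        return hash(self.line)

    def __getitem__(self, key: str) -> ty.Any:
        # Dict-style access to attributes, e.g. req["name"].
        return getattr(self, key)

    def _sort_key(self) -> str:
        # name is Optional, so fall back to line when it is None.
        return self.name or self.line

    def __lt__(self, other: "Requirement") -> bool:
        return self._sort_key() < other._sort_key()

    def __gt__(self, other: "Requirement") -> bool:
        return self._sort_key() > other._sort_key()

    def __le__(self, other: "Requirement") -> bool:
        return self._sort_key() <= other._sort_key()

    def __ge__(self, other: "Requirement") -> bool:
        return self._sort_key() >= other._sort_key()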
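
One possible way to shorten the comparison boilerplate is `functools.total_ordering`, which derives the remaining comparison methods from `__eq__` and `__lt__`. The sketch below uses a hypothetical stand-in class, `OrderedRequirement`, with only two fields. It bases equality, hashing, and ordering on the same key so the three stay consistent with each other, which is an assumption of this sketch and not something the proposal above requires.

```python
import functools
import typing as ty
from dataclasses import dataclass


@functools.total_ordering
@dataclass(eq=False)  # eq=False: use the hand-written __eq__ and __hash__ below
class OrderedRequirement:
    # Hypothetical stand-in for the library's Requirement class.
    line: str
    name: ty.Optional[str] = None

    def _key(self) -> ty.Tuple[str, str]:
        # Sort by name when present, otherwise by line; line breaks ties.
        return (self.name or self.line, self.line)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, OrderedRequirement):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self) -> int:
        return hash(self._key())

    def __lt__(self, other: object) -> bool:
        if not isinstance(other, OrderedRequirement):
            return NotImplemented
        return self._key() < other._key()


reqs = [
    OrderedRequirement("requests", "requests"),
    OrderedRequirement("httpx", "httpx"),
    OrderedRequirement("httpx", "httpx"),
]
print([r.name for r in sorted(set(reqs))])  # ['httpx', 'requests']
```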

With these methods, I can run a good cache check to see if a list of requirements actually changed.

Usage

requirements.txt

httpx
requests

other_requirements.txt

requests
httpx

The above requirements files are technically the same even though the file contents are seen as different.

test.py

from requirements import parse

with open("./requirements.txt") as f:
    requirements = list(parse(f.read()))

with open("./other_requirements.txt") as f:
    other_requirements = list(parse(f.read()))

# set() and sorted() currently won't work on Requirement objects
sorted_requirements = sorted(set(requirements))
sorted_other_requirements = sorted(set(other_requirements))

assert sorted_requirements == sorted_other_requirements
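
Until something like this is supported, a comparison can be approximated by looking only at the `line` text of each parsed requirement. The helper name `same_requirements` is hypothetical, and the example uses `SimpleNamespace` stand-ins instead of real parsed objects so that it runs without the package installed.

```python
import typing as ty
from types import SimpleNamespace


def same_requirements(left: ty.Iterable[ty.Any], right: ty.Iterable[ty.Any]) -> bool:
    """Compare two iterables of requirements by their `line` text, ignoring order."""
    return {req.line for req in left} == {req.line for req in right}


# Stand-ins for parsed Requirement objects; only the `line` attribute is used.
first = [SimpleNamespace(line="httpx"), SimpleNamespace(line="requests")]
second = [SimpleNamespace(line="requests"), SimpleNamespace(line="httpx")]
print(same_requirements(first, second))  # True
```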
