python / typeshed

Collection of library stubs for Python, with static types
Other
4.29k stars 1.72k forks source link

`file_digest.file_digest` signature is overly strict #12414

Open ncoghlan opened 1 month ago

ncoghlan commented 1 month ago

Using hashlib.file_digest with a file object obtained via pathlib.Path.open failed to typecheck with a protocol compatibility error:

src/create_bundles.py:223: error: Argument 1 to "file_digest" has incompatible type "FileIO"; expected "_BytesIOLike | _FileDigestFileObj"  [arg-type]
src/create_bundles.py:223: note: Following member(s) of "FileIO" have conflicts:
src/create_bundles.py:223: note:     Expected:
src/create_bundles.py:223: note:         def readinto(self, bytearray, /) -> int
src/create_bundles.py:223: note:     Got:
src/create_bundles.py:223: note:         def readinto(self, Buffer, /) -> int | None

I'm reasonably sure this is a typeshed issue rather than a mypy issue.

My first thought was that this looks like an internal inconsistency in typeshed, as _FileDigestFileObj is declared specifically with bytearray rather than the more general WriteableBuffer (hence filing the issue here).

However, my second thought was that this signature declaration is presumably as it is because file_digest will specifically pass a bytearray instance to readinto, and bytearray is compatible with Buffer, so mypy shouldn't be complaining about that. That means the conflict is presumably on the return type rather than on parameter types.

The declared type signature nominally indicates that file_digest can't cope with readinto returning None, and mypy is correctly flagging that as inconsistent with the way typeshed declares the file and socket IO types (allowing them to return None here).

Searching the typeshed code suggests this particular inconsistency exists across several different IO consumer type declarations: https://github.com/search?q=repo%3Apython%2Ftypeshed%20readinto&type=code

srittau commented 1 month ago

I/O types are a mess, which is why we're moving more into tight protocols for arguments as used in hashlib.

I think your analysis is correct. In fact, file_digest can't handle None return values:

https://github.com/python/cpython/blob/2a5d1eb7073179a13159bce937afdbe240432e7d/Lib/hashlib.py#L232-L236

On the other hand it seems that FileIO.readinto can't return None:

https://github.com/python/cpython/blob/2a5d1eb7073179a13159bce937afdbe240432e7d/Modules/_io/fileio.c#L657-L684

At the moment, FileIO inherits readinto from RawIOBase. I think we just need to override it in FileIO with a more precise return type.

AlexWaygood commented 1 month ago

On the other hand it seems that FileIO.readinto can't return None:

Doesn't it return None in this code path here? https://github.com/python/cpython/blob/2a5d1eb7073179a13159bce937afdbe240432e7d/Modules/_io/fileio.c#L678

srittau commented 1 month ago

I blame my the heat and my tiredness for overlooking this quite obvious code path ... That said, I'd be interested in which situation FileIO works with non-blocking I/O. One case I can see is here:

https://github.com/python/cpython/blob/2c1b1e7a07eba0138b9858c6f2bea3cae9af0808/Python/fileutils.c#L1887

Another is probably if an opened file descriptor has the non-blocking flag set.

I see two possible solutions: Make FileIO.readinto() return int | MaybeNone as the situations where this happens are a bit esoteric. That said, I actually think that this uncovered a real bug in file_digest() as that doesn't handle the None case correctly when it should.

srittau commented 1 month ago

In fact, file_digest is explicitly documented as taking a SocketIO object. And that can very much return None:

https://github.com/python/cpython/blob/2a5d1eb7073179a13159bce937afdbe240432e7d/Lib/socket.py#L713