python / cpython

The Python programming language
https://www.python.org
Other
62.25k stars 29.91k forks source link

`hashlib.file_digest()` can't handle non-blocking I/O #122179

Open srittau opened 1 month ago

srittau commented 1 month ago

Bug report

Bug description:

This came up in python/typeshed#12414.

The current implementation of file_digest() does not check the return value of fileobj.readinto() for None:

https://github.com/python/cpython/blob/2a5d1eb7073179a13159bce937afdbe240432e7d/Lib/hashlib.py#L232-L236

While buffered file objects can't return None, unbuffered ones can when they are doing non-blocking I/O. Specifically, file_digest() is documented to take SocketIO objects, which can very much return None:

https://github.com/python/cpython/blob/2a5d1eb7073179a13159bce937afdbe240432e7d/Lib/socket.py#L694-L714

CPython versions tested on:

CPython main branch

Operating systems tested on:

Other

Linked PRs

srittau commented 1 month ago

The solution could be as easy as adding if size is None: continue, possibly adding a short (.1s?) delay, but I'm not terribly familiar with non-blocking I/O.

graingert commented 1 month ago

I suspect you want if size is None: raise BlockingIOError

graingert commented 1 month ago

possibly adding a short (.1s?) delay

This wouldn't help, as when using non blocking IO it's likely that no new data will arrive until another task running on this thread gets executed

gpshead commented 1 month ago

FWIW, I do not think it is reasonable to expect that APIs taking a file object as input in order to read or write data from it to be expected to handle non-blocking IO unless they explicitly document themselves as doing so. It isn't the default expectation for what a "file object" is.

hashlib.file_digest() refers to "SocketIO objects from socket.makefile()" which documents itself as "The socket must be in blocking mode". So the problem statement in this issue isn't quite accurate on that detail.

We should add the word "blocking" to the file_digest documentation to make the implied constraint more clear.

The proposed PR as is would make it explicitly raise an error when a file read did return None internally implying it was in non-blocking mode. It's at least a nicer error to read than trying to figure out why something raised TypeError due to None being used when bytes was expected.

srittau commented 1 month ago

hashlib.file_digest() refers to "SocketIO objects from socket.makefile()" which documents itself as "The socket must be in blocking mode". So the problem statement in this issue isn't quite accurate on that detail.

We should add the word "blocking" to the file_digest documentation to make the implied constraint more clear.

Good catch. I'll amend the docs tomorrow, unless someone beats me to it. (I think all Python maintainers can commit to this PR, and should feel free to do so.)

graingert commented 1 month ago

It's not clear to me why socket.makefile() has this defensive coding around non blocking sockets when they're documented as not supported. Checking blame it was added due to https://github.com/python/cpython/issues/54063