Closed Toprak2 closed 5 hours ago
Thank you for the well explained report!
As you suggest, we could say that np.int32
inputs are not supported and the caller should make sure they're using an int
, float
or str
, as documented, but the suggested fixes appear quite straightforward. I'll reserve the right to revert if it causes problems in the future :)
Let's go for option 1, because option 2 fails this test:
def test_naturalsize(test_args: list[int] | list[int | bool], expected: str) -> None:
> assert humanize.naturalsize(*test_args) == expected
E AssertionError: assert '1000.0 ZB' == '1.0 YB'
E
E - 1.0 YB
E + 1000.0 ZB
Would you like to open a PR?
Thank you for reviewing the issue quickly.
I’ll proceed with implementing option 1 and will create a PR soon. If there are any additional tests you’d like me to add, please let me know.
Yeah, I was considering if we should add a test using NumPy, but np.int32
isn't really a supported input, and installing NumPy can be tricky during the Python pre-release phase when there's no wheel, so maybe not.
What did you do?
I was using the torchio library, which relies on humanize to return the memory size of image arrays. The images I processed had dimensions 512x512x166, with each pixel being a 32-bit (4-byte) integer.
What did you expect to happen?
torchio calls the naturalsize function with the occupied memory size in bytes and sets binary=True. Manually calculating the expected value: 512×512×166×4÷(1024×1024)=166 MiB
So, I expected the function to return approximately 166 MiB.
What actually happened?
Instead, the returned value was -2 MiB, accompanied by this warning:
This overflow occurs because the input to the function was of type np.int32 instead of Python's native int. Since np.int32 has a maximum value of 2^31−1, the multiplication of base and bytes_ results in an overflow.
Steps to Reproduce
Proposed Solutions
1) Change the Order of Operations Adjusting the order of operations can avoid overflow. In the current line:
when bytes is np.int32, multiplying base and bytes produces an np.int32 result, which overflows before it’s divided by unit. By dividing either base or bytes_ by unit before the multiplication, each sub-operation remains a float:
2) Convert Input to Float Alternatively, cast value to float without checking its type. Currently, the casting applies only if value is a string:
Updating it to cast all inputs to float could resolve the issue:
I haven’t created a pull request since I’m unsure if you’d prefer developers to ensure input compatibility or handle this within the function. Environment