python / mypy

Optional static typing for Python
https://www.mypy-lang.org/
Other
17.85k stars 2.74k forks source link

`OverflowError`s and `MemoryError`s in helper functions #17454

Closed sam-xif closed 2 days ago

sam-xif commented 2 days ago

Hello mypy team,

I am working as part of a research team developing a code analysis tool for Python. One of the issues the tool discovered in mypy's codebase is that there is potential for uncaught OverflowErrors and MemoryErrors to be thrown in some helper functions which use the multiplication operator (*) to duplicate strings. The cause of these errors is a very large integer being supplied as a factor in these expressions.

In the code as it is now, the helper functions are not called in such a way that these errors would ever get triggered, but the tool we are building analyzes the code from whatever entry points it can find, hence these errors.

If you are interested in learning more about the tool and how it found this issue, let me know down in the comments, or you can contact me at xifaras.s@northeastern.edu. We are primarily curious about whether you find that this issue is legitimate and worth reporting and fixing (in this case, a fix might look like imposing limits on the integer arguments). If not, we would be interested in understanding why.

Thank you for your consideration!

-Sam

Crash Report

In some helper functions in mypy where the multiplication (*) operator is used to duplicate strings, there is the possibility for OverflowErrors and MemoryErrors to be thrown due to very large integers (> sys.maxsize) being supplied as the multiplication factor.

Traceback

Traceback (most recent call last):
  ...
  File "/data1/groups/neu-type-fuzzing/data/repos/mypy/mypy/util.py", line 527, in soft_wrap
    padding = "\n" + " " * num_indent
OverflowError: cannot fit 'int' into an index-sized integer
Traceback (most recent call last):
  ...
  File "/data1/groups/neu-type-fuzzing/data/repos/mypy/mypy/util.py", line 527, in soft_wrap
    padding = "\n" + " " * num_indent
MemoryError
Traceback (most recent call last):
  ...
  File "/data1/groups/neu-type-fuzzing/data/repos/mypy/mypy/strconv.py", line 639, in indent
    s = " " * n + s
OverflowError: cannot fit 'int' into an index-sized integer
Traceback (most recent call last):
  ...
  File "/data1/groups/neu-type-fuzzing/data/repos/mypy/mypy/strconv.py", line 639, in indent
    s = " " * n + s
MemoryError

To Reproduce

Cannot be reproduced from the mypy CLI, but can easily be reproduced at a local level by calling mypy.util.soft_wrap and mypy.strconv.indent with very large integers (> sys.maxsize) for the respective arguments that are factors in the multiplication expressions.

Your Environment

JelleZijlstra commented 2 days ago

Thanks for finding this, but this is not a realistic problem. We're not going to be indenting by 2**64 spaces.

JelleZijlstra commented 2 days ago

Just to expand on this (in case it's helpful for your research project): Your post reads as if you assume it is a problem if a Python function can raise an exception. That's not how Python is generally written; almost every operation can raise an exception, and robust code handles exceptions that can reasonably be expected to appear (say, an HTTPError when making an HTTP request), but not everything that could possibly happen. Every memory allocation could raise MemoryError and every function call could raise RecursionError; nobody is going to catch those.

In this case, the functions you flagged use string multiplication for creating an indentation string. Obviously, that's only expected to indent by a relatively small number of spaces in normal operation. We could add an explicit check if n > some_big_number: raise ValueError to catch a possible bug where someone passes a huge number, but that wouldn't really make for a more useful error than the OverflowError you'll get now.

sam-xif commented 23 hours ago

@JelleZijlstra this is useful, thank you. Would you mind if I quote your comment in the writeup for the project? I can leave you anonymous if you'd prefer.

JelleZijlstra commented 19 hours ago

Yes, feel free to quote me (with or without attribution as you prefer).

One other thought that occurred to me is that for these functions, the worst behavior might actually occur if they don't raise an exception. If you pass say 1 billion to them, they'll potentially allocate a huge amount of memory, which likely has worse effects on your system than if they immediately raise an exception.