python / cpython

`ast.literal_eval` Segmentation Fault in Python 3.9/3.10 #126711

Open EgodPrime opened 3 days ago

EgodPrime commented 3 days ago

Crash report

What happened?

import ast

l = 200000
a = 'P/a' * l        # 'P/aP/a...': one very long chain of divisions
ast.literal_eval(a)  # segfaults on 3.9/3.10

The above code makes Python 3.9/3.10 crash with a segmentation fault.

Python 3.11+ detects this input correctly and raises RecursionError: maximum recursion depth exceeded during ast construction, but 3.9.20 and 3.10.15 still crash.

A smaller l, such as 200, produces the expected ValueError: malformed node or string on all Python 3.9+ versions.
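The non-crashing case can be checked with a minimal sketch like the one below. The helper name `literal_eval_outcome` is hypothetical, and the input is deliberately kept at l = 200; the 200000 case is left out because, as reported above, it can kill the interpreter on 3.9/3.10 before any `except` clause runs.

```python
import ast


def literal_eval_outcome(source):
    """Return the evaluated literal, or the name of the exception raised.

    Hypothetical helper for illustration only. It can report Python-level
    exceptions; it cannot report a segmentation fault.
    """
    try:
        return ast.literal_eval(source)
    except (ValueError, SyntaxError, RecursionError, MemoryError) as exc:
        return type(exc).__name__


if __name__ == "__main__":
    # Small input from the report: a chain of 200 divisions on names.
    print(literal_eval_outcome("P/a" * 200))
    # A well-formed literal evaluates normally.
    print(literal_eval_outcome("[1, 2, 3]"))
```

On 3.11+, the same helper should report the RecursionError described above for the large input, but that is not something to try on 3.9/3.10 in a process you care about.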

CPython versions tested on:

3.9, 3.10

Operating systems tested on:

Linux

Output from running 'python -VV' on the command line:

Python 3.9.20 (main, Oct 3 2024, 07:27:41) [GCC 11.2.0], Python 3.10.15 (main, Oct 3 2024, 07:27:34) [GCC 11.2.0]

skirpichev commented 3 days ago

@EgodPrime, please avoid creating duplicate issues.

FWIW, the 3.9 docs say: "It is possible to crash the Python interpreter with a sufficiently large/complex string due to stack depth limitations in Python’s AST compiler."

3.10+ docs are more vague: "It is possible to crash the Python interpreter due to stack depth limitations in Python’s AST compiler."

So, I'm not sure if we should count this as a bug.

EgodPrime commented 3 days ago

@skirpichev, thanks for your reply. But I am wondering whether this means that Python 3.9/3.10, which are still in use in production, are not safe for such inputs.

skirpichev commented 3 days ago

The documentation (since 3.10) explicitly says that ast.literal_eval() is not safe for arbitrary input: "This function had been documented as “safe” in the past without defining what that meant. That was misleading. This is specifically designed not to execute Python code, unlike the more general eval(). There is no namespace, no name lookups, or ability to call out. But it is not free from attack: A relatively small input can lead to memory exhaustion or to C stack exhaustion, crashing the process. There is also the possibility for excessive CPU consumption denial of service on some inputs. Calling it on untrusted data is thus not recommended."
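For callers who still need to parse literals from semi-trusted sources, one possible way to reduce exposure is to reject oversized input before it reaches the parser. The sketch below is an assumption, not something the documentation recommends: the names `guarded_literal_eval` and `MAX_LITERAL_CHARS` are hypothetical, and the limit is arbitrary. A length cap does not make ast.literal_eval() safe, since the quoted text warns that even a relatively small input can exhaust memory or the C stack.

```python
import ast

# Hypothetical limit chosen for illustration; tune it for your own data.
MAX_LITERAL_CHARS = 10_000


def guarded_literal_eval(source, max_chars=MAX_LITERAL_CHARS):
    """Evaluate a literal only if the input is below a size cap.

    This narrows the attack surface but is not a safety guarantee:
    short, deeply nested input may still cause problems.
    """
    if len(source) > max_chars:
        raise ValueError(
            f"literal too long: {len(source)} > {max_chars} characters"
        )
    return ast.literal_eval(source)


if __name__ == "__main__":
    print(guarded_literal_eval("{'a': 1, 'b': [2, 3]}"))
    try:
        guarded_literal_eval("P/a" * 200000)  # rejected before parsing
    except ValueError as exc:
        print("rejected:", exc)
```

For input that is fully untrusted, a format with a dedicated parser (for example JSON via the standard json module) is likely a better fit than any wrapper around ast.literal_eval().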

EgodPrime commented 3 days ago

I see. I now think this is not a bug but just a PoC that triggers a crash (C stack exhaustion) in ast.literal_eval, which is officially documented as unsafe.

Eclips4 commented 3 days ago

Hello! Thank you for the report. Since ast.literal_eval has been documented as a function that can potentially lead to interpreter crashes, I don't think it's worthwhile to fix it in 3.9. However, it seems reasonable to me to backport #95919 to 3.9. cc @gpshead

EDIT: I saw Christian's comment: https://github.com/python/cpython/issues/95588#issuecomment-1203732744. However, I can't say that I agree with it. This function was probably never safe and the documentation was just wrong.

skirpichev commented 3 days ago

I'll prepare a patch.

skirpichev commented 3 days ago

Well, 3.9 is for security-fixes only. But here is a docs backport: https://github.com/python/cpython/pull/126729

Eclips4 commented 3 days ago

I would like to consider this issue a security problem, because a crash of the interpreter can lead to DoS.

ZeroIntensity commented 3 days ago

My main concern here is that we can't undo documenting ast.literal_eval as safe. Plenty of people are using it in seemingly unsafe places as a result, and I'm not sure it's a good idea to just stick a big bandaid on it for 3.9. If there's absolutely nothing we can do to fix it, then perhaps we could emit a warning when using it on 3.9?

gpshead commented 2 days ago

https://github.com/python/cpython/pull/126729 already backports the doc change to 3.9. So yes, we can. We can never prevent people from writing code that doesn't behave as they desire; all we can do is make expectations clearer.

We cannot meaningfully add a warning on any branch, because there are plenty of valid uses of ast.literal_eval and it is impossible to tell which uses are valid and which are not.