Open gvozdila opened 4 months ago
I'm not sure whether it's worth fixing. The pickle.load()
function accepts something that is the semantic output of pickle.dump()
, but ]replace.
does not look like the output of pickle.dump()
.
Agreed: if pickle.dump()
can product that, we should fix it. But unpickling random data will never be safe.
Sometimes CPU and RAM consumption can be CVE. Not for pickle, but for other formats.
Example: plistlib.py- DoS attack via CPU and RAM by malformed Apple Property List files in binary format. CVE-2022-48564
For this case most problem is the kill by OOM-killer.
I found something similar for marshal.load(). But marshal throws
ValueError: bad marshal data (unknown type code)
.
marshal example:
import marshal
f=open("./oom-6c95714cba63108c15c13acac86e844aec88301b","rb")
marshal.load(f)
Sometimes CPU and RAM consumption can be CVE. Not for pickle, but for other formats.
Example: plistlib.py- DoS attack via CPU and RAM by malformed Apple Property List files in binary format. CVE-2022-48564
For this case most problem is the kill by OOM-killer.
I found something similar for marshal.load(). But marshal throws
ValueError: bad marshal data (unknown type code)
.marshal example:
import marshal f=open("./oom-6c95714cba63108c15c13acac86e844aec88301b","rb") marshal.load(f)
The same applies for the marshal
:
Documentation states:
Warning The marshal module is not intended to be secure against erroneous or maliciously constructed data. Never unmarshal data received from an untrusted or unauthenticated source.
This appears to be a problem with the C implementation of pickle, the python implementation does raise an error for all examples (experimentally checked by creating a dummy _pickle.py
in the CWD before invoking python -m pickle file.txt
. This could indicate missing or incorrect handling of partial input in the C implementation (I haven't looked at that implementation though).
Regardless of this loading arbitrary pickle files is unsafe, as mentioned in the documentation.
I think that it is worth fixing. There are many tests for errors detection in broken pickle data (see for example bpo-23914/#68102, bpo-25761/#69947). While by default unpickling can execute arbitrary code and therefore is not safe, with overridden find_class
it can be made more safe. But there are issues not related to execution arbitrary code, and some of them (not this one) are difficult to fix.
Loading a small data which does not even involve arbitrary code execution could consume arbitrary large amount of memory. There were three issues here:
The solution of the first issue can potentially break the user code that uses third-party pickle generators. In normal cases this is not an issue. The stdlib implementations always use sequential numbers without gaps (even after optimization), and the threshold is high enough, so even if the dumb third-party optimizer does not re-pack indices, the error can realistically occur only for very large data (hundreds of millions of items). But it had to be said that this is potentially a breaking change.
The third issue could be solved more generally on the lower level: in FileIO, BufferedReader and RawIOBase. This would solve similar vulnerabilities in other binary file formats and protocols. But this idea was not supported, so we must test each binary file formats and protocols separately for this vulnerability.
But I thought it was acknowledged that this isn't a vulnerability regardless of any other factors, since the pickle module is explicitly a non-security-safe domain.
It's not possible to get a security vulnerability, even by loading data that doesn't contain arbitrary code execution, since that presupposes one is permitted to decide that the security warning in the pickle module doesn't apply if you do unspecified third-party analysis of a pickle, decide that it's safe, and end up being wrong.
It is a non-security-safe domain because unpickling involves arbitrary code execution. But with overridden find_class()
method you can control how names are resolved, and therefore what is permitted to be executed. It is not easy, but it should be enough to make unpickling of limited data security-safe. If there would not be vulnerabilities in basic opcodes.
Bug report
Bug description:
With some files python trying to pickle.load() use all available RAM and get killed by OOM killer.
How-to-reproduce:
where oom-1a498cedae1dc957f2fbb15d18a5c265dc7b9ae2.txt:
Whats-happend: RAM consumption grows till process killed by OOM killer
Whats-you-expect: some kind of error?
I'll include 3 examples of "broken" pickle files. But i cannot find the way to see similarities in content. And I'm not very well versed in the pickle format to find out the root of the problem.
CPython versions tested on:
3.9, 3.12
Operating systems tested on:
Linux
Linked PRs