python / cpython

The Python programming language
https://www.python.org

Pickle deserialization null byte discrepancy #126996

Open Legoclones opened 3 days ago

Legoclones commented 3 days ago

Bug report

Bug description:

In C, a null byte marks the end of a string stored in a char[]. For the INT and LONG pickle opcodes, everything up to a newline is read from the byte stream and run through a string-to-integer conversion function. However, a byte stream like b'L1\x00anything\n.' or b'I1\x00anything\n.' does not fail in _pickle.c (as it does in pickle.py and pickletools.py) because of the null byte.
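A minimal sketch for reproducing the comparison, assuming a CPython build where pickle.loads is backed by the C accelerator. The helper names (outcome, disassemble, compare) are made up for this example, and pickle._loads is the private pure-Python entry point in pickle.py, so it may change between versions. The result on the C side depends on the interpreter version, so no expected output is given for it.

```python
import pickle
import pickletools

# Hand-crafted payloads from the report: INT ('I') and LONG ('L') opcodes
# with a null byte inside the decimal line, followed by STOP ('.').
PAYLOADS = [b"I1\x00anything\n.", b"L1\x00anything\n."]


def outcome(func, data):
    """Return ('ok', value) or ('error', exception class name)."""
    try:
        return ("ok", func(data))
    except Exception as exc:  # broad on purpose: only the behaviour is compared
        return ("error", type(exc).__name__)


def disassemble(data):
    # Consume the generator so that parsing errors surface here.
    return [(op.name, arg) for op, arg, _pos in pickletools.genops(data)]


def compare(data):
    return {
        # pickle.loads uses _pickle.c when the C accelerator is available.
        "C (_pickle)": outcome(pickle.loads, data),
        # Private pure-Python implementation from pickle.py (assumption:
        # still exposed under this name in the running version).
        "pure Python": outcome(pickle._loads, data),
        "pickletools": outcome(disassemble, data),
    }


if __name__ == "__main__":
    for payload in PAYLOADS:
        print(payload, compare(payload))
```

On an affected build, the "C (_pickle)" entry is expected to report a successful load while the other two report an error.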

On line 5208 (for INT) and line 5362 (for LONG), _Unpickler_Readline(state, self, &s) reads everything (including the null byte) into the variable s, which is a char *. However, strtol or PyLong_FromString stop when the first null byte is encountered, so the null byte and everything after it are ignored and the conversion returns 1 (in the example above).
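The truncation can be illustrated from Python with ctypes. This is only a model of what a NUL-terminated reader sees, not the code path in _pickle.c: the buffer holds every byte that was read, but reading it as a C string stops at the first null byte.

```python
import ctypes

# The line as it would be read for the INT payload (opcode byte removed).
line = b"1\x00anything\n"

# create_string_buffer copies the bytes into a C char array and appends
# a trailing NUL terminator.
buf = ctypes.create_string_buffer(line)

full = buf.raw[:len(line)]  # every byte that was read, null byte included
as_c_string = buf.value     # what a NUL-terminated reader sees

print(full)
print(as_c_string)
print(int(as_c_string))
```

Here as_c_string contains only the bytes before the null byte, which is why the conversion yields 1.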

It's a small inconsistency as an edge case, but I'm not sure how to fix it, or whether having it stopped at a null byte is desired behavior or not.

CPython versions tested on:

3.11

Operating systems tested on:

Linux

vstinner commented 3 days ago

> However, strtol or PyLong_FromString stop when the first null byte is encountered

How do you produce pickle files which contain null bytes? Using pickle.dumps()?

Legoclones commented 2 days ago

pickle.dumps() and the other built-in functions wouldn't produce this, because they use repr() on the number, so it would have to be a custom pickle made by hand.

vstinner commented 2 days ago

> It's a small inconsistency as an edge case, but I'm not sure how to fix it, or whether having it stopped at a null byte is desired behavior or not.

I don't think that it's the desired behavior, and I don't see any easy fix. Unless someone has a fix, I suggest closing the issue, since it's more of a theoretical issue.

serhiy-storchaka commented 2 days ago

It is a known issue that PyLong_FromString() truncates its input at an embedded null byte. We can use the private function _PyLong_FromBytes() or explicitly check strlen(). This is a relatively easy issue.
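The strlen() idea can be modelled in Python as a rough sketch: compare the C-string length of the line with the number of bytes actually read, and reject the line when they differ. The function names c_strlen and parse_int_line are hypothetical, and this is not the actual patch to _pickle.c.

```python
def c_strlen(buf: bytes) -> int:
    """Model C strlen(): the number of bytes before the first NUL."""
    nul = buf.find(b"\x00")
    return len(buf) if nul == -1 else nul


def parse_int_line(line: bytes) -> int:
    """Parse a newline-terminated integer line, rejecting embedded NULs.

    `line` stands in for the bytes returned by a readline-style call,
    newline included (an assumption of this sketch).
    """
    if not line.endswith(b"\n"):
        raise ValueError("line is not newline-terminated")
    # If the C-string length is shorter than the number of bytes read,
    # there is a null byte somewhere inside the line.
    if c_strlen(line) != len(line):
        raise ValueError("embedded null byte in integer line")
    return int(line[:-1], 0)


if __name__ == "__main__":
    print(parse_int_line(b"1\n"))
    try:
        parse_int_line(b"1\x00anything\n")
    except ValueError as exc:
        print("rejected:", exc)
```

With this check the line b"1\x00anything\n" is rejected instead of being parsed as 1, which matches the behaviour reported for pickle.py and pickletools.py.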

vstinner commented 1 day ago

Does someone want to propose a PR using strlen()?