python / cpython

The Python programming language
https://www.python.org

`datetime.strftime` strings can be terminated by "\x00" literals #124531

Open pganssle opened 2 weeks ago

pganssle commented 2 weeks ago

Bug report

Bug description:

Apparently the strftime parser treats \x00 as "end of string" in the format string, and the remainder of the format is ignored:

>>> from datetime import datetime
>>> datetime(2024, 9, 25).strftime("\x00%Y-%m-%d")
''

I would have expected:

>>> from datetime import datetime
>>> datetime(2024, 9, 25).strftime("\x00%Y-%m-%d")
'\x002024-09-25'

Discovered this when adding some Hypothesis tests for strptime/strftime. I suspect, again, that if you include a null character in your datetime format string you should expect something to act weird as hell about it, but we should probably fix this anyway if it's not too costly.
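A minimal illustration of why the output is empty, assuming the format ends up as a NUL-terminated C string before it reaches the C library's strftime(). The ctypes buffer below holds every byte, but its `.value` attribute reads it the way C code does and stops at the first NUL. This only sketches the C-string behaviour; it is not CPython's actual strftime code path.

```python
import ctypes

fmt = "\x00%Y-%m-%d"

# Copy the encoded format into a C char array (ctypes appends a terminating NUL).
buf = ctypes.create_string_buffer(fmt.encode("ascii"))

# .raw exposes every byte in the buffer, including the leading NUL.
print(buf.raw)    # b'\x00%Y-%m-%d\x00'

# .value reads the buffer as a C string: everything before the first NUL.
print(buf.value)  # b''
```

Anything that consumes the format as a C string sees an empty format here, which matches the empty result above.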

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux


JelleZijlstra commented 2 weeks ago

I guess this makes sense if we create a C string and pass this to the C strftime function. Perhaps we should raise an error in our code if we encounter a null character in the format string.
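Jelle's suggestion, rejecting the input up front, could look roughly like this Python-level wrapper. `strftime_no_nul` is a hypothetical name, and raising `ValueError` is an assumption, chosen because CPython generally uses that exception for other arguments containing embedded NULs. This is a sketch of the suggestion only, not an adopted fix.

```python
from datetime import datetime


def strftime_no_nul(dt: datetime, fmt: str) -> str:
    """Hypothetical wrapper: refuse NUL-containing formats instead of truncating."""
    if "\x00" in fmt:
        raise ValueError("embedded null character in format string")
    return dt.strftime(fmt)


print(strftime_no_nul(datetime(2024, 9, 25), "%Y-%m-%d"))  # 2024-09-25
try:
    strftime_no_nul(datetime(2024, 9, 25), "\x00%Y-%m-%d")
except ValueError as exc:
    print(exc)  # embedded null character in format string
```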

pganssle commented 2 weeks ago

> I guess this makes sense if we create a C string and pass this to the C strftime function. Perhaps we should raise an error in our code if we encounter a null character in the format string.

I think we can just escape them and then un-escape them before returning, no?

We handle this just fine in datetime.fromisoformat:

>>> datetime.now().isoformat(sep="\x00")
'2024-09-25\x0013:22:39.621972'
>>> datetime.fromisoformat(datetime.now().isoformat(sep="\x00"))
datetime.datetime(2024, 9, 25, 13, 22, 52, 309562)
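A Python-level sketch in the spirit of the escape/unescape suggestion. Instead of swapping NULs for a sentinel and restoring them afterwards (which needs a sentinel that can never collide with the user's text), this variant splits the format on "\x00", formats each NUL-free piece, and rejoins the results with "\x00". The helper name is hypothetical. One caveat: a directive broken by a NUL, such as "%\x00Y", is no longer recognised as %Y, and how the resulting lone "%" is handled depends on the platform.

```python
from datetime import datetime


def strftime_nul_safe(dt: datetime, fmt: str) -> str:
    """Hypothetical helper: strftime that preserves embedded NULs.

    No piece between NULs contains a NUL, so the underlying C call sees
    each piece in full; the NULs are re-inserted between the results.
    """
    return "\x00".join(dt.strftime(piece) for piece in fmt.split("\x00"))


dt = datetime(2024, 9, 25)
print(repr(strftime_nul_safe(dt, "\x00%Y-%m-%d")))  # '\x002024-09-25'
```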

Mariatta commented 2 weeks ago

I also think it shouldn't raise an error in this case. My expectation is that strftime shouldn't care about anything that isn't a datetime format specifier, so it should pass \x00 through unchanged instead of treating it as the end of the string.
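The expected semantics, where only format specifiers are interpreted and every other character is copied through, can be shown with a toy formatter that walks the format string itself instead of handing it to C. `toy_strftime` and its directive table are hypothetical and support only %Y, %m, %d, %H, %M, %S and %%. Unsupported directives are copied verbatim. This is a sketch of the behaviour being argued for, not CPython's implementation.

```python
from datetime import datetime

# Hypothetical directive table covering only a small subset of strftime codes.
_DIRECTIVES = {
    "Y": lambda dt: f"{dt.year:04d}",
    "m": lambda dt: f"{dt.month:02d}",
    "d": lambda dt: f"{dt.day:02d}",
    "H": lambda dt: f"{dt.hour:02d}",
    "M": lambda dt: f"{dt.minute:02d}",
    "S": lambda dt: f"{dt.second:02d}",
    "%": lambda dt: "%",
}


def toy_strftime(dt: datetime, fmt: str) -> str:
    out = []
    i = 0
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt) and fmt[i + 1] in _DIRECTIVES:
            # A supported directive: expand it and skip both characters.
            out.append(_DIRECTIVES[fmt[i + 1]](dt))
            i += 2
        else:
            # Anything else, including "\x00" or an unsupported "%X", is copied as-is.
            out.append(fmt[i])
            i += 1
    return "".join(out)


print(repr(toy_strftime(datetime(2024, 9, 25), "\x00%Y-%m-%d")))  # '\x002024-09-25'
```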

serhiy-storchaka commented 1 week ago

There are many other bugs in strftime(), and I think that fixing this issue is the key to fixing #78662 and #52551.

serhiy-storchaka commented 6 days ago

This approach did not work on platforms without wcsftime(), because PyUnicode_EncodeLocale() does not support embedded null characters. So I was forced to write a more complex and generic patch that also fixes other issues. All that remains is to write the tests; I will do that tomorrow.