python / cpython

The Python programming language
https://www.python.org
Other
63.4k stars 30.36k forks source link

invaild assertion of _io__WindowConsoleIO_write_impl #124008

Open byundojin opened 1 month ago

byundojin commented 1 month ago

Feature or enhancement

Proposal:

If function _io__WindowsConsoleIO_write_impl does not produce as expected output from WriteConsoleW, it is the process of recalculating the existing len.

At this point, function WideCharToMultiByte finds the byte length of utf-8 and function MultiByteToWideChar finds the letter length of utf-16. And is this comparison valid for comparing the results of these two?

Has this already been discussed elsewhere?

I have already discussed this feature proposal on Discourse

Links to previous discussion of this feature:

https://discuss.python.org/t/is-the-size-comparison-of-utf-8-and-utf-16-valid/63614

Linked PRs

serhiy-storchaka commented 1 month ago

This issue is more complex than it looked.

This assertion is not simply redundant, it is wrong. It is not normally triggered because this code is not normally executed. Actually, I do not know how to trigger this situation. But if comment out && n < wlen, the assertion will fail when write non-ASCII data.

Using WideCharToMultiByte() to calculate the offset in bytes from the number of wchars that was written is not correct either. Invalid UTF-8 byte produces \ufffd which will be converted back to 3-bytes UTF-8 sequence \xef\xbf\xbd. So you can get wrong len, even larger than the initial length.

There are other bugs here, for example, no MemoryError is raised when PyMem_Malloc() or other memory allocation functions fail.

124059 tries to fix this, but the problem is that it is difficult to test, because normally all characters are written. I used a trick -- in the debug build the binary search is triggered even when all characters are written, but this is not a good test. @eryksun, do you know how to make WriteConsoleW() writing less characters than provided?

serhiy-storchaka commented 1 month ago

Note also that you should run the new test without using libregrtest, because it is guarded by non-standard resource "console":

python.bat -m test.test_winconsoleio -v