thierry-martinez / pyml

OCaml bindings for Python
BSD 2-Clause "Simplified" License
187 stars 31 forks source link

Fix wide character conversion bugs #75

Closed jamesjer closed 2 years ago

jamesjer commented 2 years ago

I tried to run the tests on an x86_64 Fedora 35 box. The "run file with filename" test segfaulted, because the attempt to open the test file for reading failed, returning a NULL, and the NULL was then passed into the python library as a FILE *. The attempt to open for reading failed because of 2 bugs. This commit fixes both.

First, the last argument to mbstowcs is not big enough. It has to cover the null terminator. Failure to make it big enough led to garbage bytes in the last position, instead of the null terminator. As the mbstowcs man page says:

In order to avoid the case 2 above, the programmer should make sure n is greater than or equal to mbstowcs(NULL,src,0)+1.

Second, _Py_wfopen wants the mode to be a wide string as well as the filename. I checked old Python versions, and this is the case as far back as Python 3.2, when _Py_wfopen was introduced.

UnixJunkie commented 2 years ago

Do you have an example to trigger the bug that you are claiming you correct?

jamesjer commented 2 years ago

As I noted, the pyml test "run file with filename" segfaults with the current code. This can be replicated by creating a file named test.py with these contents:

print("Hello, world!")

Then build pymltop and do this:

$ ./pymltop
        OCaml version 4.13.1

# Py.initialize ();;
- : unit = ()
# Py.Run.load (Py.Filename "test.py") "test.py";;
Segmentation fault (core dumped)
UnixJunkie commented 2 years ago

In utop, this works fine:

#require "pyml";;
Py.initialize();;
Py.Run.load (Py.Filename "test.py") "test.py";;
jamesjer commented 2 years ago

I can't explain why it works for you in utop, but the code is incorrect. The specification for mbstowcs is here: https://pubs.opengroup.org/onlinepubs/9699919799/functions/mbstowcs.html. It clearly states that an invocation mbstowcs(pwcs, s, n) modifies no more than n bytes of pwcs, which means that if s is n characters long, no null terminator will be written to pwcs. This is the reason for the sentence from the mbstowcs man page that I quoted above. Perhaps in your case, the n+1st byte is serendipitously zero. But it wasn't set to zero by mbstowcs if that function conforms to the standard.

Also, to verify that _Py_wfopen takes two wide string arguments, look at https://github.com/python/cpython/blob/main/Include/internal/pycore_fileutils.h, currently line 105. Download any release tarball from python 3.2 through 3.10, and look at Include/cpython/fileutils.h. The function prototype has not changed.

thierry-martinez commented 2 years ago

Thank you very much for the fix, and all your explanations. I am very sorry for the delay: this was a very busy month! Merged.