wiseio / paratext

A library for reading text files over multiple cores.
Apache License 2.0
1.06k stars 103 forks source link

Build fails with Python 3.5.0 & SWIG 3.0.10 & g++ 4.8.1 #36

Closed prehensilecode closed 7 years ago

prehensilecode commented 7 years ago

System is RHEL 6.4 with gcc 4.8.1 and SWIG 3.0.10 and Python 3.5.0 installed.

Firstly, python/setup.py is not Python3 clean: it uses print statements rather than print() functions. After running it through 2to3, the build fails with a few occurences of this error:

g++: ../src/paratext_internal_wrap.cxx
In file included from ../src/paratext_internal_wrap.cxx:3140:0:
../src/python/numpy_helper.hpp: In member function ‘string_array_output_iterator& string_array_output_iterator::operator++()’:
../src/python/numpy_helper.hpp:218:75: error: ‘PyString_FromStringAndSize’ was not declared in this scope
     PyObject *s = PyString_FromStringAndSize(output.c_str(), output.size());

Modifying src/python/numpy_helper.hpp to replace "PyString_FromStringAndSize()" with "PyUnicode_FromStringAndSize()" makes the build succeed.

Perhaps something like this is required to support both Python2 and Python3.

deads commented 7 years ago

Hi,

Thanks for your bug report. It was a bit of a challenge to maintain both Python 2 and 3 support manually. We discovered issues such as this one you found while we built out the automated test suite. Now that our Travis build system is complete and automated Python 2&3 builds, we don't expect this bug to surface again.

https://travis-ci.org/deads/paratext

Python 2.x's PyString_XXX makes no assumptions about the underlying encoding of the data, while PyUnicode_XXX assumes the data is a sequence of valid codepoints. In Python 2.7, the internal string does not need to be a valid sequence of Unicode codepoints (e.g. u"\uDCFF" works in Python 2, but not 3). In Python 3.x, I believe no Unicode string can be constructed unless it is sequence of valid codepoints. In the case where the data is loaded from a file and no assumption can be made about its encoding, one must be a bit more delicate. For Python 3, PyBytes_XXX can be used for this purpose.

Can you repull from master and verify that you are no longer experiencing this issue? Henceforth, all proposed pull requests will have to pass all tests on both Python 2 and Python 3, unless there is a good reason for allowing a test to fail.

Best, Damian