pypa / manylinux

Python wheels that work on any linux (almost)
MIT License
1.45k stars 218 forks source link

Default link options causing seg-faults #121

Open kif opened 7 years ago

kif commented 7 years ago

I found a bug in pyFAI when compiled on manylinux1. All library built on "manylinux1" was segfaulting on all linux distribution but "manylinux1". The reference to the bug is: https://github.com/silx-kit/pyFAI/issues/649

The core of the bug is that on manylinux1, the linker (gcc) does not try to resolve internally the names:

Any opinion on this ?

njsmith commented 7 years ago

There's some discussion here: https://groups.google.com/forum/#!topic/cython-users/lgNG9ws-9-4

Unfortunately it's true, C symbol resolution is very counterintuitive...

I'm hesitant to suggest -Bsymbolic everywhere because it's a hack that breaks the normal rules for how Linux symbol resolution work (in particular it breaks LD_PRELOAD), and it only stops the code being built from getting symbols from elsewhere; it doesn't stop the code from breaking other libraries by interposing its symbols onto them. Hidden visibility is a cleaner solution. OTOH, sometimes the best solution is a hack.

Arguably the right solution is to force -fvisibility=hidden when building Python extensions, but unfortunately we can't just do that globally in the manylinux compilers, because they're also used to build shared libraries that are used by Python extensions and it would break them. I think setuptools could do this though b/c it knows whether it's building a Python extension or not? Of course this wouldn't 100% solve the problem because you could potentially have a shared library that your extension uses and that conflicts with one of the symbols that Python exports. But that's unlikely because common shared libraries have already worked out naming schemes to avoid popular libraries like libz.

Or possibly Python should stop linking directly to libz and libexpat, why do they even do that.

kif commented 7 years ago

Thanks Nathaniel for your input. I have learned a lot !

On Thu, 31 Aug 2017 18:17:00 +0000 (UTC) "Nathaniel J. Smith" notifications@github.com wrote:

There's some discussion here: https://groups.google.com/forum/#!topic/cython-users/lgNG9ws-9-4

Unfortunately it's true, C symbol resolution is very counterintuitive...

I'm hesitant to suggest -Bsymbolic everywhere because it's a hack that breaks the normal rules for how Linux symbol resolution work (in particular it breaks LD_PRELOAD), and it only stops the code being built from getting symbols from elsewhere; it doesn't stop the code from breaking other libraries by interposing its symbols onto them. Hidden visibility is a cleaner solution. OTOH, sometimes the best solution is a hack.

We were lucky to spot that bug on a small, self contained example. On larger project we are working on, we would not have investigated that deep. Numpy is explicitly hiding every c_function not be exposed on the python side, but one may forget to flag it as not to be exposed.

Arguably the right solution is to force -fvisibility=hidden when building Python extensions, but unfortunately we can't just do that globally in the manylinux compilers, because they're also used to build shared libraries that are used by Python extensions and it would break them. I think setuptools could do this though b/c it knows whether it's building a Python extension or not? Of course this wouldn't 100% solve the problem because you could potentially have a shared library that your extension uses and that conflicts with one of the symbols that Python exports. But that's unlikely because common shared libraries have already worked out naming schemes to avoid popular libraries like libz.

I am considering a solution like this which exposes only the python side of the library:

if self.compiler.compiler_type == 'unix':
        if sys.version_info[0] <= 2:
            ext.extra_compile_args.append('''-fvisibility=hidden -D'PyMODINIT_FUNC=__attribute__((visibility("default"))) void ' ''')
        else:  # Python3
            ext.extra_compile_args.append('''-fvisibility=hidden -D'PyMODINIT_FUNC=__attribute__((visibility("default"))) PyObject* ' ''')

apparently it works ... but their may be other side effects.

Or possibly Python should stop linking directly to libz and libexpat, why do they even do that.

Libz has little to do in this story. It could have been any third party library.

Cheers, -- Jérôme Kieffer tel +33 476 882 445

njsmith commented 7 years ago

-D'PyMODINIT_FUNC=__attribute__((visibility("default"))) void ' ''')

Are you sure this is necessary? I think PyMODINIT_FUNC should already set symbol visibility correctly by default. (It uses __declspec(dllexport), but gcc recognizes that as an alias for __attribute__((visibility("default"))).)

Libz has little to do in this story. It could have been any third party library.

Not true! Only libraries that the Python interpreter is explicitly linked to (i.e., the ones that show up if you do ldd python) get forced onto extension modules. Otherwise, extension modules and the libraries they use are loaded into separate namespaces so they can't interfere with each other. So on my system it's literally only libz and libexpat whose symbols can cause collisions.

(Well, life gets more complicated if you're embedding Python into another program -- then all that program's symbols can also cause problems. But let's worry about that another time...)

kif commented 7 years ago

On Fri, 01 Sep 2017 10:01:45 +0000 (UTC) "Nathaniel J. Smith" notifications@github.com wrote:

-D'PyMODINIT_FUNC=__attribute__((visibility("default"))) void ' ''')

Are you sure this is necessary? I think PyMODINIT_FUNC should already set symbol visibility correctly by default. (It uses __declspec(dllexport), but gcc recognizes that as an alias for __attribute__((visibility("default"))).)

We did something like this: https://github.com/silx-kit/pyFAI/pull/663/commits/a03f2bafe9ca963c4fbe7faadb7c38cb892e92c8#diff-2eeaed663bd0d25b7e608891384b7298 so that only the Python function get's exposed.

I checked that no more symbols get exposed (beside the one for Python), manylinx wheels work outside their environment, C-import between Cython modules apparently still works (using a cdef-class from another extension still works), It looks OK from our perspective.

Libz has little to do in this story. It could have been any third party library.

Not true! Only libraries that the Python interpreter is explicitly linked to (i.e., the ones that show up if you do ldd python) get forced onto extension modules. Otherwise, extension modules and the libraries they use are loaded into separate namespaces so they can't interfere with each other. So on my system it's literally only libz and libexpat whose symbols can cause collisions.

I did not know about this. It explains why it did not crashing more often.

(Well, life gets more complicated if you're embedding Python into another program -- then all that program's symbols can also cause problems. But let's worry about that another time...)

I do agree... but a colleague of mine is working on embedding Python in Octave :/

Thanks for your input.

njsmith commented 7 years ago

My question was: if you leave out the -DPyMODINIT_FUNC=... flag, and just use -fvisibility=hidden alone, then does that work?

kif commented 7 years ago

On Sat, 02 Sep 2017 01:02:22 +0000 (UTC) "Nathaniel J. Smith" notifications@github.com wrote:

My question was: if you leave out the -DPyMODINIT_FUNC=... flag, and just use -fvisibility=hidden alone, then does that work?

Nop, no symbols are exposed then (still gcc/unix) and of course no module could be loaded at the python level.

As the behaviour of Windows is the opposite (everything is hidden except what is explicitly exposed), a preprocessor macro PyMODINIT_FUNC already exists and is used to exposed under windows. It does nothing under unix by default. This patch just mimics the behaviour of windows on linux.

Cheers,

Jerome