Open RussTedrake opened 7 years ago
(It's really unfortunate that this libPocoFoundation defines an incompatible pcre_compile2. I wonder if that's a bug for them? An old version maybe? Anyway...)
I agree that this is probably a load order problem. It's been a while, but I once had a similar issue. I think you might be able to work around this by calling dlopen on libpcre.so on the first run of the mexFunction. Of course it's very ugly. I can't test personally, because I don't have Matlab at the moment.
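A minimal sketch of that dlopen idea, in case it helps someone try it. The helper name preload_library_once and the library path are hypothetical, not anything from LCM or Matlab. Whether this changes which pcre_compile2 glib ends up bound to depends on load order, so treat it as something to try rather than a known fix.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: load one specific shared library the first time it is
 * called and keep the handle for the lifetime of the process.
 * Returns 0 if the library is (or already was) loaded, -1 on failure. */
int preload_library_once(const char *path)
{
    static void *handle = NULL;  /* set after the first successful load */

    if (handle != NULL)
        return 0;                /* already loaded on an earlier call */
    if (path == NULL)
        return -1;               /* dlopen(NULL) means "main program"; not what we want */

    /* RTLD_GLOBAL makes the library's symbols available to libraries loaded later. */
    handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen(%s) failed: %s\n", path, dlerror());
        return -1;
    }
    return 0;
}
```

The intended use would be to call preload_library_once("/path/to/libpcre.so") at the top of the mexFunction, before the first LCM call, with the path replaced by wherever the system libpcre actually lives.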
This is just a guess, but you could also try setting the rpath of liblcm.so to point to the right libpcre.so. You could use a tool like patchelf to do this. Again, just a guess.
Not sure about a proper fix.
It's really unfortunate that this libPocoFoundation defines an incompatible pcre_compile2. I wonder if that's a bug for them?
Probably. It looks like Poco embeds a copy of libpcre, and apparently also exports the symbols from it. That's... icky. I very much doubt that's something they should be doing.
I would recommend updating Poco if possible to the latest version and, if the problem still exists, filing a bug telling them not to export libpcre's API. Since Poco is a C++ library, I don't think they mean to be doing that.
That's not a sufficient fix for us, because Poco is distributed by MathWorks with Matlab, not by us. We don't get to increment the libpoco version.
@tprk77 -- if we can make a trick like that work (rpath or manually dlopen to our version), that would be fantastic. We also discussed LD_PRELOAD, etc, but I didn't like the idea of forcing environment variables on people (especially ones that make big changes).
hmm.. that's a bummer. I unfortunately don't have any great suggestions right now (at least none that seem very clean). As a long term goal, it would be great to eventually migrate LCM away from GLib and to C++11. C++11 provides everything that we would need from GLib (threading, regexes, and some data structures) and moving to it would eliminate LCM's one major dependency. That's a pretty large project though, and not one that I personally have an appetite for at the moment. Let me know if it's something that you guys would be interested in pushing forward.
As a separate idea: We switched to using GLib regexes a few years ago (Abe did that in 82d243ec40f59f94b0dbdde046025f99204fddfe). Previously we used POSIX regexes (see any commit older than that one). One option (a bit invasive) would be to add a compile-time flag to specify the regex library to use, resurrect the POSIX regex code (a few dozen lines) and you guys could have LCM use the POSIX regexes.
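A rough sketch of what such a compile-time switch could look like. The flag name LCM_USE_GLIB_REGEX and the function channel_matches are made up for illustration and are not taken from the LCM source; anchoring the pattern with ^ and $ is also an assumption here. POSIX extended regexes are not syntax-compatible with PCRE, so patterns that use PCRE-only features would behave differently in the POSIX branch.

```c
#include <stdio.h>

#ifdef LCM_USE_GLIB_REGEX        /* hypothetical build flag */
#include <glib.h>
#else
#include <regex.h>               /* POSIX regexes, provided by libc */
#endif

/* Hypothetical helper: returns 1 if the whole channel name matches the
 * pattern, 0 if it does not, and -1 if the pattern cannot be compiled. */
int channel_matches(const char *pattern, const char *channel)
{
    char anchored[512];
    int n = snprintf(anchored, sizeof anchored, "^%s$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return -1;               /* pattern too long for this sketch */

#ifdef LCM_USE_GLIB_REGEX
    GError *err = NULL;
    GRegex *re = g_regex_new(anchored, (GRegexCompileFlags)0,
                             (GRegexMatchFlags)0, &err);
    if (re == NULL) {
        g_clear_error(&err);
        return -1;
    }
    int matched = g_regex_match(re, channel, (GRegexMatchFlags)0, NULL) ? 1 : 0;
    g_regex_unref(re);
    return matched;
#else
    regex_t re;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int matched = (regexec(&re, channel, 0, NULL, 0) == 0) ? 1 : 0;
    regfree(&re);
    return matched;
#endif
}
```

Built without the flag, this path never touches glib or libpcre, which is the point of the suggestion: the conflicting pcre_compile2 would simply not be called.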
Thanks @ashuang . Based on a quick read through that commit, it does look like the GLib/POSIX regex code might be pretty well contained and possible to restore (assuming those pieces haven't moved too much between that revision and master). We'll discuss.
fyi - @mwoehlke-kitware , @jamiesnape, @jwnimmer-tri, @david-german-tri .
This is a long-standing issue that just reared its head again in https://github.com/RobotLocomotion/drake/pull/4256 .
I decided that I should cross-post so that others are aware. Unfortunately, the best solution might be to change the regex call in lcm; a simpler and more reasonable work-around would be to at least export the lcm-static targets via pkgconfig/cmake.
This concise summary is thanks to @mattantone:
This looks like a problem with library loading order. Here's a minimal mex sample that breaks lcm in R2014b:
When lcm's subscribe method is invoked, it calls g_regex_new() to parse any regular expressions in the channel name. glib in turn calls pcre_compile2 (in libpcre), which then processes the regular expression string and normally returns with no error code (this is the case in R2014a, for instance, and in all non-matlab usages). But in R2014b, an error is returned by the g_regex_new() call, and lcm cannot proceed.
I compiled lcm in debug mode, attached gdb to a running matlab instance, and ran the above mex function, setting a breakpoint at lcm.c:321, inside of the subscribe method. Doing instruction-level stepping through execution revealed that in R2014a, pcre_compile2 is called correctly in libpcre as expected, while in R2014b, glib actually calls a version of pcre_compile2 that resides in a completely different shared object (!!) - namely libPocoFoundation, which apparently has a method with identical signature. Matlab includes this library in the bin/ directory of R2014b and presumably it's loaded at startup.
So instead of mex -> lcm -> glib -> pcre, we have mex -> lcm -> glib -> PocoFoundation, and apparently the two versions of pcre_compile2 set error codes differently. This seems to imply that lcm c libraries are incompatible with matlab R2014b (and probably newer).
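For anyone who wants to check this without stepping through gdb, here is a small sketch that asks the dynamic loader which shared object a symbol currently resolves to. The helper name symbol_owner is made up for illustration. It relies on the glibc extensions RTLD_DEFAULT and dladdr, and it reports the default global lookup, which should agree with what glib binds to but is not guaranteed to in every loading setup.

```c
#define _GNU_SOURCE              /* for RTLD_DEFAULT and dladdr on glibc */
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: look up `symbol` in the default search order and copy
 * the path of the shared object that provides it into `out`.
 * Returns 0 on success, -1 if the symbol or its owner cannot be determined. */
int symbol_owner(const char *symbol, char *out, size_t out_len)
{
    void *addr = dlsym(RTLD_DEFAULT, symbol);
    if (addr == NULL)
        return -1;               /* symbol not visible in the global scope */

    Dl_info info;
    if (dladdr(addr, &info) == 0 || info.dli_fname == NULL)
        return -1;               /* address does not map to a loaded object */

    int n = snprintf(out, out_len, "%s", info.dli_fname);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

Calling symbol_owner("pcre_compile2", buf, sizeof buf) from inside a mex function and printing buf would show whether the symbol comes from libpcre or from libPocoFoundation in a given Matlab release.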