lcm-proj / lcm

Lightweight Communications and Marshalling
GNU Lesser General Public License v2.1

(dynamically-linked) lcm cannot be used in mex files on linux for matlab > 2014a #147

Open RussTedrake opened 7 years ago

RussTedrake commented 7 years ago

This is a long-standing issue that just reared its head again in https://github.com/RobotLocomotion/drake/pull/4256 .
I decided that I should cross-post so that others are aware. Unfortunately, the best solution might be to change the regex call in lcm; a simpler and more reasonable work-around would be to at least export the lcm-static targets via pkgconfig/cmake.

This concise summary is thanks to @mattantone:

This looks like a problem with library loading order. Here's a minimal mex sample that breaks lcm in R2014b:

#include <mex.h>
#include <iostream>
#include <lcm/lcm-cpp.hpp>
#include <lcmtypes/drc/robot_state_t.hpp>

struct Listener {
    void onMessage(const lcm::ReceiveBuffer* rbuf,
              const std::string& chan,
              const drc::robot_state_t* msg) {
    }
};

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) {
    lcm::LCM theLcm;
    Listener listener;
    theLcm.subscribe("ROBOT_STATE", &Listener::onMessage, &listener);
    theLcm.handle();
    std::cout << "HANDLED MESSAGE" << std::endl;
}

When lcm's subscribe method is invoked, it calls g_regex_new() to parse any regular expressions in the channel name. glib in turn calls pcre_compile2 (in libpcre), which then processes the regular expression string and normally returns with no error code (this is the case in R2014a, for instance, and in all non-matlab usages). But in R2014b, an error is returned by the g_regex_new() call, and lcm cannot proceed.

I compiled lcm in debug mode, attached gdb to a running matlab instance, and ran the above mex function, setting a breakpoint at lcm.c:321, inside of the subscribe method. Doing instruction-level stepping through execution revealed that in R2014a, pcre_compile2 is called correctly in libpcre as expected, while in R2014b, glib actually calls a version of pcre_compile2 that resides in a completely different shared object (!!) - namely libPocoFoundation, which apparently has a method with identical signature. Matlab includes this library in the bin/ directory of R2014b and presumably it's loaded at startup.

So instead of mex -> lcm -> glib -> pcre, we have mex -> lcm -> glib -> PocoFoundation, and apparently the two versions of pcre_compile2 set error codes differently. This seems to imply that lcm c libraries are incompatible with matlab R2014b (and probably newer).
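The gdb finding above can also be checked from code. The following is a minimal sketch, not something from the original report: `providerOf` is a hypothetical helper that asks the dynamic loader which shared object supplies a symbol under the default lookup order, using the glibc calls `dlsym(RTLD_DEFAULT, ...)` and `dladdr`. It only approximates what glib sees, since a library's own lookup scope can differ from the process-wide one.

```cpp
#include <dlfcn.h>   // dlsym, dladdr, RTLD_DEFAULT (GNU extensions; g++ defines _GNU_SOURCE on Linux)
#include <string>

// Hypothetical helper: returns the path of the shared object that provides
// `symbol` under the process-wide default lookup order, or an empty string
// if the symbol cannot be resolved.
std::string providerOf(const char* symbol) {
    void* addr = dlsym(RTLD_DEFAULT, symbol);
    if (addr == nullptr) {
        return std::string();
    }
    Dl_info info;
    if (dladdr(addr, &info) == 0 || info.dli_fname == nullptr) {
        return std::string();
    }
    return info.dli_fname;
}
```

Called from inside a mex function as `providerOf("pcre_compile2")`, this would be expected to name libPocoFoundation on an affected MATLAB release and libpcre elsewhere, if the diagnosis above holds. On older glibc versions the program may need to be linked with `-ldl`.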

tprk77 commented 7 years ago

(It's really unfortunate that this libPocoFoundation defines an incompatible pcre_compile2. I wonder if that's a bug for them? An old version maybe? Anyway...)

I agree that this is probably a load order problem. It's been a while, but I once had a similar issue. I think you might be able to work around this by calling dlopen on libpcre.so on the first run of the mexFunction. Of course it's very ugly. I can't test personally, because I don't have Matlab at the moment.
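A rough sketch of the dlopen idea, under stated assumptions: `preloadLibrary` is a hypothetical helper, and the library name passed to it (for example `libpcre.so.3`) depends on the distribution. Whether this actually changes which `pcre_compile2` glib binds to is untested here. Libraries that MATLAB loads at startup normally come earlier in the global lookup order than anything opened later, so this may not be enough on its own.

```cpp
#include <dlfcn.h>   // dlopen, dlerror
#include <cstdio>

// Hypothetical helper: load a shared library by name and make its symbols
// globally visible. Returns the handle, or nullptr if loading failed.
void* preloadLibrary(const char* soname) {
    void* handle = dlopen(soname, RTLD_NOW | RTLD_GLOBAL);
    if (handle == nullptr) {
        // dlerror() describes the most recent dlopen failure.
        std::fprintf(stderr, "dlopen(%s) failed: %s\n", soname, dlerror());
    }
    return handle;
}
```

In a mex file this could be called once at the top of `mexFunction`, for instance through a function-local `static void* pcre = preloadLibrary("libpcre.so.3");`, so that the load happens only on the first run.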

This is just a guess, but you could also try setting the rpath of liblcm.so to point to the right libpcre.so. You could use a tool like patchelf to do this. Again, just a guess.

Not sure about a proper fix.

mwoehlke-kitware commented 7 years ago

> It's really unfortunate that this libPocoFoundation defines an incompatible pcre_compile2. I wonder if that's a bug for them?

Probably. It looks like Poco embeds a copy of libpcre, and apparently also exports the symbols from it. That's... icky. I very much doubt that's something they should be doing.

I would recommend updating Poco if possible to the latest version and, if the problem still exists, filing a bug telling them not to export libpcre's API. Since Poco is a C++ library, I don't think they mean to be doing that.

RussTedrake commented 7 years ago

That's not a sufficient fix for us, because Poco is distributed by the MathWorks with MATLAB, not by us. We don't get to increment the libpoco version.

RussTedrake commented 7 years ago

@tprk77 -- if we can make a trick like that work (rpath or manually dlopen to our version), that would be fantastic. We also discussed LD_PRELOAD, etc, but I didn't like the idea of forcing environment variables on people (especially ones that make big changes).

ashuang commented 7 years ago

hmm.. that's a bummer. I unfortunately don't have any great suggestions right now (at least none that seem very clean). As a long term goal, it would be great to eventually migrate LCM away from GLib and to C++11. C++11 provides everything that we would need from GLib (threading, regexes, and some data structures) and moving to it would eliminate LCM's one major dependency. That's a pretty large project though, and not one that I personally have an appetite for at the moment. Let me know if it's something that you guys would be interested in pushing forward.
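To illustrate the C++11 route for the regex part only, here is a minimal sketch using `std::regex`. `ChannelPattern` is a hypothetical name, not part of LCM, and the whole-name matching is an assumption about how subscription patterns should behave.

```cpp
#include <regex>
#include <string>

// Hypothetical channel matcher built on C++11 <regex>, with no GLib or PCRE
// involved. The constructor throws std::regex_error for an invalid pattern.
class ChannelPattern {
public:
    explicit ChannelPattern(const std::string& pattern) : re_(pattern) {}

    // std::regex_match succeeds only if the entire channel name matches.
    bool matches(const std::string& channel) const {
        return std::regex_match(channel, re_);
    }

private:
    std::regex re_;
};
```

For example, `ChannelPattern("ROBOT_.*").matches("ROBOT_STATE")` is true, while the same pattern does not match `"MY_ROBOT_STATE"`. Note that `std::regex` defaults to ECMAScript syntax, which differs in some details from PCRE.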

As a separate idea: We switched to using GLib regexes a few years ago (Abe did that in 82d243ec40f59f94b0dbdde046025f99204fddfe). Previously we used POSIX regexes (see any commit older than that one). One option (a bit invasive) would be to add a compile-time flag to specify the regex library to use, resurrect the POSIX regex code (a few dozen lines) and you guys could have LCM use the POSIX regexes.
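A sketch of what a POSIX-regex matcher could look like, for comparison. This is not the code from before that commit: `PosixChannelPattern` is a hypothetical name, and both the extended-syntax flag and the `^(...)$` anchoring are assumptions made for illustration. A compile-time flag as described above could select between a matcher like this and the GLib one.

```cpp
#include <regex.h>   // POSIX regcomp, regexec, regfree
#include <string>

// Hypothetical channel matcher using POSIX extended regular expressions.
class PosixChannelPattern {
public:
    explicit PosixChannelPattern(const std::string& pattern) {
        // Anchor the pattern so that only whole channel names match (assumption).
        const std::string anchored = "^(" + pattern + ")$";
        valid_ = regcomp(&re_, anchored.c_str(), REG_EXTENDED | REG_NOSUB) == 0;
    }

    ~PosixChannelPattern() {
        if (valid_) {
            regfree(&re_);
        }
    }

    // regex_t owns heap memory, so copying is disabled.
    PosixChannelPattern(const PosixChannelPattern&) = delete;
    PosixChannelPattern& operator=(const PosixChannelPattern&) = delete;

    // False if the pattern failed to compile.
    bool valid() const { return valid_; }

    bool matches(const std::string& channel) const {
        return valid_ && regexec(&re_, channel.c_str(), 0, nullptr, 0) == 0;
    }

private:
    regex_t re_;
    bool valid_ = false;
};
```

Since `regcomp` and `regexec` live in libc, this path would not touch glib or any `pcre_compile2` symbol.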

RussTedrake commented 7 years ago

Thanks @ashuang . Based on a quick read through that commit, it does look like the GLIB/Posix regex code might be pretty well contained and possible to restore (assuming those pieces haven't moved too much between that revision and master). We'll discuss.

fyi - @mwoehlke-kitware , @jamiesnape, @jwnimmer-tri, @david-german-tri .