Multiple publishers writing on the same topic cause segfault

randaz81 commented 6 years ago

Let's consider the following case:

a process spawning multiple threads (i.e. multiple device wrappers inside yarprobotinterface).
each of these threads instantiate its own yarp::os::Node and its own yarp::os::Publisher
all these publishers write on a topic with the same name.

The process will segfault after a random amount of time, probably due to a race condition inside write() method.

A gist containing a snippet of code to reproduce the bug is provided below.

randaz81 commented 6 years ago

https://gist.github.com/randaz81/eabcdb90c0598b802f74daee94d3397f

Tobias-Fischer commented 6 years ago

Just a thought, but why should each thread instantiate its own Node? Also, in your code snip, there is no correspondence between node[i] and pub2[i], as there is no "registration" of the Publisher with the Node (at least as far as I know).

In ROS, each module calls ros::init() exactly once, and I guess the same makes sense for YARP's Node, doesn't it? Then the question is whether multiple Publishers still crash under one and the same node.

drdanz commented 6 years ago

@Tobias-Fischer YARP tries to stay as generic as possible, in order to allow integration with different systems... ROS is just one system, the fact that ROS supports only one node, does not mean that other system will work in the same way, limiting it to one single node adds a limitation in the systems that we will be able to integrate in the future.

Tobias-Fischer commented 6 years ago

Sure. If YARP wants to be that generic though and at the same time inter-operable with ROS, a Node should be associated with a Publisher though, isn't it? Currently that is not the case, contrary to the code snippet by @randaz81 which seemingly tries to create one Node per Publisher (I haven't run the snippet so I am not sure what happens currently before the crash occurs with that respect). Feel free to correct me if I'm wrong though, as I said it's just a thought :)

drdanz commented 6 years ago

Actually I haven't tested/reviewed @randaz81 code, but at a very quick check, this is likely to be a bug:

    yarp::os::Node* node[num];
    yarp::os::Publisher<geometry_msgs_Point> pub2[num];
    yarp::os::Time::delay(1.0);
    for (int i=0; i<num; i++)
    {
        char name [50];
        sprintf (name,"/testNode%d",i);
        node[i] = new yarp::os::Node (name);
        pub2[i].topic("/pubTest2");
    }

The Nodes should definitely be created before the Publishers. When the Publisher is created, there is no existing Node.

barbalberto commented 6 years ago

@Tobias-Fischer our current use case is a wrapper which has to publish data on a specific topic. Since iCub wrappers are split into parts, we have different wrappers which, independently one from the others, have to publish on the same topic.

If we use the same node for all of them, then we will have many publishers with the same node and same topic, and that's not possible. So we identify different parts using different nodes.

The point here is also to discuss what the best practise should be. Is this the best we can do, or there are other ideas which may be better?

Tobias-Fischer commented 6 years ago

Hi @barbalberto, I think my concern is quite well described in robotology/yarp-ros#18. I find the behavior that "YARP is supposed to create a port /topic@/node nested to the latest (temporal order) instantiated Node" not very intuitive, and prone to bugs as shown here and in that issue report.

My suggestion was to pass a reference of the Node to the Publisher (either at instantiation or when calling topic), which would avoid implicit requirements such as the need of creating the Node before the Publisher (see @randaz81 code and @drdanz comment) and also fix robotology/yarp-ros#18. This would be generic, however it would break the API.

But again, I am not very familiar with the topic so this is just an idea and I might totally miss the point ..

drdanz commented 6 years ago

I think @Tobias-Fischer proposal makes a lot of sense, we might actually consider reviewing and deprecating the current api (I haven't completely understood the differences between passing the topic to the constructor, compared to passing it in topic() and the reason for the various open()) in favour of something like

yarp::os::Publisher<T>::topic(const std::string& topic, const yarp::os::Node& node);
yarp::os::Subscriber<T>::topic(const std::string& topic, const yarp::os::Node& node);

I wonder if it is actually possible (and if it makes sense) for a publisher to publish on 2 different nodes.

randaz81 commented 6 years ago

I definitely like this solution and the API revision here proposed. About the last question, I currently can't find any reasonable example which could motivate the use of a single publisher/subscriber attached to multiple nodes.

barbalberto commented 6 years ago

Another idea I had time ago was to completely reverse the logic. Conceptually the Node is an executable and the topic is not a class like a yarp port but, as the name says, simply a topic... something to talk about :smiley:

The idea would be to create the Node as a singleton, so the first one using a Node will create it. (I mean nodes with the same name. Different names, different instances).

If an executable needs to publish on the same topic from different threads, it can recall the same Node without having to create another one each time. Furthermore it does not need to care if that node was already created by someone else or not. This will solve the problem we have now regarding the plugins. Any plugin can create a Node object with the same name, but only one will actually instantiated.

Then one can create a topic attached to that Node, like

Node n("myNode");
n.topic("myTopic");      // choose the syntax you like the most, that's an example

This will create a topic and attach it to the Node object, which will manage its life cycle. Here again, if many objects try to write to the same topic, they will simply get the reference to the already existing topic object.

In this way, if I have many robot wrappers writing all on the same topic /analog@/robot, only one node and one port will actually be created and shared among all plugins.

Basically we can run the two lines above as many times as we want, the result will always be one node and one port.

This will reduce resources consumption and also solve the problem of having many writers on the same topic from the root.

drdanz commented 6 years ago

I don't agree with making Node a singleton, because, as I said already,

YARP tries to stay as generic as possible, in order to allow integration with different systems... ROS is just one system, the fact that ROS supports only one node, does not mean that other system will work in the same way, limiting it to one single node adds a limitation in the systems that we will be able to integrate in the future.

But actually, reading the description, I believe you don't actually mean a "singleton", but something that is "unique" depending on the name, is this correct? In this case I totally agree, but I believe that we might have something similar already, see yarp::os::Nodes documentation, even though I have no idea about how it should be used...

In Node constructor:

https://github.com/robotology/yarp/blob/62d24814a32a7fb4f30c8c0eeaf3247111178853/src/libYARP_OS/src/Node.cpp#L550-L563

I believe that here, instead of nodes.addExternalNode(rname, *this); it should check if a Node nome with the same name exists, and eventually "share" the internal port with the one of the already existing node.

barbalberto commented 6 years ago

I believe you don't actually mean a "singleton", but something that is "unique" depending on the name, is this correct?

Yes, that's what I meant. You are right, singleton is not the right word, sorry.

AFAIK there is no strong relationship right now between Node and topic. In the code you posted, it looks like the Node (name) is set as the active one. Then, when creating a topic, the new topic uses the current active Node. I fear there could be concurrency issue here, that's why I proposed to have the Node as a topic container.

I agree on beeing as generic as possible, however the Node is an interface to ROS framework, it embedds parsing of ROS core messages and other ROS specificity so I don't think it will be re-usable with other frameworks anyway.

traversaro commented 6 years ago

Relevant discussion (in that case the context is ROS2 and Gazebo plugins) about single node/multiple nodes in a process: https://github.com/ros-simulation/gazebo_ros_pkgs/issues/797 .

robotology / yarp-ros

Multiple publishers writing on the same topic cause segfault #8