I understand why the first sleep might be needed, since the connection may not yet be established. Hopefully there is a good way in ROS to determine when anyone is listening to a publisher.
But for the second problem, where the second message is getting dropped if published immediately after the first, I'm starting to wonder if it's not the publisher, but the subscribers. I'm going to create two listener nodes, each listening to one of the above messages, to see if they get dropped.
I created the two separate nodes, one listening to each of the above messages on its own topic. Without sleeping between the publish calls, I am unable to get the second message. So the bug seems to be at least partly on the publish side; it could be both, I'm not sure.
I just tried sleeping much longer before I publish the messages, and back-to-back publishing to the separately listening nodes works fine in that case. So it looks like perhaps it was just taking a long time to set up the connections.
We then found a way to determine from ROS 2 when we have subscribers: `node->count_subscribers(topic)`.
So we give it the topic, wait for that count to be greater than 0, and then publish. This still wasn't enough, but if we also add a `spin_some` at the end, the messages get published and the listening nodes receive them both, without any unnecessary sleeps.
New sender:
```cpp
#include <chrono>
#include <iostream>

#include <rclcpp/rclcpp.hpp>
#include <std_msgs/msg/string.hpp>

using String = std_msgs::msg::String;

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = rclcpp::Node::make_shared("string_sender");
  auto pub1 = node->create_publisher<String>("topic1");
  auto pub2 = node->create_publisher<String>("topic2");

  String msg1;
  msg1.data = "my data 1";
  String msg2;
  msg2.data = "my data 2";

  // Block until at least one subscriber has matched on each topic, so
  // we don't publish while discovery is still in progress.
  while (node->count_subscribers("topic1") <= 0 ||
         node->count_subscribers("topic2") <= 0)
  {
    std::cout << "Not enough subscribers yet (topic1 - "
              << node->count_subscribers("topic1")
              << " topic2 - "
              << node->count_subscribers("topic2")
              << ")\n";
    rclcpp::sleep_for(std::chrono::milliseconds(100));
  }
  std::cout << "subscribers are good (topic1 - "
            << node->count_subscribers("topic1")
            << " topic2 - "
            << node->count_subscribers("topic2")
            << ")\n";

  std::cout << "sending 1..." << std::endl;
  pub1->publish(msg1);
  std::cout << "sending 2..." << std::endl;
  pub2->publish(msg2);

  // Without this spin_some the second message was dropped; it gives the
  // middleware a chance to process outstanding work before the process exits.
  std::cout << "done sending, spinning" << std::endl;
  rclcpp::spin_some(node);
  return 0;
}
```
When we have lots of nodes running, we don't see 100% reliable delivery (with the default QoS). So reopening.
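One thing worth trying here is spelling out the reliability and history settings explicitly rather than relying on whatever the default profile maps to in the chosen RMW. A minimal sketch, assuming the newer `rclcpp::QoS` API (Dashing and later, which postdates the code above):

```cpp
#include <rclcpp/rclcpp.hpp>
#include <std_msgs/msg/string.hpp>

using String = std_msgs::msg::String;

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = rclcpp::Node::make_shared("string_sender");

  // Explicit reliable, keep-last-10 profile, rather than relying on
  // the default profile's mapping in the underlying middleware.
  rclcpp::QoS qos(rclcpp::KeepLast(10));
  qos.reliable();

  auto pub = node->create_publisher<String>("topic1", qos);
  String msg;
  msg.data = "my data 1";
  pub->publish(msg);

  rclcpp::shutdown();
  return 0;
}
```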
Does CoreDX set the size of the socket buffer or go with the system default?
Sockets are dynamically resized unless you hardcode a size with an environment variable.
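For anyone who wants to see what the system default actually is, a small standalone sketch (plain POSIX sockets, nothing CoreDX-specific) that prints the kernel's default receive buffer size:

```cpp
#include <iostream>

#include <sys/socket.h>
#include <unistd.h>

int main()
{
  // Create a throwaway UDP socket and read back the kernel's
  // default receive buffer size for it.
  int fd = socket(AF_INET, SOCK_DGRAM, 0);
  if (fd < 0) {
    return 1;
  }
  int rcvbuf = 0;
  socklen_t len = sizeof(rcvbuf);
  if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, &len) == 0) {
    // Note: on Linux the kernel reports double the configured size.
    std::cout << "default SO_RCVBUF: " << rcvbuf << " bytes\n";
  }
  close(fd);
  return 0;
}
```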
I fixed this with this changeset: https://github.com/asirobots/rmw_coredx/commit/cb79fcc0d82c9cec7f59b3f68f2ead49eccb5067
We are still seeing this in the newest rmw_coredx. We added the changes from Brannon's cited changeset, but it isn't fixing it. However, we are seeing similar, though less frequent, missed messages with other RMW implementations. I don't know if this is a valid test case anyway; I'm thinking we can just close it.
I'm seeing an issue with a simple ROS node that tries to send two string messages to a simple listener that just listens for the messages and prints them out. The messages are each sent on a separate topic. I have found that the only somewhat reliable way to get the listener to receive both messages is to sleep before sending the first, then sleep again before sending the second. This seems like a brittle, unreliable way to have to work. Is there a better way? I'm using the default QoS, which is supposed to provide reliable delivery. My concern is that we are having packets dropped.
Sender node:
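The original code block did not survive extraction; a minimal sketch of a sender matching the description above (the sleep durations, topic names, and message contents are assumptions) would look like:

```cpp
#include <chrono>
#include <rclcpp/rclcpp.hpp>
#include <std_msgs/msg/string.hpp>

using String = std_msgs::msg::String;

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = rclcpp::Node::make_shared("string_sender");
  auto pub1 = node->create_publisher<String>("topic1");
  auto pub2 = node->create_publisher<String>("topic2");

  String msg1;
  msg1.data = "my data 1";
  String msg2;
  msg2.data = "my data 2";

  // Sleeping before each publish was the only somewhat reliable way to
  // get both messages through; the exact durations here are guesses.
  rclcpp::sleep_for(std::chrono::seconds(1));
  pub1->publish(msg1);
  rclcpp::sleep_for(std::chrono::seconds(1));
  pub2->publish(msg2);
  return 0;
}
```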
Listener node:
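Likewise, a minimal listener along the lines described, assuming the same pre-Dashing rclcpp API vintage as the sender (node and variable names are assumptions):

```cpp
#include <iostream>
#include <rclcpp/rclcpp.hpp>
#include <std_msgs/msg/string.hpp>

using String = std_msgs::msg::String;

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = rclcpp::Node::make_shared("string_listener");

  // Print every message received on either topic.
  auto cb = [](String::SharedPtr msg) {
    std::cout << "received: " << msg->data << std::endl;
  };
  auto sub1 = node->create_subscription<String>("topic1", cb);
  auto sub2 = node->create_subscription<String>("topic2", cb);

  rclcpp::spin(node);
  rclcpp::shutdown();
  return 0;
}
```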