osrf / capabilities

Implements the concept of capabilities as part of the robots-in-concert system.
Other
8 stars 26 forks source link

Race condition between capability_server start and calling "start_capability" #82

Open abencz opened 9 years ago

abencz commented 9 years ago

If capability server is launched after another node that calls the "start_capability" service as soon as it's available, capability_server will sometimes store "None" as the PID of the launched capability. You get an error like:

ERROR: service [/capability_server/stop_capability] responded with an error: error processing request: No running launch file with PID of 'None'

Echoing the "events" topic when this happens shows that the "launched" event happens before the "server_ready" event:

header: 
  seq: 1
  stamp: 
    secs: 1430405899
    nsecs: 543864965
  frame_id: ''
capability: slam/Localization
provider: slam/localization
type: "launched"
pid: 22241

---
header: 
  seq: 2
  stamp: 
    secs: 0
    nsecs: 0
  frame_id: ''
capability: ''
provider: ''
type: "server_ready"
pid: 0

I am able to reproduce this about 25% of the time with the following small node:

#include <ros/ros.h>
#include <capabilities/GetCapabilitySpecs.h>
#include <capabilities/StartCapability.h>

int main(int argc, char **argv)
{
  ros::init(argc, argv, "capability_test");
  ros::NodeHandle n;

  ros::ServiceClient start_capability =
    n.serviceClient<capabilities::StartCapability>("capability_server/start_capability");

  ros::service::waitForService("capability_server/start_capability");

  capabilities::StartCapability start_srv;
  start_srv.request.capability = "slam/Localization";
  start_capability.call(start_srv);

  return 0;
}

I considered listening for the "server_ready" event before trying to call the service, but that causes issues if the client node start long after the "server_ready" event happens because it will miss this event.

wjwwood commented 9 years ago

I see your point, perhaps there should be a service which allows you to check the state of the server.

Basically you would need a service like ~/wait_for_server which would be an std_srvs/Empty (http://docs.ros.org/indigo/api/std_srvs/html/srv/Empty.html). Then your code could wait for it to be available and then call it and wait for it to return. For the server, it would create the service server and the handler would wait for the server to be "ready" before returning.

I don't have a lot of time to spend on this, but given the above description it should be possible for someone other than me to make a pull request. I'll eventually get to it, but not at the moment.

abencz commented 9 years ago

Thanks for your input! Is there no way to defer the creation of the other services until the server is ready? That would make it zero effort on the part of a user.

wjwwood commented 9 years ago

Without the blocking call to the service, there will always be a race condition between the server being ready and the service server being made available. It might be possible to create the service server after the server is ready, but I'd have to dig into the code base to know for sure. Either way the service should do something, and it might as well block until the server is ready.

wjwwood commented 8 years ago

I'm going to reopen this as #83 improved the situation, it's not clear to me that it is completely sufficient.

wjwwood commented 8 years ago

@abencz if you have time to try out #82 (or master since it's merged) we could use feedback from you to confirm whether or not this should be closed.