Open rshanor opened 1 year ago
Running with debug logging. I see Sending service response
in the server logs but a long delay before Action status taken
in the client logs.
Client
[INFO] [1671659682.131745101] [call_grab_image_action]: Sending goal.
[DEBUG] [1671659682.131818861] [rcl_action]: Sending action goal request
[DEBUG] [1671659682.131822305] [rcl]: Client sending service request
[DEBUG] [1671659682.131853398] [rcl_action]: Action goal request sent
[DEBUG] [1671659682.131902662] [rcl]: Timer canceled
[DEBUG] [1671659682.131915339] [rcl]: Initializing timer with period: 3000000000ns
[DEBUG] [1671659682.131922580] [rcl]: Updated timer period from '3000000000ns' to '3000000000ns'
[DEBUG] [1671659682.131942364] [rcl]: Timer canceled
[DEBUG] [1671659682.131947286] [rcl]: Initializing timer with period: 2999935849ns
[DEBUG] [1671659682.131950934] [rcl]: Updated timer period from '2999935849ns' to '2999935849ns'
[DEBUG] [1671659682.131969332] [rcl]: Timer canceled
[DEBUG] [1671659682.131974137] [rcl]: Initializing timer with period: 2999908247ns
[DEBUG] [1671659682.131977434] [rcl]: Updated timer period from '2999908247ns' to '2999908247ns'
[DEBUG] [1671659682.132024143] [rcl]: Initializing wait set with '2' subscriptions, '2' guard conditions, '1' timers, '3' clients, '6' services
[DEBUG] [1671659682.132095458] [rcl]: Initializing wait set with '2' subscriptions, '2' guard conditions, '1' timers, '3' clients, '6' services
[DEBUG] [1671659682.133729544] [rcl_action]: Taking action goal response
[DEBUG] [1671659682.133742448] [rcl]: Client taking service response
[DEBUG] [1671659682.133754879] [rcl]: Client take response succeeded: true
[DEBUG] [1671659682.133757253] [rcl_action]: Action goal response taken
[INFO] [1671659682.134132751] [call_grab_image_action]: Goal accepted, waiting for result.
[DEBUG] [1671659682.134190355] [rcl_action]: Sending action result request
[DEBUG] [1671659682.134193659] [rcl]: Client sending service request
[DEBUG] [1671659682.134214154] [rcl_action]: Action result request sent
[DEBUG] [1671659682.134268579] [rcl]: Timer canceled
[DEBUG] [1671659682.134283606] [rcl]: Initializing timer with period: 3000000000ns
[DEBUG] [1671659682.134291090] [rcl]: Updated timer period from '3000000000ns' to '3000000000ns'
[DEBUG] [1671659682.134311948] [rcl]: Timer canceled
[DEBUG] [1671659682.134316976] [rcl]: Initializing timer with period: 2999928587ns
[DEBUG] [1671659682.134320655] [rcl]: Updated timer period from '2999928587ns' to '2999928587ns'
[DEBUG] [1671659682.134336290] [rcl]: Timer canceled
[DEBUG] [1671659682.134340508] [rcl]: Initializing timer with period: 2999903637ns
[DEBUG] [1671659682.134344632] [rcl]: Updated timer period from '2999903637ns' to '2999903637ns'
[DEBUG] [1671659682.134392980] [rcl]: Initializing wait set with '2' subscriptions, '2' guard conditions, '1' timers, '3' clients, '6' services
[DEBUG] [1671659682.134446534] [rcl_action]: Taking action status
[DEBUG] [1671659682.134448980] [rcl]: Subscription taking message
[DEBUG] [1671659682.134479482] [rcl]: Subscription take succeeded: true
[DEBUG] [1671659682.134481962] [rcl_action]: Action status taken
[DEBUG] [1671659682.153070645] [rcl]: Timer canceled
[DEBUG] [1671659682.153106819] [rcl]: Initializing timer with period: 2981208927ns
[DEBUG] [1671659682.153137015] [rcl]: Updated timer period from '2981208927ns' to '2981208927ns'
[DEBUG] [1671659682.153201044] [rcl]: Initializing wait set with '2' subscriptions, '2' guard conditions, '1' timers, '3' clients, '6' services
[DEBUG] [1671659682.153286712] [rcl]: Initializing wait set with '2' subscriptions, '2' guard conditions, '1' timers, '3' clients, '6' services
[DEBUG] [1671659682.368585382] [rcl_action]: Taking action status
[DEBUG] [1671659682.368608456] [rcl]: Subscription taking message
[DEBUG] [1671659682.368701045] [rcl]: Subscription take succeeded: true
[DEBUG] [1671659688.946816840] [rcl_action]: Action status taken
Traceback (most recent call last):
assert result_future.done()
AssertionError
And server
[DEBUG] [1671659682.131994830] [rcl]: Service server taking service request
[DEBUG] [1671659682.132022257] [rcl]: Service take request succeeded: true
[DEBUG] [1671659682.133360429] [robodk_sim_camera.action_server]: New goal request with ID: [ 60 22 78 33 163 196 71 243 146 43 48 200 194 203 238 238]
[DEBUG] [1671659682.133560286] [rcl]: Sending service response
[DEBUG] [1671659682.133931800] [robodk_sim_camera.action_server]: New goal accepted: [ 60 22 78 33 163 196 71 243 146 43 48 200 194 203 238 238]
[DEBUG] [1671659682.134421058] [robodk_sim_camera.action_server]: Executing goal with ID [ 60 22 78 33 163 196 71 243 146 43 48 200 194 203 238 238]
[INFO] [1671659682.134647087] [robodk_sim_camera]: Capturing single image...
[DEBUG] [1671659682.368423588] [rcl]: Timer successfully reset
[DEBUG] [1671659682.368447476] [rcl]: Updated timer period from '459884619236ns' to '459617440767ns'
[INFO] [1671659682.368759227] [robodk_sim_camera]: Captured single image...
[DEBUG] [1671659682.369134753] [robodk_sim_camera.action_server]: Goal with ID [ 60 22 78 33 163 196 71 243 146 43 48 200 194 203 238 238] finished with state 4
[DEBUG] [1671659682.369311198] [rcl]: Initializing wait set with '0' subscriptions, '2' guard conditions, '2' timers, '0' clients, '13' services
[DEBUG] [1671659682.369407977] [rcl]: Service server taking service request
[DEBUG] [1671659682.369421793] [rcl]: Service take request succeeded: true
[DEBUG] [1671659682.370370704] [robodk_sim_camera.action_server]: Result request received for goal with ID: [ 60 22 78 33 163 196 71 243 146 43 48 200 194 203 238 238]
[DEBUG] [1671659682.374861538] [rcl]: Sending service response
[DEBUG] [1671659682.377828502] [rcl]: Initializing wait set with '0' subscriptions, '2' guard conditions, '2' timers, '0' clients, '13' services
[DEBUG] [1671659682.377947648] [rcl]: Initializing wait set with '0' subscriptions, '2' guard conditions, '2' timers, '0' clients, '13' services
And FWIW, I noticed the same behavior with rmw_cyclonedds_cpp
I just noticed that the server is leaking memory like crazy. When I take out the line result.images.append(image_msg)
, I do not observe the leak. The below plot is over a couple minutes at most.
[DEBUG] [1671659682.131853398] [rcl_action]: Action goal request sent
-> Goal request sent by client.
...
[DEBUG] [1671659682.133560286] [rcl]: Sending service response
-> Goal accepted by server, sending goal response.
...
[DEBUG] [1671659682.133757253] [rcl_action]: Action goal response taken
-> Goal response received by client.
[INFO] [1671659682.134132751] [call_grab_image_action]: Goal accepted, waiting for result.
[DEBUG] [1671659682.134190355] [rcl_action]: Sending action result request
[DEBUG] [1671659682.134193659] [rcl]: Client sending service request
[DEBUG] [1671659682.134214154] [rcl_action]: Action result request sent
-> Result request sent by client.
...
[DEBUG] [1671659682.370370704] [robodk_sim_camera.action_server]: Result request received for goal with ID: [ 60 22 78 33 163 196 71 243 146 43 48 200 194 203 238 238]
[DEBUG] [1671659682.374861538] [rcl]: Sending service response
-> Result response sent by server.
...
[DEBUG] [1671659682.368585382] [rcl_action]: Taking action status
[DEBUG] [1671659682.368608456] [rcl]: Subscription taking message
[DEBUG] [1671659682.368701045] [rcl]: Subscription take succeeded: true
-> This is status topic, status succeeded delivered to client.
[DEBUG] [1671659688.946816840] [rcl_action]: Action status taken
-> Why this takes 5 seconds to take the message out of the queue?
Why we are not seeing message Taking action result response
?
Not sure. My ActionClient is initialized as follows, if it matters at all.
self._action_client = ActionClient(
node, GrabImages, action_name, callback_group=MutuallyExclusiveCallbackGroup()
)
If I am not mistake, the only possibly meaningful code between Subscription take succeeded
and Action #Type taken
is TRACEPOINT(rcl_take, (const void *)ros_message);
.
The rcl_action code in question:
// \internal Takes an action client specific topic message.
#define TAKE_MESSAGE(Type) \
RCUTILS_LOG_DEBUG_NAMED(ROS_PACKAGE_NAME, "Taking action " #Type); \
if (!rcl_action_client_is_valid(action_client)) { \
return RCL_RET_ACTION_CLIENT_INVALID; /* error already set */ \
} \
RCL_CHECK_ARGUMENT_FOR_NULL(ros_ ## Type, RCL_RET_INVALID_ARGUMENT); \
rmw_message_info_t message_info; /* ignored */ \
rcl_ret_t ret = rcl_take( \
&action_client->impl->Type ## _subscription, ros_ ## Type, &message_info, NULL); \
if (RCL_RET_OK != ret) { \
if (RCL_RET_SUBSCRIPTION_TAKE_FAILED == ret) { \
return RCL_RET_ACTION_CLIENT_TAKE_FAILED; \
} \
if (RCL_RET_BAD_ALLOC == ret) { \
return RCL_RET_BAD_ALLOC; \
} \
return RCL_RET_ERROR; \
} \
RCUTILS_LOG_DEBUG_NAMED(ROS_PACKAGE_NAME, "Action " #Type " taken"); \
return RCL_RET_OK;
rcl_ret_t
rcl_action_take_status(
const rcl_action_client_t * action_client,
void * ros_status)
{
TAKE_MESSAGE(status);
}
Which calls the following rcl code
rcl_ret_t
rcl_take(
const rcl_subscription_t * subscription,
void * ros_message,
rmw_message_info_t * message_info,
rmw_subscription_allocation_t * allocation
)
{
RCUTILS_LOG_DEBUG_NAMED(ROS_PACKAGE_NAME, "Subscription taking message");
if (!rcl_subscription_is_valid(subscription)) {
return RCL_RET_SUBSCRIPTION_INVALID; // error message already set
}
RCL_CHECK_ARGUMENT_FOR_NULL(ros_message, RCL_RET_INVALID_ARGUMENT);
// If message_info is NULL, use a place holder which can be discarded.
rmw_message_info_t dummy_message_info;
rmw_message_info_t * message_info_local = message_info ? message_info : &dummy_message_info;
*message_info_local = rmw_get_zero_initialized_message_info();
// Call rmw_take_with_info.
bool taken = false;
rmw_ret_t ret = rmw_take_with_info(
subscription->impl->rmw_handle, ros_message, &taken, message_info_local, allocation);
if (ret != RMW_RET_OK) {
RCL_SET_ERROR_MSG(rmw_get_error_string().str);
return rcl_convert_rmw_ret_to_rcl_ret(ret);
}
RCUTILS_LOG_DEBUG_NAMED(
ROS_PACKAGE_NAME, "Subscription take succeeded: %s", taken ? "true" : "false");
TRACEPOINT(rcl_take, (const void *)ros_message);
if (!taken) {
return RCL_RET_SUBSCRIPTION_TAKE_FAILED;
}
return RCL_RET_OK;
}
This issue was resolved in PR#1070.
Bug report
Required Info:
Steps to reproduce issue
I have an Action Server and Action Client implemented in Python. I noticed that periodically, the total client round trip time took several seconds, even though the server consistently responds in 250ms. Everything is running on the same machine and I am using the env variable
ROS_LOCALHOST_ONLY=1
. The messages size is fairly big, containing a single image slightly larger than 1080p. This is implemented as an ActionServer to match https://github.com/basler/pylon-ros-camera/tree/humble, but I could try to rewrite as a standard service if that would help.My server callback looks something like
And my client code looks like
From what I can tell, it looks like the delay is between waiting for result and receiving result.
Expected behavior
Client total action time to be slightly larger than server time.
Actual behavior
On occasion, client time is 10X or more larger than server time.
Additional information