Closed erwinbonsmatopic closed 2 years ago
ping @jonbinney
Sorry it took me a while to get to this! I'd like to understand a bit more about why these changes are needed on your system. A couple of thoughts:
tf_static
) then the timestamp for the transform lookup shouldn't matter.... right? I don't understand why changing the timestamp to Time(0)
would help anything.robot_state_publisher
? How loaded is your CPU and memory?
- 10 seconds is a reallly really long time. Is this happening only at startup? On most systems, if a static transform isn't ready after a few seconds, it should indicate that something is wrong. Are you using the standard
robot_state_publisher
? How loaded is your CPU and memory?
The large time-out is indeed a work-around for a problem elsewhere. We have a experimental ROS configuration to explore performance. The performance varies widely on different systems. Normally the transform is obtained with 100ms but one system requires a time-out of fifteen seconds. If the timeout is less, e.g. five seconds, the transform publisher gets into a state where it consumes 100% CPU and becomes very sluggish (it can still accept new subscribers, but takes long to respond). See the CPU usage plot below. Note, the same set-up runs fine on a system that has the same processor architecture (ARM) and less computational power. It appears there is a bug in the ros_comm pub-sub infrastructure, but that we just started to analyze. Hopefully we can identify the root cause (and provide a fix).
Meanwhile, if there is a hardcoded time-out, it should be large for this code to be (more) robust to failures elsewhere. Alternatively, we can make it configurable (which is actually what we are using in our experiment set-up)
- if the transform is truly static (published on
tf_static
) then the timestamp for the transform lookup shouldn't matter.... right? I don't understand why changing the timestamp toTime(0)
would help anything.
The Time(0) ensures that whenever a message is received, it is never rejected. The problem with the previous code was that it can reject messages. This should not happen in happy scenarios, but can happen in situations where the subscription takes longer than expected. In this case, we have seen it receive a transform but reject it. As the transform is static, there is no need for this. So changing time to Time(0) makes this code more robust.
Thanks for the detailed explanation. A couple thoughts:
Time(0)
helps you now. Unfortunately there are robots with laser scanners on actuated joints (for example the upper hokoyu on the PR2). Using the correct timestamp ensures that those robots use the correct transform when filtering.How about a special case for startup? For the first N seconds, if the filter fails to find a transform for a laser scan, it doesn't publish a filtered scan, and it outputs a warning to ros logging. After N seconds, the filter logs an error and exits. I'd still like N to be 20 seconds or less.
Thanks for your feedback, Jon.
An important thing to note though, this code is for the static polygon filter, where the transform does not change and is obtained only once. Your example with actuated joints therefore does not apply. For this, the other polygon filter should be used (defined in the same file), which continuously receives polygon messages.
So the static filter only has the start-up use-case. Once it has obtained the transform it unsubscribes (to avoid the overhead of these incoming transform messages, which was noticeable).
Your point is valid that the currently hard-coded time-out of sixty seconds is too long is valid and unnecessarily delays feedback in case of errors. I therefore propose to make the time-out configurable and reduce the default to five seconds (this will in exceptional cases not suffice, but can then be increased).
I have extended the PR to reduce the default time-out and make it configurable. Also, a minor change in the header to fix a build warning (a regression introduced in 35a748eb).
Good point about this being the static version :-) Just a few logging changes from the code review, then ready to merge.
I also made some minor formatting changes so that formatting is consistent.
This caused a minor conflict with noetic-devel branch, which I resolved by merging noetic-devel into this branch. Let me know if you would instead prefer me to rebase this branch
Did you mean to commit the dockerfile?
The Dockerfile should not be added. That was a mistake. I will remove it. How do you prefer I do this? Via a separate commit, or by changing the commit that introduced it?
Go ahead and modify the commit - thanks for asking!
Looks good! @erwinbonsmatopic is this ready to merge?
@jonbinney Yes, this is ready to merge. I performed a few integration tests today and that also went fine. Thanks
During integration tests using the new static polygon filter we faced problems with time outs when listening for the required transform.
This PR fixes that by, firstly, not putting restrictions on the time of the requested transform (it is static anyway), and secondly, increasing the hard-coded timeout limit to sixty seconds. This should suffice. In our testing it required at most ten seconds.