Closed GoesM closed 2 months ago
You know what to do :wink:
Ah... I think the situation here might be a bit different and not suitable for validation through validageMsg()
.
In my understanding, nav2_util::validateMsg()
is a generic check mainly for nan
, inf
, and similar requirements that all messages of the same msg-type should meet. However, the error here is caused by a large value of x in a specific calculation process of nav2_amcl
.
And I believe it's more appropriate to perform the check in the callback process of /initialpose
in nav2_amcl
, rather than adding conditions to nav2_util::validateMsg()
. There are two main reasons for this:
nav2_amcl
and may not apply to all messages of this msg-type.nav2
operates and the specific circumstances of other packages that use the message of PoseWithCovarianceStamped
type.Overall, I think agreeing on an upper limit for x and implementing it in the callback process of nav2_amcl
would be more appropriate.
However, I'm still checking the specific line of code where the overflow occurs and haven't confirmed it yet. I think here would be a better method to fix it after a further check.
SO, I'd further confirm the specific line of code causing the buffer overflow and investigate why JUST a large value (still in the range of integer
) here could lead to a heap-buffer-overflow.
And I believe it's more appropriate to perform the check in the callback process of /initialpose in nav2_amcl
Ok! I trust your opinion :smile: Let me know what you think is best!
@GoesM any update?
I'm so sorry for my late work about this issue because I'm struggled by other works. T_T
After the updating humble-branch, validateMsg()
is deployed so the command to reproduce this issue needs some little change, as I'd edited it.
**which code-line met to heap-buffer-overflow
I insert code for log as following:
void goes_debug(int cnt){
fprintf(stderr, "[GOES|DEBUG]:---------- %d \n", cnt);
}
// Re-compute the cluster statistics for a sample set
void pf_cluster_stats(pf_t * pf, pf_sample_set_t * set)
{
...
...
goes_debug(3);
// Compute cluster stats
for (i = 0; i < set->sample_count; i++) {
...
...
goes_debug(4);
cluster = set->clusters + cidx;
goes_debug(5);
cluster->weight += sample->weight;
goes_debug(6);
weight += sample->weight;
goes_debug(7);
...
...
}
...
}
And I catched the log during the crash:
[amcl-7] [INFO] [1717073213.332957908] [amcl]: Setting pose (5.019000): -751613824.000 0.378 3.142
[amcl-7] [GOES|DEBUG]:---------- 1
[amcl-7] [GOES|DEBUG]:---------- 2
[amcl-7] [GOES|DEBUG]:---------- 3
[amcl-7] [GOES|DEBUG]:---------- 4
[amcl-7] [GOES|DEBUG]:---------- 5
[amcl-7] [GOES|DEBUG]:---------- 6
[amcl-7] [GOES|DEBUG]:---------- 7
[amcl-7] [GOES|DEBUG]:---------- 4
[amcl-7] [GOES|DEBUG]:---------- 5
[ERROR] [amcl-7]: process has died [pid 111426, exit code 1, cmd '/home/goes/ROS_Workstation/humble_fork/install/nav2_amcl/lib/nav2_amcl/amcl --ros-args --log-level info --ros-args -r __node:=amcl --params-file /tmp/launch_params_t5mx02k0 -r /tf:=tf -r /tf_static:=tf_static'].
It proves that the bug occurs in line:
And it seems like cluster = set->clusters +cidx
would let the cluster
became an invalidate pointer.
However, I am not an expert about the pf-calculation , and not clear whether such buffer-overflow is related to the size of map or not ?
If related to , I think we could add a check to ensure that the received pose is within the map range ?
Sure! That seems sensible
@GoesM can you submit a PR?
oh no, I'm very very sorry for that, it's my missing.
I submit the PR #4416
No worries!
Bug report
Required Info:
Steps to reproduce issue
Launch the navigation2 normally, as following steps:
Curious about how nav2 face to topic-interception, i sent validate
/PoseWithCovarianceStamped
msg onto topic/initialpose
, which is like this:[notice] the value of
x
of theposition
is-751613824.000000
, which leads to the heap-buffer-overflow.Expected behavior
Actual behavior
The ASAN reporting a stack-buffer-overflow bug to me as following, and the
nav2_amcl
stop its work.Additional information
it seems like that a high value of
x
could lead to such bug...